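python - How do I develop a random forest model in Python using data contained in a CSV file?
Problem description
I have been trying to train a random forest model in Python to predict data contained in a CSV file. The first row of the CSV file is shown here. I am interested in training the model to predict the value of column J using the other variables (except datetime). When I first tried to run the model, the error raised was: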
ValueError: could not convert string to float: '01/01/2018 02:00'
I converted the 'datetime' column to datetime format to see if that would help, but now I get this error:
Traceback (most recent call last):
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\Random Forest Code.py", line 36, in <module>
    regr_multirf.fit(X_train, y_train)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\multioutput.py", line 160, in fit
    X, y = self._validate_data(X, y,
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\base.py", line 433, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 814, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 616, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'Timestamp'
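I am not sure what code to add so that the datetime stays attached to each data point, because I need it to compare the actual and predicted values when I train and test the model. Here is my current code, up to the line where the error occurs: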
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_absolute_error,mean_squared_error, r2_score
import pickle
pd.set_option('display.max_columns', None)
# import csv data
df = pd.read_csv('C:/Users/elsam/Documents/Year 3/Final EN3300 Project/Machine Learning/Data/locations/Combined/ASP.csv', index_col=0)
df.fillna(df.mean(), inplace=True)
save_model_path = 'C:/Users/elsam/Documents/Year 3/Final EN3300 Project/Machine Learning/Model'
df['datetime'] = pd.to_datetime(df['datetime'])
# split train and test data
num_col = len(df.columns)
split_col = num_col - 1
X = df.iloc[:, 0:split_col].values
y = df.iloc[:, split_col:].values
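# random 80/20 split (its results are overwritten by the chronological split below)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=30)
# chronological split: first 80% of the rows, rounded down to a multiple of 48
split_row = int(len(df) * 0.8 // 48 * 48)
X_train, X_test, y_train, y_test = X[:split_row], X[split_row:], y[:split_row], y[split_row:]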
# create multi random forest model
max_depth = 30
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, max_depth=max_depth, random_state=0))
regr_multirf.fit(X_train, y_train)
Solution
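Both errors most likely come from the same place: `df.iloc[:, 0:split_col]` still includes the `datetime` column, and scikit-learn tries to cast every feature to float, which fails for strings and for `Timestamp` objects alike. One possible approach is to move the parsed datetime into the DataFrame index, so it is no longer a feature but still travels with every row. The sketch below is a minimal illustration under stated assumptions, not the only way to do this: the data is synthetic, the feature columns `A` and `B` and the half-hourly spacing are made up for the example, and only `datetime`, `J`, and the day-first date format come from the question. Since there is a single target column, a plain `RandomForestRegressor` is used here instead of wrapping it in `MultiOutputRegressor`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the CSV. Columns A and B are hypothetical;
# half-hourly rows are an assumption made for this example.
rng = np.random.default_rng(0)
n_rows = 480
timestamps = pd.date_range("2018-01-01 00:00", periods=n_rows, freq="30min")
df = pd.DataFrame({
    "datetime": timestamps.strftime("%d/%m/%Y %H:%M"),  # day-first strings, as in the error message
    "A": rng.normal(size=n_rows),
    "B": rng.normal(size=n_rows),
})
df["J"] = 2.0 * df["A"] - df["B"] + rng.normal(scale=0.1, size=n_rows)

# Parse the strings with an explicit day-first format, then move them to the index
# so they are not passed to the model as a feature.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")
df = df.set_index("datetime")

# All remaining columns are numeric, so filling gaps with the column mean is safe.
df = df.fillna(df.mean())

# Features are every column except the target; both keep the datetime index.
X = df.drop(columns="J")
y = df["J"]

# Chronological split, same rule as in the question (multiple of 48 rows).
split_row = int(len(df) * 0.8 // 48 * 48)
X_train, X_test = X.iloc[:split_row], X.iloc[split_row:]
y_train, y_test = y.iloc[:split_row], y.iloc[split_row:]

model = RandomForestRegressor(n_estimators=100, max_depth=30, random_state=0)
model.fit(X_train, y_train)

# The test index still holds the timestamps, so actual and predicted values line up by time.
comparison = pd.DataFrame(
    {"actual": y_test, "predicted": model.predict(X_test)},
    index=y_test.index,
)
print(comparison.head())
```

With the real file, the synthetic block would be replaced by the existing `pd.read_csv(...)` call. If the time of day is expected to carry signal, numeric features such as `df.index.hour` could be added as extra columns before building `X`, since those are plain integers the model can consume.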