How do I develop a random forest model in Python using data contained in a CSV file?

Problem description

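I have been trying to train a random forest model in Python on data contained in a CSV file. The first rows of the CSV file are shown here. I want to train the model to predict the values in column J from the other variables (excluding the datetime). When I first tried to run the model, it raised this error: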

ValueError: could not convert string to float: '01/01/2018 02:00'

I converted the "datetime" column to a datetime dtype to see whether that would help, but now I get this error instead:

 Traceback (most recent call last):
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\Random Forest Code.py", line 36, in <module>
    regr_multirf.fit(X_train, y_train)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\multioutput.py", line 160, in fit
    X, y = self._validate_data(X, y,
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\base.py", line 433, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 814, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\sklearn\utils\validation.py", line 616, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\elsam\Documents\Year 3\Final EN3300 Project\Machine Learning\Code\venv\lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'Timestamp'
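
Both errors can be reproduced on a tiny frame. The sketch below assumes a hypothetical two-row table with a `datetime` column and one numeric column `A` (neither the column name `A` nor the day-first date format is confirmed by the question). It mimics the float conversion that `check_array` performs in the traceback above, first with the timestamps as strings and then as parsed `Timestamp` objects, and finally with the column dropped.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame; the column name "A" is a placeholder.
raw = pd.DataFrame({
    "datetime": ["01/01/2018 02:00", "01/01/2018 03:00"],
    "A": [1.0, 2.0],
})

errors = {}

# Case 1: timestamps are still strings, so the float conversion fails
# with a ValueError (the first error in the question).
try:
    np.asarray(raw.values, dtype=float)
except ValueError as exc:
    errors["string"] = type(exc).__name__

# Case 2: timestamps are parsed (day-first format is an assumption), but
# the column is still part of the feature array, so the conversion fails
# with a TypeError (the second error in the question).
parsed = raw.assign(
    datetime=pd.to_datetime(raw["datetime"], format="%d/%m/%Y %H:%M")
)
try:
    np.asarray(parsed.values, dtype=float)
except TypeError as exc:
    errors["timestamp"] = type(exc).__name__

# Case 3: with the datetime column removed, the array is purely numeric.
numeric = np.asarray(parsed.drop(columns="datetime").values, dtype=float)

print(errors)
print(numeric.shape)
```

In both failing cases the datetime column is part of the array handed to the estimator; changing its dtype only changes which exception is raised.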

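I am not sure what code to add so that the datetime stays attached to each data point, which I need in order to compare the actual and predicted values when I train and test the model. This is my current code, up to the line where the error occurs: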

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import pickle
pd.set_option('display.max_columns', None)

# import csv data
df = pd.read_csv('C:/Users/elsam/Documents/Year 3/Final EN3300 Project/Machine Learning/Data/locations/Combined/ASP.csv', index_col=0)
df.fillna(df.mean(), inplace=True)
save_model_path = 'C:/Users/elsam/Documents/Year 3/Final EN3300 Project/Machine Learning/Model'
df['datetime'] = pd.to_datetime(df['datetime'])

# split train and test data
num_col = len(df.columns)
split_col = num_col - 1
X = df.iloc[:, 0:split_col].values
y = df.iloc[:, split_col:].values
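# random split (X_train/X_test are overwritten below; Y_train/Y_test are never used)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=30)
# chronological split: first 80% of the rows, rounded down to a multiple of 48
split_row = int(len(df) * 0.8 // 48 * 48)
X_train, X_test, y_train, y_test = X[:split_row], X[split_row:], y[:split_row], y[split_row:]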

# create multi random forest model
max_depth = 30
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, max_depth=max_depth, random_state=0))
regr_multirf.fit(X_train, y_train)

Tags: python, csv, machine-learning, scikit-learn, random-forest

Solution
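
One possible approach, offered as a sketch rather than a confirmed answer: move the parsed timestamps into the DataFrame index instead of leaving them among the feature columns. The index travels with every row through slicing, so it is still available to label the predictions, but it is never passed to the model. The example below runs on synthetic data because the real CSV is not available. The feature names `A` and `B`, the target name `J`, the hourly spacing, and the day-first date format are all assumptions. A plain `RandomForestRegressor` is swapped in for the `MultiOutputRegressor` wrapper, since only one target column is predicted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the CSV (assumption: hourly rows, a string
# 'datetime' column, numeric features, and the target 'J' as last column).
rng = np.random.default_rng(0)
n_rows = 480
stamps = pd.date_range("2018-01-01", periods=n_rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({
    "datetime": stamps.strftime("%d/%m/%Y %H:%M"),
    "A": rng.normal(size=n_rows),
    "B": rng.normal(size=n_rows),
})
df["J"] = 2.0 * df["A"] - df["B"] + rng.normal(scale=0.1, size=n_rows)

# Parse the timestamps (day-first format is an assumption) and move them
# to the index so they label each row without becoming a feature.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")
df = df.set_index("datetime").sort_index()

# Features are every column except the target; the index is not included.
X = df.drop(columns="J")
y = df["J"]

# Chronological 80/20 split, similar to the row-based split in the question.
split_row = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split_row], X.iloc[split_row:]
y_train, y_test = y.iloc[:split_row], y.iloc[split_row:]

model = RandomForestRegressor(n_estimators=100, max_depth=30, random_state=0)
model.fit(X_train, y_train)

# Actual and predicted values side by side, labelled by timestamp.
comparison = pd.DataFrame(
    {"actual": y_test, "predicted": model.predict(X_test)},
    index=y_test.index,
)
print(comparison.head())
print("MAE:", mean_absolute_error(comparison["actual"], comparison["predicted"]))
```

With the real file, the same idea would presumably apply after `pd.read_csv`: parse `datetime`, call `set_index("datetime")`, and select features and target by column name rather than with `.values` on the whole frame.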

