python - FitFailedWarning in Simple Linear Regression scoring with cross_val_score
Problem description
I'm using a very simple CSV file that I downloaded from the Internet, with only two columns. The first column, "MonthsExperience", contains values like "3, 3, 4, 4, 5, 6..." and the second column, "Income", contains values like "424, 387, 555, 59, 533...".
I'm trying to get the cross_val_score of a RandomForestRegressor model on this single-feature regression problem, as a practice exercise.
Here's the code:
import numpy as np
import pandas as pd
data = pd.read_csv("Blogging_Income.csv")
X = data["MonthsExperience"]
y = data["Income"]
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor()
from sklearn.model_selection import cross_val_score
cv_r2 = cross_val_score(rfr, X, y, cv = 5, scoring = None)
print(cv_r2)
I get a long warning from sklearn, pointing out that all the scores are set to NaN because the model couldn't be fit. The upper part of the warning/error looks like this:
[nan nan nan nan nan]
C:\Users\----\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 304, in fit
X, y = self._validate_data(X, y, multi_output=True,
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 871, in check_X_y
X = check_array(X, accept_sparse=accept_sparse,
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\----\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 694, in check_array
raise ValueError(
ValueError: Expected 2D array, got 1D array instead:
array=[ 6. 6. 7. 8. 8. 9. 9. 10. 11. 11. 12. 12. 12. 13. 13. 14. 14. 15.
15. 16. 16. 17. 18. 18.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
It appears that the array is in the wrong shape, but I don't understand why. I also don't understand how I could use array.reshape to make this work.
Solution
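RandomForestRegressor, like other scikit-learn estimators, requires the feature data X to be 2D, with shape (n_samples, n_features). Selecting a single column with data["MonthsExperience"] returns a 1D pandas Series, which is why the fit fails. Even if you have just one feature, your X has to be N x 1 instead of a vector of length N.
You can reshape your data using NumPy: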
X = np.array(X).reshape(-1, 1)
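For reference, here is a minimal end-to-end sketch of the fix. Since the original Blogging_Income.csv is not available here, it builds a small synthetic DataFrame with the same two column names; the values, n_estimators=50 and random_state=0 are assumptions made purely for illustration. It also shows an alternative to reshape: selecting the column with a list of names (double brackets), which returns a 2D DataFrame directly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for Blogging_Income.csv (made-up values, illustration only)
rng = np.random.default_rng(0)
months = np.repeat(np.arange(3, 19), 2)  # 32 rows: 3, 3, 4, 4, ...
data = pd.DataFrame({
    "MonthsExperience": months,
    "Income": 30 * months + rng.normal(0, 50, size=months.size),
})

y = data["Income"]  # the target may stay 1D

# Option 1: reshape the 1D column into an (n_samples, 1) array
X_reshaped = data["MonthsExperience"].to_numpy().reshape(-1, 1)

# Option 2: select with a list of column names, which yields a 2D DataFrame
X_frame = data[["MonthsExperience"]]

print(X_reshaped.shape, X_frame.shape)  # both are (32, 1)

rfr = RandomForestRegressor(n_estimators=50, random_state=0)
cv_r2 = cross_val_score(rfr, X_frame, y, cv=5)
print(cv_r2)  # five scores, none of them nan
```

Either form of X works with cross_val_score. The scores themselves may be low or even negative on data this small and sorted, but they should no longer be NaN, because the estimator can now be fit.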