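python-3.x - "Input contains NaN" error when using cross_val_score()
Problem description
I am using two separate CSV files, one for training (train.csv) and one for testing (test.csv), to predict values. I know how to split a single dataset file and train a model on it. I have searched many websites, and they say the test file is used to get output values from the trained model, but I don't know where to use test.csv after training my model on train.csv.
My code is below: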
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
train.drop_duplicates(inplace=True)
array = train.values
X = array[:, :-1]  # features: every column except the last one (the column to predict)
Y = array[:, -1]   # target: the last column, the one we want to predict
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))
# X_train/Y_train are the training inputs and labels; X_test/Y_test are the held-out test inputs and labels.
# The more data the model is trained on, the more accurate the result tends to be.
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
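On the test.csv part of the question: once the loop above has compared the models, a common next step is to refit the chosen model on all of train.csv and call predict on the rows of test.csv. The sketch below assumes (hypothetically) that test.csv has the same feature columns as train.csv, in the same order, without the target column; the function name fit_and_predict and the tiny in-memory DataFrames are illustrative stand-ins, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB


def fit_and_predict(train: pd.DataFrame, test: pd.DataFrame, model) -> np.ndarray:
    """Hypothetical helper: refit `model` on every row of `train`, then predict each row of `test`.

    Assumes the last column of `train` is the target and that `test` holds the
    same feature columns, in the same order, without the target.
    """
    X = train.iloc[:, :-1].to_numpy()  # features: all columns except the last
    y = train.iloc[:, -1].to_numpy()   # target: the last column
    model.fit(X, y)                    # train on all of train.csv, not only the 80% split
    return model.predict(test.to_numpy())


# Hypothetical stand-ins for pd.read_csv("train.csv") and pd.read_csv("test.csv")
train = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],
                      "f2": [1.0, 1.5, 2.0, 8.5, 9.0, 9.5],
                      "label": [0, 0, 0, 1, 1, 1]})
test = pd.DataFrame({"f1": [1.5, 9.5], "f2": [1.2, 9.2]})

predictions = fit_and_predict(train, test, GaussianNB())
print(predictions)
```

The X_test/Y_test part of train_test_split can still serve as a held-out check before this final refit; test.csv is then only used for producing predictions.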
This is the error I see in the terminal:
Traceback (most recent call last):
  File "E:/python/hacathon2/dataset/ml.py", line 55, in <module>
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 401, in cross_val_score
    cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 248, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_split.py", line 735, in split
    y = check_array(y, ensure_2d=False, dtype=None)
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 644, in check_array
    _assert_all_finite(array,
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 104, in _assert_all_finite
    raise ValueError("Input contains NaN")
ValueError: Input contains NaN
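The frames above show the check failing inside StratifiedKFold's split method, at check_array(y, ensure_2d=False, dtype=None), so the NaN is in Y (the last column of train.csv) rather than in X. A minimal reproduction, using hypothetical synthetic data in which a single target value is missing (the exact wording of the message differs between scikit-learn versions, but it mentions NaN):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def reproduce_nan_target_error() -> str:
    """Run cross_val_score on synthetic data whose target has one NaN; return the error text."""
    X = np.arange(20, dtype=float).reshape(10, 2)  # 10 samples, 2 finite features
    y = np.array([0, 1] * 5, dtype=float)          # balanced binary target
    y[3] = np.nan                                  # one missing label, as in train.csv
    kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
    try:
        # StratifiedKFold.split validates y before any model is fitted
        cross_val_score(LogisticRegression(solver="liblinear"), X, y, cv=kfold, scoring="accuracy")
    except ValueError as err:
        return str(err)
    return "no error"


print(reproduce_nan_target_error())
```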
So the error says "Input contains NaN". How can I fix it?
Solution
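To fix the error, replace the missing values before converting the DataFrame to arrays (that is, before the line array = train.values), for example with 0:
train = train.fillna(0)  # fillna returns a new DataFrame, so assign it back (or use train.fillna(0, inplace=True))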
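Since the traceback points at Y, filling a missing target with 0 assigns those rows the label 0, which is a real class if the labels are 0/1. An alternative sketch (a swapped-in technique, not the answer's method) drops the rows whose target is missing and fills only the remaining feature NaNs with 0. The helper clean_train and the small DataFrame below are hypothetical stand-ins for train.csv:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB


def clean_train(train: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows with a missing target (last column), then fill feature NaNs with 0."""
    target = train.columns[-1]
    cleaned = train.dropna(subset=[target])  # a row without a label cannot be used for training
    return cleaned.fillna(0)                 # returns a new DataFrame; `train` is left unchanged


# Hypothetical stand-in for pd.read_csv("train.csv"): one missing feature, one missing label
train = pd.DataFrame({
    "f1": [1.0, 2.0, np.nan, 3.0, 8.0, 9.0, 10.0, 11.0, 2.5],
    "f2": [1.0, 1.5, 2.0, 2.5, 8.5, 9.0, 9.5, 10.0, 1.2],
    "label": [0, 0, 0, 0, 1, 1, 1, 1, np.nan],
})

cleaned = clean_train(train)
X = cleaned.iloc[:, :-1].to_numpy()
Y = cleaned.iloc[:, -1].to_numpy()
kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
scores = cross_val_score(GaussianNB(), X, Y, cv=kfold, scoring="accuracy")
print(len(cleaned), scores)
```

Filling features with 0 follows the answer; a column mean or median is another common choice, depending on what the features represent.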