"Input contains NaN" error when using cross_val_score()

Problem description

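I am using two separate CSV files, one for training and one for testing, to predict values. I know how to split a single dataset file and train a model on it. I have searched many websites; they say the test file is used to get output values from the trained model, but I don't know where to use test.csv after training my model on train.csv.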

My code is below:

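import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")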

train.drop_duplicates(inplace=True)


array = train.values
X = array[:, :-1]  # Features: every column except the last one
Y = array[:, -1]   # Target: the last column, which is the value to predict


X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)


models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))

# X_train and X_test are the input features for training and testing;
# Y_train and Y_test are the matching labels.
# In general, more training data tends to give more accurate results.


results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

This is the error I see in the terminal:

Traceback (most recent call last):
  File "E:/python/hacathon2/dataset/ml.py", line 55, in <module>
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 401, in cross_val_score
    cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 248, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "E:\python\venv\lib\site-packages\sklearn\model_selection\_split.py", line 735, in split
    y = check_array(y, ensure_2d=False, dtype=None)
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 644, in check_array
    _assert_all_finite(array,
  File "E:\python\venv\lib\site-packages\sklearn\utils\validation.py", line 104, in _assert_all_finite
    raise ValueError("Input contains NaN")
ValueError: Input contains NaN

So the error says "Input contains NaN". How can I resolve it?

Tags: python-3.x, csv, machine-learning, scikit-learn

Solution


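To fix the error, I suggest filling the missing values in the DataFrame before calling train.values:

train = train.fillna(0)  # fillna returns a new DataFrame, so assign the result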
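
The traceback fails inside check_array(y, ...), which suggests that the NaN is in the label column and not only in the features. Filling a missing label with 0 would invent a class, so dropping those rows may be the safer choice. Below is a minimal sketch of checking for and handling missing values. It assumes the last column is the label, as in the question's code. The small DataFrame is a hypothetical stand-in for train.csv, and median imputation is just one option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("train.csv"); the last column is the label.
train = pd.DataFrame({
    "f1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "f2": [0.5, np.nan, 1.5, 2.0, 2.5],
    "label": [0.0, 1.0, 1.0, np.nan, 0.0],
})

# 1. Count the missing values in each column to see where the NaN comes from.
nan_counts = train.isna().sum()
print(nan_counts)

label_col = train.columns[-1]
feature_cols = train.columns[:-1]

# 2. Drop rows whose label is missing instead of filling in a made-up label.
clean = train.dropna(subset=[label_col]).copy()

# 3. Fill missing feature values with each column's median (fillna(0) also works).
clean[feature_cols] = clean[feature_cols].fillna(clean[feature_cols].median())

X = clean[feature_cols].to_numpy()
Y = clean[label_col].to_numpy()
```

After this step X and Y contain no NaN and can be passed to train_test_split and cross_val_score as in the question.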

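
As for where test.csv comes in: once a model has been chosen and fitted on the training data, the test file is typically passed to predict() to get the output values. The sketch below assumes that test.csv has the same feature columns as train.csv and no label column. The question does not confirm this, and the two DataFrames are hypothetical stand-ins for the real files.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for train.csv (features + label) and test.csv (features only).
train = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],
    "f2": [1.0, 0.9, 1.1, 0.1, 0.2, 0.0],
    "label": [0, 0, 0, 1, 1, 1],
})
test = pd.DataFrame({"f1": [0.15, 1.05], "f2": [1.0, 0.1]})

feature_cols = train.columns[:-1]

# Fit the chosen model on all of the cleaned training data.
model = LogisticRegression(solver="liblinear")
model.fit(train[feature_cols].to_numpy(), train["label"].to_numpy())

# predict() also rejects NaN, so the test features need the same cleaning as the training features.
test = test.fillna(0)
predictions = model.predict(test[feature_cols].to_numpy())
print(predictions)
```

Here predictions holds one predicted label per row of the test data.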