How to use cross-validation after a train/test split

Problem description

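After splitting the data into training and test sets, I ran K-fold cross-validation on the training set. This raises an error, which I think is caused by the indices left over from the train/test split. The code I used is below. Any suggestions on how to reset the index after the train/test split, or otherwise handle this error, would be appreciated. I have already tried df.reset_index(), but that raises AttributeError: 'numpy.ndarray' object has no attribute 'reset_index'. Thank you.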

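import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=99)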

# k-fold cross validation
scores = list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X_train):

    train_X, test_X = X_train[train_ix], X_train[test_ix]
    train_y, test_y = y_train[train_ix], y_train[test_ix]
    # fit model
    model = LinearRegression()
    model.fit(train_X, train_y)
    # evaluate model
    yhat = model.predict(test_X)
    score = np.sqrt(metrics.mean_absolute_error(yhat, test_y))
    print('Fold score : {}'.format(score))

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([    3,     9,    10,    17,    19,\n            ...\n            41050, 41056, 41060, 41101, 41120],\n           dtype='int64', length=3708).

Tags: indexing, cross-validation, train-test-split

Solution
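
The KeyError text comes from pandas, so at least one of X_train and y_train is most likely still a pandas object, while the AttributeError from reset_index() suggests that the other is already a NumPy array. KFold.split returns positional indices (0 to n-1), but a pandas object keeps its original, now shuffled, labels after train_test_split, so `y_train[train_ix]` is treated as a label lookup and fails on labels that are no longer present. One possible fix, sketched below, is to convert both objects to NumPy arrays before the loop so that `[]` indexing is positional. The synthetic X and y are hypothetical stand-ins for the asker's data (assuming X is an array and y a Series), and the score here is the plain mean absolute error with the arguments in (y_true, y_pred) order, rather than the square root used in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, train_test_split

# Hypothetical stand-in data: X is a NumPy array, y is a pandas Series
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = pd.Series(X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=99
)

# Convert to plain arrays so that [] indexing is positional, not label-based
X_train = np.asarray(X_train)
y_train = np.asarray(y_train)

scores = []
kfold = KFold(n_splits=10, shuffle=True, random_state=99)
for train_ix, test_ix in kfold.split(X_train):
    train_X, test_X = X_train[train_ix], X_train[test_ix]
    train_y, test_y = y_train[train_ix], y_train[test_ix]
    # fit model on the training folds
    model = LinearRegression()
    model.fit(train_X, train_y)
    # evaluate on the held-out fold
    yhat = model.predict(test_X)
    scores.append(mean_absolute_error(test_y, yhat))

print('Mean fold MAE: {:.4f}'.format(np.mean(scores)))
```

If the pandas objects should be kept instead, positional selection with `.iloc` (for example `y_train.iloc[train_ix]`) or calling `reset_index(drop=True)` on the pandas object only should also avoid the label mismatch.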

