Model performance gets worse after cross-validation

Problem description

I am training a logistic regression model on a dataset that has only numeric features. I performed the following steps:

1.) A correlation heatmap to remove collinearity between variables (see the sketch after this list)

2.) Cross-validation after the split, for my baseline model

3.) Scaling with StandardScaler

4.) Fitting and predicting
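
For step 1), a minimal sketch of what the collinearity filter might look like, assuming data2 is a pandas DataFrame of the numeric features and using an illustrative |correlation| threshold of 0.9 (neither the threshold nor the exact procedure is stated in the original post):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation heatmap of the numeric features (data2 is assumed to be a DataFrame)
corr = data2.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()

# Drop one feature from every pair whose absolute correlation exceeds 0.9
# (the 0.9 cutoff is an illustrative choice, not taken from the original post)
upper = corr.abs().where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
data2 = data2.drop(columns=to_drop)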

Here is my code:

# IMPORTS (added; data2 and y are assumed to be defined earlier)
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# SPLITTING
train_x, test_x, train_y, test_y = train_test_split(data2, y, test_size = 0.2,
                                                    random_state = 69)

# MODEL INSTANCE
model = LogisticRegression(random_state = 69)

# SCALING (fit the scaler on the training set only, then transform both sets)
train_x2 = train_x.copy(deep = True)
test_x2 = test_x.copy(deep = True)

s_scaler = StandardScaler()
s_scaler.fit(train_x2)
s_scaled_train = s_scaler.transform(train_x2)
s_scaled_test = s_scaler.transform(test_x2)

# BASELINE MODEL (5-fold CV on the scaled training set; the sign flip turns
# neg_mean_squared_error back into a plain MSE)
cross_val_model2 = -1 * cross_val_score(model, s_scaled_train, train_y, cv = 5,
                                        n_jobs = -1, scoring = 'neg_mean_squared_error')
s_score = cross_val_model2.mean()

# FITTING AND PREDICTING
model.fit(s_scaled_train, train_y)
pred = model.predict(s_scaled_test)
mse = mean_squared_error(test_y, pred)
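
As a side note on what these numbers measure: LogisticRegression.predict returns hard class labels, so for a binary 0/1 target the MSE of the predictions is just the fraction of misclassified samples (this assumes the target is binary, which the original post does not state). A quick check:

from sklearn.metrics import accuracy_score

# For 0/1 labels and 0/1 predictions, (y - y_hat)**2 is 1 exactly when the
# prediction is wrong, so the MSE equals 1 - accuracy.
print(mse, 1 - accuracy_score(test_y, pred))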

The CV score is 0.06, while the score after fitting and predicting is 0.23. This strikes me as odd, because CV is supposed to measure how well the model performs, so I should get a score at least roughly equal to the CV score, right?
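
One thing worth checking before comparing the two numbers is the spread of the per-fold CV scores rather than only their mean; a single held-out test score can easily land outside a tight-looking average. A minimal sketch, reusing the variables above:

# cross_val_model2 already holds the five per-fold MSE values computed above
print(cross_val_model2)                                  # MSE of each fold
print(cross_val_model2.mean(), cross_val_model2.std())   # mean and fold-to-fold spread
# If 0.23 lies far outside mean +/- a few standard deviations, the gap is
# probably not explained by fold-to-fold variance alone.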

Tags: cross-validation, mse

Solution

