python - XGBoost - Correct use of early_stopping_rounds with purged cross-validation
Problem description
I am working on an XGBoost classifier for time-series data, and I would like to combine the early_stopping_rounds option with purged cross-validation (see https://stats.stackexchange.com/questions/443159). I have the following code:
import numpy as np
import pandas as pd
import xgboost

# Purged Cross Validation
index_train1 = np.arange(0, len(X_train))

def split(a, n):
    # Cut `a` into n contiguous blocks whose sizes differ by at most one
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))

k = 6  # this means that the cross-validation will be run over (6*5)/2 = 15 folds
splits = list(split(index_train1, k))

# Block m is used as the training block once for every later block
purged_train = []
for m in range(0, len(splits)):
    for n in range(len(splits) - m - 2, -1, -1):
        purged_train.append(splits[m])

# Every block after block l-1 is used as a validation block
purged_validation = []
l = 1
while l <= len(splits):
    for j in range(l, len(splits)):
        purged_validation.append(splits[j])
    l += 1

purged_df = pd.DataFrame({'train_index': purged_train, 'val_index': purged_validation})
# XGBoost Classifier
params = {
    'n_estimators': 3000,
    'objective': 'binary:logistic',
    'learning_rate': 0.005,
    'subsample': 0.555,
    'colsample_bytree': 0.7,
    'min_child_weight': 3,
    'max_depth': 8,
    'n_jobs': -1
}
booster = xgboost.XGBClassifier(parameters=params)
cv_score = []
for kfold in range(0, len(purged_df)):
    print('Fold no. ' + str(kfold + 1))
    X_train1 = X_train[purged_df['train_index'][kfold].tolist()]
    X_val1 = X_train[purged_df['val_index'][kfold]]
    y_train1 = y_train.iloc[purged_df['train_index'][kfold]]
    y_val1 = y_train.iloc[purged_df['val_index'][kfold]]
    booster = booster.fit(
        X_train1, y_train1,
        eval_set=[(X_train1, y_train1), (X_val1, y_val1)],
        early_stopping_rounds=100,
        eval_metric=["auc"],
        verbose=1000,
    )
    cv_score.append(booster.score(X_val1, y_val1))
I noticed that changing the value of early_stopping_rounds does not change the test accuracy, which is computed as follows:
from sklearn.metrics import accuracy_score
y_pred = booster.predict(X_test, ntree_limit=booster.best_ntree_limit)
y_pred = pd.Series(y_pred, index = y_test.index, name='pred')
print('Accuracy Test: %s %%' % round(accuracy_score(y_test, y_pred)*100,2))
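I suspect something is wrong here: the code may only take the last iteration of the loop into account and ignore the whole cross-validation with early stopping rounds. Does anyone know how to fix this? Thanks.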
Solution