XGBoost - correct use of early_stopping_rounds combined with purged cross-validation

Problem description

I am working on an XGBoost classifier for time-series data, and I want to combine the early_stopping_rounds option with purged cross-validation (see https://stats.stackexchange.com/questions/443159/, "What is combinatorial purged cross-validation for time series?"). I have the following code:

# Purged Cross Validation
import numpy as np
import pandas as pd

index_train1 = np.arange(0, len(X_train))

def split(a, n):
    """Split array a into n contiguous chunks of (nearly) equal length."""
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))

k = 6  # this means that the cross-validation will be run over (6*5)/2 = 15 folds

splits = list(split(index_train1, k))

# Each train block m is paired with every later block as validation,
# so block m appears len(splits) - m - 1 times in purged_train.
purged_train = []

for m in range(len(splits)):
    for n in range(len(splits) - m - 2, -1, -1):
        purged_train.append(splits[m])

purged_validation = []

l = 1
while l <= len(splits):
    for j in range(l, len(splits)):
        purged_validation.append(splits[j])
    l += 1

purged_df = pd.DataFrame({'train_index': purged_train, 'val_index': purged_validation})
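As a quick sanity check (an editor's sketch reusing the split and pairing logic above on a synthetic index, not part of the original post), the construction can be verified to produce 15 disjoint pairs in which the training block always precedes its validation block:

```python
import numpy as np
import pandas as pd

def split(a, n):
    # same chunking helper as in the question
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))

index = np.arange(600)          # stand-in for index_train1
splits = list(split(index, 6))

train_blocks, val_blocks = [], []
for m in range(len(splits)):
    for n in range(len(splits) - m - 2, -1, -1):
        train_blocks.append(splits[m])
for l in range(1, len(splits) + 1):
    for j in range(l, len(splits)):
        val_blocks.append(splits[j])

pairs = pd.DataFrame({'train_index': train_blocks, 'val_index': val_blocks})
print(len(pairs))  # 15 folds, as claimed: (6 * 5) / 2

# every pair is disjoint, and the train block always precedes the validation block
for tr, va in zip(train_blocks, val_blocks):
    assert len(np.intersect1d(tr, va)) == 0
    assert tr.max() < va.min()
```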

# XGBoost Classifier
params={
    'n_estimators':3000,
    'objective': 'binary:logistic',
    'learning_rate': 0.005,
    'subsample':0.555,
    'colsample_bytree':0.7,
    'min_child_weight':3,
    'max_depth':8,
    'n_jobs' : -1
}

booster = xgboost.XGBClassifier(**params)  # the dict must be unpacked; XGBClassifier has no 'parameters' argument
cv_score = []


for kfold in range(len(purged_df)):
    print('Fold no. ' + str(kfold + 1))
    X_train1 = X_train[purged_df['train_index'][kfold]]
    X_val1 = X_train[purged_df['val_index'][kfold]]
    y_train1 = y_train.iloc[purged_df['train_index'][kfold]]
    y_val1 = y_train.iloc[purged_df['val_index'][kfold]]
    booster = booster.fit(X_train1, y_train1,
                          eval_set=[(X_train1, y_train1), (X_val1, y_val1)],
                          early_stopping_rounds=100, eval_metric=["auc"],
                          verbose=1000)
    cv_score.append(booster.score(X_val1, y_val1))

I noticed that changing the value of early_stopping_rounds does not change the test accuracy, which is computed as follows:

from sklearn.metrics import accuracy_score
y_pred = booster.predict(X_test, ntree_limit=booster.best_ntree_limit)
y_pred = pd.Series(y_pred, index = y_test.index, name='pred')
print('Accuracy Test: %s %%' % round(accuracy_score(y_test, y_pred)*100,2))

I suspect something is wrong here: the code may only be taking into account the last iteration of the loop, ignoring the whole cross-validation with early stopping that precedes it. Does anyone know how to fix this? Thanks

Tags: python, xgboost

Solution
