XGBoost predicts everything as Null

Problem description

I'm trying to train an XGBoost classification model, something I've done successfully several times before. This time I'm running a hyperparameter grid search and doing cross-validation with xgboost.cv. Every time I run my code it raises a KeyError:

I also tried plain xgboost.train with some default parameters, and when I use the trained model to predict on the same DMatrix, it predicts everything as null (NaN).
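To make the symptom concrete, here is a minimal self-contained sketch (synthetic data standing in for my real dataset) of the train-then-predict step I mean; on my real data the final check shows every prediction coming back as NaN:

import numpy as np
import xgboost as xgb

# Synthetic stand-in data with some injected missing values
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.integers(0, 2, size=200)

dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
bst = xgb.train({'objective': 'binary:logistic', 'eval_metric': 'auc'},
                dtrain, num_boost_round=10)

preds = bst.predict(dtrain)
print(np.isnan(preds).all())   # on my real DMatrix this comes back True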

This is my DMatrix. Four of my features have missing values, so I specified missing=np.nan when building the DMatrix:

xgbmat_train = xgb.DMatrix(X_train.values, label=Y_train.values,
                           missing=np.nan, weight=train_weights)
xgbmat_test = xgb.DMatrix(X_test.values, label=Y_test.values,
                          missing=np.nan, weight=test_weights)
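As a sanity check (X_train, X_test and Y_train are pandas objects, which is why I call .values above), I also verified that only those 4 feature columns contain NaN and that the labels and weights are NaN-free:

# Per-column NaN counts for the features, plus a check on labels and weights
nan_counts = X_train.isna().sum()
print(nan_counts[nan_counts > 0])                    # should list exactly the 4 features
print(np.isnan(Y_train.values).any())                # labels should be NaN-free
print(np.isnan(np.asarray(train_weights)).any())     # weights should be NaN-free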

These are my initial parameters:

initial_params = {'learning_rate': 0.1, 'n_estimators': 1000, 'objective': 'binary:logistic', 'booster': 'gbtree',
                  'reg_alpha': 0, 'reg_lambda': 1, 'max_depth': 5, 'min_child_weight': 1, 'gamma': 0,
                  'subsample': 0.8, 'colsample_bytree': 0.8, 'scale_pos_weight': 1, 'missing': np.nan,
                  'seed': 27, 'eval_metric': 'auc', 'n_jobs': 32, 'silent': True}
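For reference, this is how I believe these settings split between the DMatrix and the native xgb.train/xgb.cv API (I'm not sure all of the keys above are even recognized there: 'n_estimators', 'n_jobs', 'missing' and 'silent' look like scikit-learn-wrapper names, with boosting rounds controlled by num_boost_round, threads by 'nthread', and missing values handled by the DMatrix itself):

# Booster-level parameters only; 'missing' stays on the DMatrix and the number
# of boosting rounds is passed as num_boost_round to xgb.cv / xgb.train.
booster_params = {
    'objective': 'binary:logistic',
    'booster': 'gbtree',
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_weight': 1,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'reg_alpha': 0,
    'reg_lambda': 1,
    'scale_pos_weight': 1,
    'eval_metric': 'auc',
    'nthread': 32,
    'seed': 27,
}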

These are my grid-search parameters:

gridsearch_params = [(max_depth,min_child_weight)
                for max_depth in range(4,10)
                for min_child_weight in range(1,6)]
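(For clarity, that comprehension just expands to the 30 (max_depth, min_child_weight) pairs I want to try:)

print(len(gridsearch_params))   # 30
print(gridsearch_params[:3])    # [(4, 1), (4, 2), (4, 3)]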

Below is the loop in which I run the grid search:

max_auc = 0.0
best_params = ''
print(gc.collect())
for max_depth, min_child_weight in gridsearch_params:
    print(gc.collect())
    print("CV with max_depth = {}, min_child_weight= 
    {}".format(max_depth,min_child_weight))
    initial_params['max_depth'] = max_depth
    initial_params['min_child_weight'] = min_child_weight
    cv_results = xgb.cv(initial_params,
                    xgbmat_train,
                    num_boost_round = 200,
                    seed = 42,
                    stratified = True,
                    shuffle=True,
                    nfold=3,
                    metrics={'auc'},
                    early_stopping_rounds = 50)
    mean_auc = cv_results['test-auc-mean'].max()
    boost_rounds = cv_results['test-auc-mean'].argmax()
    cv_results = cv_results.append(cv_results)
    if mean_auc > max_auc:
        max_auc = mean_auc
        best_params = (max_depth,min_child_weight)
    print(gc.collect())
    print(cv_results)
    print(mean_auc)
    print(boost_rounds)

print("Best param: {}, {}, aucpr: {}".format(best_params[0],best_params[1],max_auc))

This is the error I get when running the code above:

KeyError                                  Traceback (most recent call last)
<ipython-input-15-f546ef27594f> in <module>
     15                         nfold=3,
     16                         metrics={'auc'},
---> 17                         early_stopping_rounds = 50)
     18     mean_auc = cv_results['test-auc-mean'].max()
     19     boost_rounds = cv_results['test-auc-mean'].argmax()

~/anaconda3/lib/python3.7/site-packages/xgboost/training.py in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks, shuffle)
    461                                end_iteration=num_boost_round,
    462                                rank=0,
--> 463                                evaluation_result_list=res))
    464         except EarlyStopException as e:
    465             for k in results:

~/anaconda3/lib/python3.7/site-packages/xgboost/callback.py in callback(env)
    243                                    best_msg=state['best_msg'])
    244         elif env.iteration - best_iteration >= stopping_rounds:
--> 245             best_msg = state['best_msg']
    246             if verbose and env.rank == 0:
    247                 msg = "Stopping. Best iteration:\n{}\n\n"

KeyError: 'best_msg'

I tried filling the NAs with -9999.0 and specifying that same value for the missing argument of the DMatrix, but it raises the same error. I'm working against a tight deadline, so any help would be deeply appreciated.
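For concreteness, the fill-and-sentinel variant I tried looked roughly like this (a sketch from memory, not a verbatim copy of my notebook):

# Replace NaN with a sentinel and tell the DMatrix that the sentinel means missing
X_train_filled = X_train.fillna(-9999.0)
X_test_filled = X_test.fillna(-9999.0)

xgbmat_train = xgb.DMatrix(X_train_filled.values, label=Y_train.values,
                           missing=-9999.0, weight=train_weights)
xgbmat_test = xgb.DMatrix(X_test_filled.values, label=Y_test.values,
                          missing=-9999.0, weight=test_weights)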

Tags: python, python-3.x, machine-learning, xgboost

Solution

