The evaluation metric in the validation set in xgboost Python differs from the one I get when making a prediction

Problem Description

I am using an evaluation set to implement early stopping with xgboost in Python. What puzzles me is that the evaluation metric reported during the training as optimal is much better than the one I get when I make predictions with the same model on the set I used for evaluation purposes.

To make this clear I use a reproducible example with a toy dataset. In this case the difference is not very large, although still significant. However in the case of the datasets I work with in reality the gap is much more substantial.

The code is the following:

import xgboost as xgb
import numpy as np
import pandas as pd
import seaborn as sns

def xgb_mape(preds, dtrain):
    labels = dtrain.get_label()
    return 'mape', np.mean(np.abs((labels - preds) / (labels + 1)))

mpg = sns.load_dataset('mpg')

mpg = mpg.sample(frac = 1)

n = int(mpg.shape[0] * 0.7)

mpg_train = mpg.iloc[:n, :7]

mpg_test = mpg.iloc[n:, :7]

mpg_train_y = mpg_train.iloc[:, 0].values

mpg_test_y = mpg_test.iloc[:, 0].values

mpg_train_X = mpg_train.iloc[:, 1:].values

mpg_test_X = mpg_test.iloc[:, 1:].values

xgb_model_mpg = xgb.XGBRegressor(max_depth=10, learning_rate=0.1, n_estimators=1000, silent=True,
                                 objective='reg:linear', booster='gbtree', subsample=0.6,
                                 colsample_bytree=0.9, colsample_bylevel=1, reg_lambda=20,
                                 random_state=1, seed=1, importance_type='gain')

xgb_model_mpg.fit(mpg_train_X, mpg_train_y, eval_set=[(mpg_test_X, mpg_test_y)],
                  eval_metric=xgb_mape, early_stopping_rounds=20)
[...]
[82]    validation_0-rmse:3.41167   validation_0-mape:0.085761
[83]    validation_0-rmse:3.40828   validation_0-mape:0.085618
[84]    validation_0-rmse:3.40087   validation_0-mape:0.085519
[85]    validation_0-rmse:3.403 validation_0-mape:0.085631
[86]    validation_0-rmse:3.39977   validation_0-mape:0.085711
[87]    validation_0-rmse:3.39626   validation_0-mape:0.085739
[88]    validation_0-rmse:3.40048   validation_0-mape:0.085727
[89]    validation_0-rmse:3.40356   validation_0-mape:0.085883
[90]    validation_0-rmse:3.40341   validation_0-mape:0.085664
Stopping. Best iteration:
[70]    validation_0-rmse:3.42186   validation_0-mape:0.085076

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.9, gamma=0, importance_type='gain',
       learning_rate=0.1, max_delta_step=0, max_depth=10,
       min_child_weight=1, missing=None, n_estimators=1000, n_jobs=1,
       nthread=None, objective='reg:linear', random_state=1, reg_alpha=0,
       reg_lambda=20, scale_pos_weight=1, seed=1, silent=True,
       subsample=0.6)

y_pred = xgb_model_mpg.predict(mpg_test_X)

results = pd.DataFrame({'actual':mpg_test_y, 'predictions' : y_pred})

results['Absolute_Percent_Error'] = 100 * np.abs(results['actual'] - results['predictions'])/results['actual']

MAPE = results['Absolute_Percent_Error'].mean()

MAPE
8.982732737486339

So in this case, during training I get a MAPE of 8.5%, whereas applying the model to the same test set gives a MAPE close to 9%.

As I said, in other examples with larger and more complex datasets the gap can be much larger, e.g. 41% vs. 58%.

Tags: python, python-3.x, validation, xgboost

Solution


There are two different issues here. The smaller one: the evaluation function you pass to xgboost is defined slightly differently from the MAPE you compute afterwards (the xgboost version has a +1 in the denominator, the later calculation does not). The more significant one: xgboost (in contrast to lightgbm) by default makes predictions with all trained trees rather than with the best number of trees found by early stopping. To predict with the optimal number of trees, use y_pred = xgb_model_mpg.predict(mpg_test_X, ntree_limit=xgb_model_mpg.best_ntree_limit).
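For reference, here is a minimal sketch of both fixes applied to the code above. It assumes an xgboost version whose scikit-learn wrapper still exposes best_ntree_limit and the ntree_limit argument to predict (newer releases use best_iteration together with the iteration_range argument instead), and it reuses the fitted xgb_model_mpg from the question:

# Predict with only the trees up to the best iteration found by early stopping
y_pred_best = xgb_model_mpg.predict(mpg_test_X, ntree_limit=xgb_model_mpg.best_ntree_limit)

# Same formula as xgb_mape, including the +1 in the denominator, so the value
# is directly comparable to the validation_0-mape printed during training
mape_matched = np.mean(np.abs((mpg_test_y - y_pred_best) / (mpg_test_y + 1)))

mape_matched

With the metric definitions aligned and the prediction restricted to the best iteration, this value should line up with the best validation_0-mape reported during training (0.085076 at iteration [70] in the run above).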

