python - The evaluation metric in the validation set in xgboost Python differs from the one I get when making a prediction
Problem description
I am using an evaluation set to implement early stopping with xgboost in Python. What puzzles me is that the evaluation metric reported as optimal during training is much better than the one I get when I make predictions with the same model on the same set I used for evaluation.
To make this clear, I use a reproducible example with a toy dataset. In this case the difference is not very large, although still significant. However, with the datasets I actually work with, the gap is much more substantial.
The code is the following:
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb

# Custom evaluation metric passed to fit(); returns (name, value)
def xgb_mape(preds, dtrain):
    labels = dtrain.get_label()
    return ('mape', np.mean(np.abs((labels - preds) / (labels + 1))))

# Load the toy dataset, shuffle it, and split 70/30
mpg = sns.load_dataset('mpg')
mpg = mpg.sample(frac=1)
n = int(mpg.shape[0] * 0.7)
mpg_train = mpg.iloc[:n, :7]
mpg_test = mpg.iloc[n:, :7]

# First column is the target (mpg), the remaining six are features
mpg_train_y = mpg_train.iloc[:, 0].values
mpg_test_y = mpg_test.iloc[:, 0].values
mpg_train_X = mpg_train.iloc[:, 1:].values
mpg_test_X = mpg_test.iloc[:, 1:].values

xgb_model_mpg = xgb.XGBRegressor(max_depth=10, learning_rate=0.1, n_estimators=1000, silent=True,
                                 objective='reg:linear',
                                 booster='gbtree', subsample=0.6, colsample_bytree=0.9, colsample_bylevel=1, reg_lambda=20,
                                 random_state=1, seed=1, importance_type='gain')

# Early stopping monitors the custom metric on the held-out set
xgb_model_mpg.fit(mpg_train_X, mpg_train_y, eval_set=[(mpg_test_X, mpg_test_y)], eval_metric=xgb_mape,
                  early_stopping_rounds=20)
[...]
[82] validation_0-rmse:3.41167 validation_0-mape:0.085761
[83] validation_0-rmse:3.40828 validation_0-mape:0.085618
[84] validation_0-rmse:3.40087 validation_0-mape:0.085519
[85] validation_0-rmse:3.403 validation_0-mape:0.085631
[86] validation_0-rmse:3.39977 validation_0-mape:0.085711
[87] validation_0-rmse:3.39626 validation_0-mape:0.085739
[88] validation_0-rmse:3.40048 validation_0-mape:0.085727
[89] validation_0-rmse:3.40356 validation_0-mape:0.085883
[90] validation_0-rmse:3.40341 validation_0-mape:0.085664
Stopping. Best iteration:
[70] validation_0-rmse:3.42186 validation_0-mape:0.085076
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bytree=0.9, gamma=0, importance_type='gain',
learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=1000, n_jobs=1,
nthread=None, objective='reg:linear', random_state=1, reg_alpha=0,
reg_lambda=20, scale_pos_weight=1, seed=1, silent=True,
subsample=0.6)
y_pred = xgb_model_mpg.predict(mpg_test_X)
results = pd.DataFrame({'actual':mpg_test_y, 'predictions' : y_pred})
results['Absolute_Percent_Error'] = 100 * np.abs(results['actual'] - results['predictions'])/results['actual']
MAPE = results['Absolute_Percent_Error'].mean()
MAPE
8.982732737486339
So in this case I get a MAPE of 8.5% during training, but close to 9% when applying the model to the same test set.
As I said, in other examples with larger and more complex datasets the differences can be much larger, e.g. 41% vs. 58%.
Solution
There are two different issues here. The smaller one: the evaluation function is defined slightly differently in the xgboost training and outside it (the xgboost evaluation has +1 in the denominator). The more significant one: xgboost (in contrast to lightgbm) by default calculates predictions using all trained trees instead of the best number of trees. To predict with the optimal number of trees, use y_pred = xgb_model_mpg.predict(mpg_test_X, ntree_limit=xgb_model_mpg.best_ntree_limit)
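Both points can be illustrated with a small sketch. The two MAPE functions mirror the definitions used in the question, and the numbers are made up for illustration rather than taken from the mpg data. The helper predict_at_best_iteration is a hypothetical name; it uses the ntree_limit call from the answer when the model exposes best_ntree_limit, and otherwise falls back to iteration_range with best_iteration, on the assumption that newer xgboost releases have replaced ntree_limit with that parameter. Check this against the installed version before relying on it.

```python
def mape_training(labels, preds):
    """MAPE as defined in xgb_mape: the denominator is label + 1."""
    return sum(abs(y - p) / (y + 1) for y, p in zip(labels, preds)) / len(labels)


def mape_reported(labels, preds):
    """MAPE as computed after predict(): the denominator is the label itself."""
    return sum(abs(y - p) / y for y, p in zip(labels, preds)) / len(labels)


def predict_at_best_iteration(model, X):
    """Predict with only the trees up to the early-stopping optimum.

    Assumes `model` is a fitted XGBRegressor trained with early stopping.
    """
    if hasattr(model, "best_ntree_limit"):
        # Older releases: the call given in the answer.
        return model.predict(X, ntree_limit=model.best_ntree_limit)
    # Assumption: newer releases expose best_iteration and accept a
    # half-open iteration_range in place of ntree_limit.
    return model.predict(X, iteration_range=(0, model.best_iteration + 1))


# Made-up illustrative values, not taken from the mpg data.
labels = [10.0, 20.0, 40.0]
preds = [11.0, 18.0, 44.0]

print(mape_training(labels, preds))  # roughly 0.095
print(mape_reported(labels, preds))  # roughly 0.1
```

For positive labels the +1 version is always the smaller of the two, so part of the gap is purely a matter of definition. For a like-for-like check, compute the same formula on the output of predict_at_best_iteration(xgb_model_mpg, mpg_test_X) as the one used during training.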