Mismatched results between xgb.train and xgb.XGBRegressor in Python

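Problem description

I noticed that there are two ways to use XGBoost from Python: the low-level API and the scikit-learn wrapper.

When I ran the same dataset through both, the results were different.

Using the low-level API - xgboost.train(..)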

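import xgboost

# X: feature matrix, Y: target values. missing=0.0 makes XGBoost treat zeros as missing.
dtrain = xgboost.DMatrix(X, label=Y, missing=0.0)
# No 'eta' is given, so the library default learning rate is used.
param = {'max_depth': 3, 'objective': 'reg:squarederror', 'booster': 'gbtree'}
evallist = [(dtrain, 'eval'), (dtrain, 'train')]
num_round = 10
booster = xgboost.train(param, dtrain, num_round, evallist)  # returns a Booster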

Output

[0] eval-rmse:7115.31   train-rmse:7115.31
[1] eval-rmse:5335.37   train-rmse:5335.37
[2] eval-rmse:4054.77   train-rmse:4054.77
[3] eval-rmse:3140.91   train-rmse:3140.91
[4] eval-rmse:2510.33   train-rmse:2510.33
[5] eval-rmse:2080.62   train-rmse:2080.62
[6] eval-rmse:1785.53   train-rmse:1785.53
[7] eval-rmse:1571.92   train-rmse:1571.92
[8] eval-rmse:1399.57   train-rmse:1399.57
[9] eval-rmse:1301.64   train-rmse:1301.64

Using the scikit-learn wrapper - xgboost.XGBRegressor(..)

# Only max_depth and n_estimators are set; learning_rate and missing keep the wrapper's defaults.
xgb_reg = xgboost.XGBRegressor(max_depth=3, n_estimators=10)
xgb_reg.fit(X_train, Y_train, eval_set=[(X_train, Y_train), (X_train, Y_train)], eval_metric='rmse', verbose=True)

Output

[0] validation_0-rmse:8827.63   validation_1-rmse:8827.63
[1] validation_0-rmse:8048.16   validation_1-rmse:8048.16
[2] validation_0-rmse:7349.83   validation_1-rmse:7349.83
[3] validation_0-rmse:6720.69   validation_1-rmse:6720.69
[4] validation_0-rmse:6154.82   validation_1-rmse:6154.82
[5] validation_0-rmse:5637.49   validation_1-rmse:5637.49
[6] validation_0-rmse:5173.9    validation_1-rmse:5173.9
[7] validation_0-rmse:4759.14   validation_1-rmse:4759.14
[8] validation_0-rmse:4386.29   validation_1-rmse:4386.29
[9] validation_0-rmse:4051.63   validation_1-rmse:4051.63
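
The two logs can be compared numerically. With a squared-error objective and a learning rate eta, a single boosting round cannot reduce the training RMSE by more than a factor of (1 - eta), so the smallest ratio between consecutive RMSE values gives a lower bound on the eta that was in effect. The sketch below applies that bound to the numbers copied from the logs above. The helper name min_eta_bound is hypothetical, and the bound assumes both runs used a squared-error objective, no row subsampling, and evaluation on the training data.

```python
# RMSE values copied from the two training logs in the question.
rmse_train_api = [7115.31, 5335.37, 4054.77, 3140.91, 2510.33,
                  2080.62, 1785.53, 1571.92, 1399.57, 1301.64]
rmse_sklearn_api = [8827.63, 8048.16, 7349.83, 6720.69, 6154.82,
                    5637.49, 5173.9, 4759.14, 4386.29, 4051.63]


def min_eta_bound(rmse_per_round):
    """Return 1 - (smallest ratio between consecutive RMSE values).

    Hypothetical helper: under squared-error loss each round shrinks the
    training RMSE by at most a factor of (1 - eta), so this value is a
    lower bound on the learning rate, not an estimate of it.
    """
    ratios = [b / a for a, b in zip(rmse_per_round, rmse_per_round[1:])]
    return 1.0 - min(ratios)


bound_train_api = min_eta_bound(rmse_train_api)
bound_sklearn_api = min_eta_bound(rmse_sklearn_api)
print(f"xgboost.train run: eta >= {bound_train_api:.3f}")
print(f"XGBRegressor run:  eta >= {bound_sklearn_api:.3f}")
```

On these numbers the bound is close to 0.25 for the xgboost.train run and close to 0.09 for the XGBRegressor run. A learning rate of 0.1 could not have produced the first log, so the two runs most likely did not use the same learning rate. That fits xgboost.train falling back to its default eta of 0.3 while the wrapper used a smaller default, though the wrapper's default depends on the installed version and is worth checking in the learning_rate entry of xgb_reg.get_params().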

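I suspected the parameters were causing the difference, so I took the parameters from the scikit-learn wrapper and passed them to the low-level API, but the results still differed. Code used to get the parameters: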

xgb_reg.get_params()

What could explain the mismatch between two versions that are supposed to be the same internally?

Tags: python, machine-learning, scikit-learn, regression, xgboost

Solution

