python - Why does xgboost's node gain output differ from a manual calculation?
Question
We can get the xgboost tree structure from trees_to_dataframe():
import numpy as np
import pandas as pd
import xgboost as xgb
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
from sklearn.datasets import load_boston
data = load_boston()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
model = xgb.XGBRegressor(random_state=1,
                         n_estimators=1,  # only one tree
                         max_depth=2,
                         learning_rate=0.1
                         )
model.fit(X, y)
tree_frame = model._Booster.trees_to_dataframe()
tree_frame
According to the SO thread "How is xgboost quality calculated?", the gain should be computed as follows:
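For reference, the expression that the manual calculation in this question evaluates can be written as below, where G_L, G_R and H_L, H_R are the sums of gradients and Hessians over the samples in the left and right children, and λ is `reg_lambda`. This is transcribed from the code in the question rather than copied from the linked thread; the split gain in the XGBoost paper additionally carries a factor of 1/2 and a complexity penalty γ, which the code omits.

```latex
\mathrm{Gain} = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
```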
However, the gain reported by xgboost differs from what this code produces:
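def mse_obj(preds, labels):
    # Sign is flipped relative to the usual gradient (preds - labels);
    # the gain only uses squared sums, so the result is unaffected.
    grad = labels-preds
    hess = np.ones_like(labels)
    return grad, hess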
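# Gradients are taken around y.mean(); despite its name, `Gain` holds the per-sample gradients.
Gain,Hessian = mse_obj(y.mean(),y)
# Root split (node 0)
L = X[tree_frame['Feature'][0]] < tree_frame['Split'][0]
R = X[tree_frame['Feature'][0]] >= tree_frame['Split'][0]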
GL = Gain[L].sum()
GR = Gain[R].sum()
HL = Hessian[L].sum()
HR = Hessian[R].sum()
reg_lambda = 1.0
gain = (GL**2/(HL+reg_lambda)+GR**2/(HR+reg_lambda)-(GL+GR)**2/(HL+HR+reg_lambda))
gain # 18817.811191871013
# Split of the root's left child (node 1)
L = (X[tree_frame['Feature'][0]] < tree_frame['Split'][0])&((X[tree_frame['Feature'][1]] < tree_frame['Split'][1]))
R = (X[tree_frame['Feature'][0]] < tree_frame['Split'][0])&((X[tree_frame['Feature'][1]] >= tree_frame['Split'][1]))
GL = Gain[L].sum()
GR = Gain[R].sum()
HL = Hessian[L].sum()
HR = Hessian[R].sum()
reg_lambda = 1.0
gain = (GL**2/(HL+reg_lambda)+GR**2/(HR+reg_lambda)-(GL+GR)**2/(HL+HR+reg_lambda))
gain # 7841.627971119211
# Split of the root's right child (node 2); uses >= so the mask matches R of the root split
L = (X[tree_frame['Feature'][0]] >= tree_frame['Split'][0])&((X[tree_frame['Feature'][2]] < tree_frame['Split'][2]))
R = (X[tree_frame['Feature'][0]] >= tree_frame['Split'][0])&((X[tree_frame['Feature'][2]] >= tree_frame['Split'][2]))
GL = Gain[L].sum()
GR = Gain[R].sum()
HL = Hessian[L].sum()
HR = Hessian[R].sum()
reg_lambda = 1.0
gain = (GL**2/(HL+reg_lambda)+GR**2/(HR+reg_lambda)-(GL+GR)**2/(HL+HR+reg_lambda))
gain # 2634.409414953051
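What am I missing?
Answer
I finally found where I went wrong. The default initial prediction, defined by base_score, is 0.5, and when computing each sample's gradient we should use base_score as the model's prediction before any trees are built.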
Gain,Hessian = mse_obj(model.get_params()['base_score'], y)
After that, everything seems to work.
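To see why the starting prediction matters, here is a minimal NumPy-only sketch on made-up data. The helper names `squared_error_grad_hess` and `split_gain` and the toy values are illustrative assumptions, not part of xgboost. It evaluates the same gain expression twice: once with gradients taken around `y.mean()` and once around 0.5, the base_score value cited in the answer. Newer xgboost releases may choose base_score differently, so it is worth checking the value your installed version actually uses.

```python
import numpy as np


def squared_error_grad_hess(pred, labels):
    """Gradient and Hessian of 0.5 * (pred - label)**2 with respect to pred."""
    labels = np.asarray(labels, dtype=float)
    grad = pred - labels
    hess = np.ones_like(labels)
    return grad, hess


def split_gain(grad, hess, left_mask, reg_lambda=1.0):
    """Gain of splitting a node into left_mask / ~left_mask (no 1/2 factor, no gamma)."""
    gl, hl = grad[left_mask].sum(), hess[left_mask].sum()
    gr, hr = grad[~left_mask].sum(), hess[~left_mask].sum()
    return (gl ** 2 / (hl + reg_lambda)
            + gr ** 2 / (hr + reg_lambda)
            - (gl + gr) ** 2 / (hl + hr + reg_lambda))


# Toy data (hypothetical): one feature, four samples, split at x < 2.5.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 30.0, 32.0])
left = x < 2.5

# Same split, two different starting predictions.
grad_mean, hess_mean = squared_error_grad_hess(y.mean(), y)
grad_base, hess_base = squared_error_grad_hess(0.5, y)

gain_from_mean = split_gain(grad_mean, hess_mean, left)
gain_from_base_score = split_gain(grad_base, hess_base, left)

print("gain with y.mean() as start:", gain_from_mean)
print("gain with 0.5 as start:     ", gain_from_base_score)
```

The two printed gains differ even though the split and the data are identical, which is the same effect that made the manual numbers in the question disagree with xgboost's reported Gain column.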