How to understand an XGBoost model dump

Problem description

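I noticed that Spark XGBoost has no equivalent of the trees_to_dataframe() API from the Python package, so I am trying to parse the getModelDump result myself. However, I am confused by its format and by what each field represents.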

 // train xgb_model in spark version of xgboost
scala> xgb_model
res18: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_89286dd04aa3

scala> xgb_model.nativeBooster.getModelDump(null, true);
res19: Array[String] =
Array("0:[f1<53] yes=1,no=2,missing=2,gain=58047.7812,cover=336165
        1:[f3<53.9500008] yes=3,no=4,missing=3,gain=24677.3848,cover=63748.25
                3:leaf=-0.0531237721,cover=53626.5
                4:leaf=0.031994272,cover=10121.75
        2:[f16<1.66669905] yes=5,no=6,missing=6,gain=10181.9785,cover=272416.75
                5:leaf=-0.0937986076,cover=268367
                6:leaf=-0.0139159411,cover=4049.75
", "0:[f1<51] yes=1,no=2,missing=2,gain=52816.4062,cover=336097.594
        1:[f8<369.570007] yes=3,no=4,missing=4,gain=22681.3555,cover=60529.668
                3:leaf=-0.0121749714,cover=37363.5625
                4:leaf=-0.0751453713,cover=23166.1055
        2:[f16<1.67979908] yes=5,no=6,missing=6,gain=10274.8359,cover=275567.906
                5:leaf=-0.089068912,cover=271300.188
                6:leaf=-0.0108754979,cover=4267.74268
", "0:[f1<56] yes=1,no=2,missing=2,gain=4887...

scala> res19.size
res20: Int = 200

My model parameters are set as follows:

xgbParams = {'n_estimators': 200, 'max_depth': 2, 'eta': 0.05, 'lambda':1, 'gamma':4, 'alpha':0.1, 'subsample':0.8, #'min_child_weight': 1,
         'colsample_bytree':0.8, 'objective': 'binary:logistic', 'colsample_bylevel':0.8,
         'eval_metric':'logloss', 'seed': 1122, 'missing': -999999999}

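I think res19.size = 200 makes sense, because I set n_estimators to 200. What confuses me is each string in res19. All of the strings have the format shown below. I assume f2 stands for some specific feature, but how can I find the actual feature name? Also, what do 0, 1, 2 represent? And what do yes=3, no=4 mean?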

Thanks in advance!

0:[f2<0.380098999] yes=1,no=2,missing=1,gain=732.850342,cover=72529.7266
        1:[f47<31.9999981] yes=3,no=4,missing=3,gain=753.887451,cover=67352.3594
                3:leaf=4.21585646e-05,cover=63820.7422
                4:leaf=0.0237709191,cover=3531.61987
        2:[f4<1050] yes=5,no=6,missing=6,gain=410.277802,cover=5177.3667
                5:leaf=0.00518732425,cover=1373.32422
                6:leaf=-0.0266880095,cover=3804.04224

Tags: apache-spark, xgboost, xgbclassifier

Solution
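Each string in the dump is one boosted tree, which is why there are 200 of them, and each line inside a string is one node. The leading number is the node id, with 0 as the root. In [f2<0.380098999], f2 is the feature at index 2 (0-based) of the feature vector; because no feature map was passed to getModelDump (the first argument is null), only indices are printed. yes, no, and missing are the ids of the child node to follow when the condition is true, false, or the feature value is missing. gain is the loss reduction from the split, cover is the sum of second-order gradients of the training rows reaching the node, and leaf is the value that leaf adds to the raw score.

The following is a minimal sketch, not an official API, of parsing one text-format tree into rows roughly in the spirit of trees_to_dataframe(). It assumes the dump strings have been copied out of Spark into Python. The names parse_tree, leaf_for, and feature_names are hypothetical, and the feature-name lookup assumes that feature_names is listed in the same order in which the columns were assembled into the feature vector.

```python
import re

# Split node, e.g. "1:[f47<31.9999981] yes=3,no=4,missing=3,gain=753.887451,cover=67352.3594"
SPLIT_RE = re.compile(
    r"^(?P<node>\d+):\[(?P<feature>[^<\]]+)<(?P<threshold>[^\]]+)\]\s+"
    r"yes=(?P<yes>\d+),no=(?P<no>\d+),missing=(?P<missing>\d+)"
    r"(?:,gain=(?P<gain>[^,]+),cover=(?P<cover>[^,]+))?$"
)
# Leaf node, e.g. "3:leaf=4.21585646e-05,cover=63820.7422"
LEAF_RE = re.compile(r"^(?P<node>\d+):leaf=(?P<leaf>[^,]+)(?:,cover=(?P<cover>[^,]+))?$")


def parse_tree(dump, tree_index=0, feature_names=None):
    """Parse one text-format tree dump into a list of dicts, one per node."""
    rows = []
    for raw in dump.splitlines():
        line = raw.strip()  # indentation only reflects depth; node ids carry the structure
        if not line:
            continue
        m = SPLIT_RE.match(line)
        if m:
            feature = m.group("feature")
            # "f2" is index 2 of the feature vector; map it to a name if names were supplied.
            if feature_names is not None and re.fullmatch(r"f\d+", feature):
                feature = feature_names[int(feature[1:])]
            rows.append({
                "tree": tree_index, "node": int(m.group("node")), "feature": feature,
                "threshold": float(m.group("threshold")), "yes": int(m.group("yes")),
                "no": int(m.group("no")), "missing": int(m.group("missing")),
                "gain": float(m.group("gain")) if m.group("gain") else None,
                "cover": float(m.group("cover")) if m.group("cover") else None,
                "leaf": None,
            })
            continue
        m = LEAF_RE.match(line)
        if m:
            rows.append({
                "tree": tree_index, "node": int(m.group("node")), "feature": None,
                "threshold": None, "yes": None, "no": None, "missing": None, "gain": None,
                "cover": float(m.group("cover")) if m.group("cover") else None,
                "leaf": float(m.group("leaf")),
            })
            continue
        raise ValueError(f"unrecognised dump line: {line!r}")
    return rows


def leaf_for(rows, x):
    """Walk yes/no/missing from node 0 to a leaf. x maps feature -> value (None or absent = missing)."""
    nodes = {r["node"]: r for r in rows}
    node = nodes[0]
    while node["leaf"] is None:
        value = x.get(node["feature"])
        if value is None:
            child = node["missing"]
        elif value < node["threshold"]:
            child = node["yes"]
        else:
            child = node["no"]
        node = nodes[child]
    return node["leaf"]


# The tree quoted in the question.
example = """0:[f2<0.380098999] yes=1,no=2,missing=1,gain=732.850342,cover=72529.7266
        1:[f47<31.9999981] yes=3,no=4,missing=3,gain=753.887451,cover=67352.3594
                3:leaf=4.21585646e-05,cover=63820.7422
                4:leaf=0.0237709191,cover=3531.61987
        2:[f4<1050] yes=5,no=6,missing=6,gain=410.277802,cover=5177.3667
                5:leaf=0.00518732425,cover=1373.32422
                6:leaf=-0.0266880095,cover=3804.04224
"""
rows = parse_tree(example)
```

With these rows, a record with f2 = 0.1 and f47 = 40.0 goes from node 0 to node 1 (the yes branch) and then to node 4 (the no branch), so leaf_for(rows, {"f2": 0.1, "f47": 40.0}) returns that leaf's value. For binary:logistic, the predicted probability is the sigmoid of the sum of the selected leaf values over all trees plus the margin implied by the base score. Running parse_tree over every element of the dump array, with tree_index set to the array position, gives one flat table for the whole model.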

