Proper way to use "class_weight" with XGBoostClassifier() in a pipeline for multi-class classification

Problem description

I am working on a severely imbalanced multi-class classification problem. I want to use the class_weight option that many scikit-learn models offer. What is the best and correct way to do this inside a pipeline?

As I saw in the documentation, scale_pos_weight is only for binary classification. This answer by "Firas Omrane", which has 15 upvotes, gave me some ideas, so I used:

import numpy as np
from sklearn.utils import class_weight
from xgboost import XGBClassifier

# Per-class 'balanced' weights: n_samples / (n_classes * class_count)
classes_weights = list(class_weight.compute_class_weight('balanced',
                                                         classes=np.unique(y_train),
                                                         y=y_train))

# Map each sample to the weight of its class
# (note: the val - 1 indexing assumes labels are exactly 1..n_classes)
weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]

XGBClassifier().fit(x_train, y_train, sample_weight=weights)
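As an aside, the manual weight-mapping loop above can be replaced by scikit-learn's compute_sample_weight helper, which produces the same per-sample 'balanced' weights in one call and works for arbitrary label values, not just labels numbered 1..n_classes. A minimal sketch with a toy label array:

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Toy imbalanced 3-class labels (1-based, as the loop above assumes)
y_train = np.array([1, 1, 1, 1, 2, 2, 3])

# One call replaces compute_class_weight + the manual indexing loop
weights = compute_sample_weight(class_weight='balanced', y=y_train)

# 'balanced' weight for a class = n_samples / (n_classes * class_count)
n, k = len(y_train), len(np.unique(y_train))
assert np.allclose(weights[y_train == 1], n / (k * 4))  # 7 / 12
assert np.allclose(weights[y_train == 3], n / (k * 1))  # 7 / 3
```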

This works fine with a plain fit, but when used as the last step of a pipeline:

('clf', XGBClassifier(class_weight='balanced', n_jobs=-1, objective='multi:softprob', sample_weight=classes_weights))  # last step of the pipeline

it emits the following warning:

WARNING: /tmp/build/80754af9/xgboost-split_1619724447847/work/src/learner.cc:541: 
Parameters: { class_weight, sample_weight } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.

Tags: python, machine-learning, scikit-learn, xgboost, xgbclassifier

Solution
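The warning appears because neither class_weight nor sample_weight is a constructor parameter of XGBClassifier: sample_weight is an argument of fit(), and unknown constructor kwargs are passed down to the XGBoost core, which ignores them. Inside a Pipeline, fit-time arguments are routed to a step using the `<step name>__<parameter name>` convention, e.g. pipe.fit(X, y, clf__sample_weight=weights). A minimal sketch of this routing mechanism, using LogisticRegression as a stand-in estimator so the example runs without xgboost installed (XGBClassifier accepts sample_weight in fit the same way):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Toy imbalanced 3-class data
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = np.array([0] * 40 + [1] * 15 + [2] * 5)

# Per-sample 'balanced' weights, computed up front from the training labels
weights = compute_sample_weight(class_weight='balanced', y=y)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    # swap in XGBClassifier(objective='multi:softprob') here if xgboost is installed
    ('clf', LogisticRegression(max_iter=1000)),
])

# 'clf__sample_weight' is forwarded to the final step's
# fit(..., sample_weight=weights), not to its constructor
pipe.fit(X, y, clf__sample_weight=weights)
assert pipe.predict(X).shape == (60,)
```

The same call pattern works with model-selection tools that accept fit params, e.g. cross_val_score(pipe, X, y, fit_params={'clf__sample_weight': weights}), though with cross-validation the weights should ideally be recomputed per training fold.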
