How do I set the class_weight dictionary for a random forest?

Problem description

I am working with an imbalanced dataset, so I decided to use a weight dictionary for classification.

The documentation says the weight dictionary must be defined as follows: https://imbalanced-learn.org/stable/generated/imblearn.ensemble.BalancedRandomForestClassifier.html

     weight_dict = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]

Since I want to predict 12 classes, which are located in the last column, I assumed the setup would be as follows:

weight_dict = [{0: 1, 1: 5.77390289e-01}, {0: 1, 1: 6.48317326e-01}, 
               {0: 1, 1: 1.35324885e-01}, {0: 1, 1: 2.92665797e+00}, 
               {0: 1, 1: 5.77858906e+01}, {0: 1, 1: 1.73193507e+00},
               {0: 1, 1: 9.27828244e+00}, {0: 1, 1: 1.18766082e+01}, 
               {0: 1, 1: 8.99009985e+01}, {0: 1, 1: 6.39833279e+00}, 
               {0: 1, 1: 2.55347077e+01}, {0: 1, 1: 9.47015372e+02}]

Honestly, I am not clear on the notation of the first entry. I mean:

      0:1 of {0: 1, 1: 1} 

Or:

 1: value.

Do they represent column positions, or the order of the labels?

What is the correct way to set it?

I would appreciate your insights.

Tags: classification, random-forest, imbalanced-data, imblearn

Solution


"I am not clear on the notation of the first entry, 0: 1 of {0: 1, 1: 1}"

The notation is {<class label> : <count>}. The class label is given in its original (i.e. untransformed) representation.

For example, the following would request an Iris training set that contains 25 samples of "setosa" and 50 samples each of "versicolor" and "virginica":

weight_dict = {"setosa" : 25, "versicolor" : 50, "virginica" : 50}
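If per-class weights rather than sample counts are what is wanted, a single dictionary keyed by class label may be enough for a single-column target with 12 classes. scikit-learn documents class_weight in the form {class_label: weight}, and describes the list-of-dicts form as intended for multi-output targets, with one dict per column of y. The count-style dictionary above looks closer to how imbalanced-learn describes its sampling_strategy parameter. The sketch below builds such a weight dictionary using the n_samples / (n_classes * count) heuristic that scikit-learn documents for class_weight="balanced". The helper name balanced_class_weights and the toy label list are assumptions made for illustration; they do not come from the question.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_label: weight} with weight = n_samples / (n_classes * count).

    Hypothetical helper for illustration; it mirrors the heuristic that
    scikit-learn documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Illustrative single-column target with three classes (made-up data).
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]

# One dict for the whole target: keys are class labels, values are weights.
weight_dict = balanced_class_weights(y)
print(weight_dict)
```

Under these assumptions, the resulting dictionary could be passed as class_weight=weight_dict when constructing the classifier. Rarer classes receive larger weights, and the keys must match the labels as they appear in y.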
