RandomOverSampler does not create an equal distribution

Problem description

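I am currently working on an ML project. My data is slightly imbalanced, so I need an oversampling technique. The feature matrix (X_train) has shape (90664, 190) and the target (Y_binary_train_trans) has shape (90664,). However, the code runs and still outputs the same, unequal target distribution. This is the code I use for RandomOverSampler; I have also tried SMOTE: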

counter= Counter(Y_binary_train_trans)
ros= RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train,Y_binary_train_trans)
counter = Counter(Y_binary_test_trans)

Tags: machine-learning, scikit-learn, imblearn

Solution


counter= Counter(Y_binary_train_trans)
ros= RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train,Y_binary_train_trans)
counter = Counter(Y_binary_test_trans)

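In this code, your second counter counts the test labels (Y_binary_test_trans), not the training labels (Y_binary_train_trans) that you actually resampled!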

Instead, it should be:

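from collections import Counter
from imblearn.over_sampling import RandomOverSampler

counter = Counter(Y_binary_train_trans)  # class counts before resampling
ros = RandomOverSampler(random_state=42)
X_train, Y_binary_train_trans = ros.fit_resample(X_train, Y_binary_train_trans)
counter = Counter(Y_binary_train_trans)  # count the resampled training labels, not the test labels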
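
To see the difference between the two counters without needing the original data, here is a minimal pure-Python sketch. The function random_oversample is a hypothetical stand-in written for this illustration, not imblearn's implementation: it only mimics the general idea of random oversampling by duplicating minority rows until all classes match the largest one. The toy data is made up.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=42):
    """Hypothetical stand-in for an oversampler: duplicate randomly chosen
    minority rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out

# Toy data (made up): 8 training rows with a 6:2 imbalance, 4 test labels with 3:1.
X_train = [[i] for i in range(8)]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
y_test = [0, 0, 0, 1]

before = Counter(y_train)
X_train, y_train = random_oversample(X_train, y_train)
after_train = Counter(y_train)  # the labels that were actually resampled
after_test = Counter(y_test)    # never resampled, so still imbalanced

print("train before:", before)
print("train after: ", after_train)
print("test:        ", after_test)
```

Counting the training labels shows the balanced result, while counting the test labels shows the old imbalance, which is the symptom described in the question. The test set is normally left unresampled on purpose, so an unchanged test distribution is expected.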
