ValueError: Found array with 1 feature(s) while a minimum of 2 is required

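Problem description

I am applying RFECV with a random forest, alongside other ML models, to a churn dataset. Logistic regression, SVC, gradient boosting, and decision trees all run fine on the data (each with RFECV), but RFECV with the random forest decides that only one feature is important and eliminates all the others. Code: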

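# Imports the snippet relies on (not shown in the original post)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split

# Create feature variable X and target variable y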
y = churn_dataset['Churn']
X = churn_dataset.drop(['Churn'], axis = 1)

#RFECV
rfecv = RFECV(RandomForestClassifier(), cv=10, scoring='f1')
rfecv = rfecv.fit(X, y)
print('Optimal number of features :', rfecv.n_features_)
print('Best features :', X.columns[rfecv.support_])
print(np.where(rfecv.support_ == False)[0])

#drop columns
X.drop(X.columns[np.where(rfecv.support_ == False)[0]], axis=1, inplace=True)
rfecv.estimator_.feature_importances_

#train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.20, 
                                                    random_state=8)

#fit model
random_forest = rfecv.fit(X_train, y_train)

This returns the following error:

ValueError: Found array with 1 feature(s) (shape=(1622, 1)) while a minimum of 2 is required.

Output of churn_dataset.head()

    name  gender churn last_purchase_in_days order_count  purchase_quantity ...
2   ACKLE   0   1   0.317604    -0.453647   2   -0.368683   1.173058    0.291104    0   ...     0   0   0   0   0   0   1   0   0   1.00
4   ADNAN   1   1   0.250814    -0.453647   2   -0.368683   -0.431351   -0.418023   0   ...     0   0   0   0   0   0   1   0   0   1.00
5   ADY     0   1   -1.143415   -0.453647   2   -0.368683   0.190767    -0.117630   0   ... 0   0   0   0   0   0   1   0   0   1.00
6   ANDY    0   1   0.768432    -0.453647   2   -0.368683   -0.752232   -0.559952   0   ... 0   0   0   0   0   0   1   0   0   1.00
7   AGIE    0   0   -1.669381   3.048875    8   -0.368683   0.520653    4.251851    0   ... 0   0   0   0   0   0   1   0   0   0.16

churn_dataset.columns

Index(['name', 'gender', 'Churn', 'last_purchase_in_days',
       'order_count', 'quantity', 'disc_code',
       'AOV', 'sales',
       'channel_Paid Advertising','channel_Recurring Payment', 
       'channel_Search Engine',
       'channel_Social Media', 'country_Denmark', 'country_France', 
        'country_Germany', 'country_Italy',
       'country_Luxembourg', 'country_Others', 'country_Switzerland',
       'country_United Kingdom', 'city_Düsseldorf', 'city_Frankfurt',
       'city_Hamburg', 'city_Hannover', 'city_Köln', 'city_Leipzig',
       'city_Munich', 'city_Others', 'city_Stuttgart', 'city_Wien',
       'Probability_of_Churn'],
      dtype='object')

Tags: python, machine-learning, scikit-learn, random-forest, feature-selection

Solution
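
The shape in the error message, (1622, 1), suggests the failure comes from the last line of the question's code rather than from the first RFECV fit. By that point X has been reduced to the single selected column, and `rfecv.fit(X_train, y_train)` fits the RFECV selector again instead of a random forest. RFECV validates its input with a minimum of 2 features, so a one-column array is rejected. The following is a minimal sketch of one way around this, not the poster's confirmed fix: split first, run RFECV on the training split only, and fit a plain `RandomForestClassifier` on the selected columns. The data here is a synthetic stand-in from `make_classification`, and the helper `refit_rfecv_on_single_column` is hypothetical, included only to reproduce the error.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn_dataset (assumption: not the poster's data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=8)

# Split before feature selection so the test rows do not influence it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=8)

# Run recursive feature elimination with cross-validation on the training split.
rfecv = RFECV(RandomForestClassifier(n_estimators=20, random_state=8),
              cv=3, scoring="f1")
rfecv.fit(X_train, y_train)

# Keep only the selected columns; transform() works even if one column remains.
X_train_sel = rfecv.transform(X_train)
X_test_sel = rfecv.transform(X_test)

# Fit the final model with a plain classifier, not with RFECV again.
random_forest = RandomForestClassifier(n_estimators=20, random_state=8)
random_forest.fit(X_train_sel, y_train)
test_accuracy = random_forest.score(X_test_sel, y_test)


def refit_rfecv_on_single_column(X_one, y_one):
    """Hypothetical helper: return the error raised when RFECV gets one column."""
    try:
        RFECV(RandomForestClassifier(n_estimators=5, random_state=8),
              cv=3).fit(X_one, y_one)
    except ValueError as exc:
        return str(exc)
    return None


# Reproduces the situation in the question: a single-feature array passed to RFECV.
error_message = refit_rfecv_on_single_column(X_train[:, :1], y_train)
print("Selected features:", rfecv.n_features_)
print("RFECV on one column:", error_message)
```

If RFECV really keeps just one feature on the churn data, it may be worth checking whether a column such as `Probability_of_Churn` was derived from the target. That is only a guess based on the column name, but a feature that encodes the label would explain why the random forest ranks everything else as unimportant.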

