Exhaustive feature selection

Problem description

I am trying to use exhaustive feature selection to pick the best features for my model, but I get an IndexError that I have not been able to track down.

X = train.columns.difference(['Customer_ID', 'total_claim_amount'])
y = train['total_claim_amount']
# exhaustive feature selection
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.ensemble import RandomForestRegressor

feature_selector = EFS(rf_model,min_features=2, max_features=23, scoring='accuracy', print_progress=True, cv=5)

feature_selector.fit(X, y)

The error I get is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-45-6c2f49995b6c> in <module>
      7 feature_selector = EFS(rf_model,min_features=2, max_features=23, scoring='accuracy', print_progress=True, cv=5)
      8 
----> 9 feature_selector.fit(X, y)
     10 
     11 print('Best R2 score: %.2f' % feature_selector.best_score_ * (-1))

~\anaconda3\lib\site-packages\mlxtend\feature_selection\exhaustive_feature_selector.py in fit(self, X, y, custom_feature_names, groups, **fit_params)
    225 
    226         if (not isinstance(self.max_features, int) or
--> 227                 (self.max_features > X.shape[1] or self.max_features < 1)):
    228             raise AttributeError('max_features must be'
    229                                  ' smaller than %d and larger than 0' %

IndexError: tuple index out of range

Tags: python, machine-learning, scikit-learn

Solution


The only tuple index on the last line of the traceback is X.shape[1], which suggests that your X is one-dimensional. Indeed,

train.columns.difference(...)

returns only the column labels (a pandas Index), not a DataFrame restricted to those columns. You want

X = train[train.columns.difference(...)]

(or perhaps train.drop(...), which may be easier to read).
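
To make the difference concrete, here is a minimal sketch using a small made-up frame in place of the asker's train. The feature columns income and monthly_premium and all values are assumptions for illustration only; just Customer_ID and total_claim_amount come from the question. It shows that the result of columns.difference has a one-element shape tuple, which is why X.shape[1] fails, and that indexing the frame with those labels gives a two-dimensional X.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `train` frame; the two feature
# columns and all values are invented for this illustration.
train = pd.DataFrame({
    "Customer_ID": ["A1", "B2", "C3"],
    "income": [48000, 52000, 61000],
    "monthly_premium": [70, 95, 110],
    "total_claim_amount": [380.5, 1131.4, 566.0],
})

excluded = ["Customer_ID", "total_claim_amount"]

# What the question does: this is an Index of column labels, not data.
# Its shape is a 1-tuple, so names_only.shape[1] raises IndexError.
names_only = train.columns.difference(excluded)
print(type(names_only).__name__, names_only.shape)

# Fix: select those columns from the frame to get a 2-D feature matrix.
X = train[train.columns.difference(excluded)]

# Alternative: drop the unwanted columns. Same columns, but drop() keeps
# the original column order while difference() returns sorted labels.
X_alt = train.drop(columns=excluded)

y = train["total_claim_amount"]
print(X.shape, X_alt.shape, y.shape)
```

With a two-dimensional X, the max_features check in fit can read X.shape[1]. Separately, scoring='accuracy' is a classification metric; for a RandomForestRegressor a regression scorer such as 'r2' or 'neg_mean_squared_error' is likely what is wanted.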

