Sklearn FitFailedWarning: Estimator fit failed. ValueError: kth(=-9) out of bounds (1)

Problem description

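I'm currently working through the example tutorial on the california-housing-prices data, and I ran into this problem while trying to run a grid search over the data-preparation options.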

Code

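import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# CombinedAttributesAdder, num_attribs, cat_attribs, housing, housing_labels,
# feature_importances, grid_search and k are defined earlier in the tutorial notebook.
num_pipeline = Pipeline([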
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs)
])

def indices_of_top_k(arr, k):
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])

class TopFeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_importances, k):
        self.ft_imp = feature_importances
        self.k = k
    def fit(self, X, y=None):
        self.feature_indices_ = indices_of_top_k(self.ft_imp, k)
        return self
    def transform(self, X):
        return X[:, self.feature_indices_]

prepare_select_and_predict_pipeline = Pipeline([
    ('preparation', full_pipeline),
    ('feature_selection', TopFeatureSelector(feature_importances, k)),
    ('forest_reg', RandomForestRegressor(**grid_search.best_params_))
])

param_grid = {
    'preparation__num__imputer__strategy': ['mean', 'median'],
    'feature_selection__k': list(range(1, len(feature_importances) + 1))
}

grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline, param_grid, cv=5, scoring='neg_mean_squared_error', verbose=1)

grid_search_prep.fit(housing, housing_labels)
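For reference, the indices_of_top_k helper relies on np.argpartition: passing kth=-k moves the k largest values into the last k positions (in no guaranteed order), and np.sort then returns their indices in column order. A standalone check of the helper, using made-up importance values (illustrative only, not the housing data):

```python
import numpy as np


def indices_of_top_k(arr, k):
    # argpartition places the k largest values in the last k slots
    # (unordered); sorting those indices restores column order
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])


# illustrative importances for 5 hypothetical features
importances = [0.05, 0.40, 0.10, 0.30, 0.15]

print(indices_of_top_k(importances, 2))  # → [1 3]
print(indices_of_top_k(importances, 5))  # → [0 1 2 3 4] (k == len(arr) keeps every column)

# kth=-k must lie within [-len(arr), len(arr) - 1], so k > len(arr) fails
try:
    indices_of_top_k(importances, 6)
except ValueError as exc:
    print("ValueError:", exc)
```

Because kth=-k is only valid while k <= len(arr), a message such as "out of bounds (1)" suggests the helper received a one-element array rather than the full feature_importances vector.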

Error

/opt/conda/lib/python3.7/site-packages/sklearn/base.py:213: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
  FutureWarning)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:552: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 330, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 296, in _fit
    **fit_params_steps[name])
  File "/opt/conda/lib/python3.7/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 740, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "<ipython-input-129-8ca16960996c>", line 11, in fit
    self.feature_indices_ = indices_of_top_k(self.ft_imp, k)
  File "<ipython-input-129-8ca16960996c>", line 4, in indices_of_top_k
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])
  File "<__array_function__ internals>", line 6, in argpartition
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 832, in argpartition
    return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: kth(=-9) out of bounds (1)

...

  FitFailedWarning)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py:213: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
  FutureWarning)

Data

housing is a (16512, 9) DataFrame and housing_labels is a (16512,) Series.

Any idea why this error occurs?

Edit: definition of feature_importances

param_grid = [
    {'n_estimators': [3,10,30], 'max_features': [2,4,6,8,10]},
    {'bootstrap':[False], 'n_estimators': [3,10], 'max_features':[2,3,4]},
]

forest_reg = RandomForestRegressor()

grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring = 'neg_mean_squared_error', return_train_score=True)

grid_search.fit(housing_prepared, housing_labels)

feature_importances = grid_search.best_estimator_.feature_importances_

Tags: python, pandas, numpy, scikit-learn
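The log itself hints at the likely cause. scikit-learn's FutureWarning says get_params could not retrieve a parameter as an instance attribute. GridSearchCV clones every pipeline step, and BaseEstimator.get_params looks up each __init__ argument by name with getattr. Because TopFeatureSelector.__init__ stores feature_importances as self.ft_imp, that lookup fails, so the clone most likely receives feature_importances=None. np.array(None) is a single-element (0-d) array, which would fit "out of bounds (1)". Separately, fit reads the global k instead of self.k, so the grid's feature_selection__k values are ignored (the -9 presumably comes from that global k). Per the warning, from scikit-learn 0.24 the same naming mistake raises an AttributeError instead. A minimal sketch of the selector with both issues addressed, exercised on made-up importances rather than the housing data:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


def indices_of_top_k(arr, k):
    # indices of the k largest values, in ascending column order
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])


class TopFeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_importances, k):
        # Store every __init__ argument under the SAME name, so that
        # get_params()/clone() (used by GridSearchCV) can read it back.
        self.feature_importances = feature_importances
        self.k = k

    def fit(self, X, y=None):
        # Use self.k (the value set by the grid), not a global variable.
        self.feature_indices_ = indices_of_top_k(self.feature_importances, self.k)
        return self

    def transform(self, X):
        # Keep only the selected columns of the 2-D array X.
        return X[:, self.feature_indices_]


# Illustrative data: 5 hypothetical features, 3 rows.
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
X = np.arange(15).reshape(3, 5)

# clone() + set_params() mimic what GridSearchCV does for each candidate.
selector = clone(TopFeatureSelector(importances, k=2))
selector.set_params(k=3)
print(selector.fit_transform(X))  # keeps columns 1, 3 and 4
```

With the attributes named this way, each clone keeps the real importances, set_params can change k per candidate, and the question's grid (k from 1 to len(feature_importances)) stays within the range argpartition accepts.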
