python - Sklearn FitFailedWarning: Estimator fit failed. ValueError: kth(=-9) out of bounds (1)
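Problem description
I'm currently working through an example tutorial on the california-housing-prices data, and I ran into this problem while trying to grid-search over the data preparation options.
Code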
num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs)
])
def indices_of_top_k(arr, k):
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])
class TopFeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_importances, k):
        self.ft_imp = feature_importances
        self.k = k
    def fit(self, X, y=None):
        self.feature_indices_ = indices_of_top_k(self.ft_imp, k)
        return self
    def transform(self, X):
        return X[:, self.feature_indices_]
prepare_select_and_predict_pipeline = Pipeline([
    ('preparation', full_pipeline),
    ('feature_selection', TopFeatureSelector(feature_importances, k)),
    ('forest_reg', RandomForestRegressor(**grid_search.best_params_))
])
param_grid = {
    'preparation__num__imputer__strategy': ['mean', 'median'],
    'feature_selection__k': list(range(1, len(feature_importances) + 1))
}
grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline, param_grid, cv=5, scoring='neg_mean_squared_error', verbose=1)
grid_search_prep.fit(housing, housing_labels)
Error
/opt/conda/lib/python3.7/site-packages/sklearn/base.py:213: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
FutureWarning)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:552: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 330, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 296, in _fit
    **fit_params_steps[name])
  File "/opt/conda/lib/python3.7/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 740, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "<ipython-input-129-8ca16960996c>", line 11, in fit
    self.feature_indices_ = indices_of_top_k(self.ft_imp, k)
  File "<ipython-input-129-8ca16960996c>", line 4, in indices_of_top_k
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])
  File "<__array_function__ internals>", line 6, in argpartition
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 832, in argpartition
    return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: kth(=-9) out of bounds (1)
...
FitFailedWarning)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py:213: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
FutureWarning)
Data
housing is a (16512, 9) DataFrame and housing_labels is a (16512,) Series.
Any idea why this error occurs?
Edit: definition of feature_importances
param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8, 10]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring='neg_mean_squared_error', return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)
feature_importances = grid_search.best_estimator_.feature_importances_
Solution
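Judging from the traceback and the FutureWarning in the question, two things in TopFeatureSelector look like the cause.

First, __init__ stores the feature_importances argument as self.ft_imp. scikit-learn's get_params() looks up each __init__ parameter as an instance attribute with the same name, and GridSearchCV clones the pipeline through get_params() before every fit. Because the names do not match, the lookup returns None in this scikit-learn version (this is what the FutureWarning reports), so the cloned selector ends up with ft_imp set to None. np.array(None) is a zero-dimensional array, which argpartition appears to treat as having a single element. That would explain the "(1)" in "kth(=-9) out of bounds (1)".

Second, fit() passes the global k instead of self.k, so the values GridSearchCV sets through feature_selection__k are never used.

A minimal sketch of a corrected transformer follows. The demo_importances and X_demo values are made up for illustration and do not come from the housing data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


def indices_of_top_k(arr, k):
    # Indices of the k largest values, returned in ascending index order.
    return np.sort(np.argpartition(np.array(arr), -k)[-k:])


class TopFeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_importances, k):
        # Attribute names match the __init__ parameter names so that
        # get_params(), set_params() and clone() can find them.
        self.feature_importances = feature_importances
        self.k = k

    def fit(self, X, y=None):
        # Use self.k (the value set by GridSearchCV), not a global k.
        self.feature_indices_ = indices_of_top_k(self.feature_importances, self.k)
        return self

    def transform(self, X):
        return X[:, self.feature_indices_]


# Hypothetical importances and data, purely for illustration.
demo_importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
X_demo = np.arange(15, dtype=float).reshape(3, 5)

# clone() is what GridSearchCV does internally before each fit.
selector = clone(TopFeatureSelector(demo_importances, k=2))
selected = selector.fit_transform(X_demo)

# set_params() mimics the grid search trying a different k.
top3 = selector.set_params(k=3).fit(X_demo).feature_indices_
```

With this change the param_grid from the question should work as written, since feature_selection__k now reaches fit() through self.k and the importances survive cloning.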