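python - Feature selection with SelectFromModel and MultiOutputRegressor for multi-step regression: how do I get the selected features and their feature importances?
Problem description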
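I want to use sklearn.feature_selection.SelectFromModel to extract features in a multi-step regression problem. The regression problem uses MultiOutputRegressor in combination with RandomForestRegressor to predict multiple values. When I try to get the selected features with SelectFromModel.get_support(), I get an error indicating that some feature_importances_ attribute has to be accessible for the method to work. The feature_importances_ of the estimators inside a MultiOutputRegressor can be accessed as described in this question, but I am not sure how to pass these feature_importances_ to the SelectFromModel class correctly.
Here is what I have done so far: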
# imports needed by the snippet
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
# make sample data
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=100, n_features=100, n_targets=5)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, shuffle=True)
# get important features for prediction problem:
from sklearn.multioutput import MultiOutputRegressor
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100))
regr_multirf = regr_multirf.fit(X_train, y_train)
sel = SelectFromModel(regr_multirf, max_features=int(np.floor(X_train.shape[1] / 3.)))
sel.fit(X_train, y_train)
sel.get_support()  # this call raises the ValueError shown below
# to get feature_importances_ of one estimator inside the MultiOutputRegressor use:
regr_multirf.estimators_[1].feature_importances_
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-a1d635ad4a34> in <module>()
5 sel = SelectFromModel(regr_multirf, max_features= int(np.floor(X_train.shape[1] / 3.)))
6 sel.fit(X_train, y_train)
----> 7 sel.get_support()
2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/feature_selection/_from_model.py in _get_feature_importances(estimator, norm_order)
30 "`feature_importances_` attribute. Either pass a fitted estimator"
31 " to SelectFromModel or call fit before calling transform."
---> 32 % estimator.__class__.__name__)
33
34 return importances
ValueError: The underlying estimator MultiOutputRegressor has no `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to SelectFromModel or call fit before calling transform.
Any help and hints would be much appreciated.
Solution
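In sklearn's MultiOutputRegressor, each target is fitted with its own model, as stated in the documentation: "This strategy consists of fitting one regressor per target." This means you need to compute the feature importances for each random forest regressor inside the MultiOutputRegressor. The feature importances of the individual regressors are not stored on the MultiOutputRegressor itself. Instead, if regr_multirf is your fitted MultiOutputRegressor, you can extract each regressor (also called an estimator) from it via regr_multirf.estimators_[i], where i is the index of the regressor you want.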
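As a quick check of the point above, the following minimal sketch fits a small MultiOutputRegressor and inspects it. The data sizes, `n_estimators=10` and `random_state=0` are arbitrary choices made for this illustration and are not taken from the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Small synthetic problem with 5 targets (sizes chosen only for illustration)
X, y = make_regression(n_samples=100, n_features=20, n_targets=5, random_state=0)

regr_multirf = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=10, random_state=0)
).fit(X, y)

# One fitted forest per target
n_fitted = len(regr_multirf.estimators_)
# The wrapper itself exposes no feature_importances_ attribute
wrapper_has_importances = hasattr(regr_multirf, "feature_importances_")
# Each inner forest carries one importance value per input feature
first_shape = regr_multirf.estimators_[0].feature_importances_.shape

print(n_fitted, wrapper_has_importances, first_shape)
```

The missing attribute on the wrapper is what SelectFromModel complains about in the traceback from the question.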
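So you do not need SelectFromModel to retrieve the feature importances of a multi-output sklearn regression model. Use each estimator directly instead, as described in this question, which this answer relies on heavily. Your approach only works for methods that can natively predict multivariate targets and do not train a separate model for each target.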
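To illustrate the last sentence: RandomForestRegressor can itself be fitted on a two-dimensional y, so it can be handed to SelectFromModel without the MultiOutputRegressor wrapper. The sketch below assumes that a single importance vector shared across all targets is acceptable, which is a different model from the one in the question; `threshold=-np.inf` is set so that only `max_features` decides the selection.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=100, n_features=100, n_targets=5, random_state=0)

max_features = X.shape[1] // 3

# A single forest fitted on all targets at once exposes feature_importances_,
# so SelectFromModel can read it directly.
sel = SelectFromModel(
    RandomForestRegressor(n_estimators=20, random_state=0),
    max_features=max_features,
    threshold=-np.inf,  # select purely on max_features
)
sel.fit(X, y)

mask = sel.get_support()                        # boolean mask over the input features
importances = sel.estimator_.feature_importances_
print(mask.sum(), importances.shape)
```

Note that this forest shares its trees across targets, so it does not give per-target importances.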
In your case, you can retrieve the feature importances directly from the estimators of the fitted regressor regr_multirf:
# make sample data
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
import pandas as pd
X, y = make_regression(n_samples=100, n_features=100, n_targets=5)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, shuffle=True)
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100))
regr_multirf = regr_multirf.fit(X_train, y_train)
# now extract the estimator from your regression model
# this estimator carries the feature importances
# you're interested in
# You can also loop the following code
# over all your targets
no_est = 0  # index of target you want feature importance for
# get the estimator fitted for that target
est = regr_multirf.estimators_[no_est]
# get feature importances, sorted in ascending order
feature_importances = pd.DataFrame(est.feature_importances_,
                                   columns=['importance']).sort_values('importance')
print(feature_importances)
# horizontal bar plot (requires matplotlib)
feature_importances.plot(kind='barh')
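If a boolean feature mask from SelectFromModel is still wanted, one possible route is its `importance_getter` parameter, which accepts a callable. This is a sketch and not part of the original answer. It assumes scikit-learn 0.24 or later, which is newer than the installation in the traceback appears to be, and it assumes that averaging the per-target importances is a sensible way to aggregate them. `mean_importances` is a hypothetical helper written for this example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=100, n_features=100, n_targets=5, random_state=0)


def mean_importances(multi_regr):
    """Hypothetical helper: average feature_importances_ over the per-target forests."""
    return np.mean([est.feature_importances_ for est in multi_regr.estimators_], axis=0)


max_features = X.shape[1] // 3
sel = SelectFromModel(
    MultiOutputRegressor(RandomForestRegressor(n_estimators=20, random_state=0)),
    importance_getter=mean_importances,  # assumption: scikit-learn >= 0.24
    max_features=max_features,
    threshold=-np.inf,  # select purely on max_features
)
sel.fit(X, y)

# One column of importances per target, read from the fitted per-target forests
per_target = pd.DataFrame(
    {f"target_{i}": est.feature_importances_
     for i, est in enumerate(sel.estimator_.estimators_)}
)
per_target["mean"] = per_target.mean(axis=1)

# Indices of the selected features and their importances
selected = np.flatnonzero(sel.get_support())
print(per_target.loc[selected].sort_values("mean", ascending=False))
```

The mean is only one choice of aggregate. Taking the maximum across targets would instead keep features that matter strongly for a single target.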