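python - Cannot get feature names from an sklearn model because the input is a numpy array. How can I structure my code to extract the feature names?
Problem description
I am working on a random forest classification model that uses stratified k-fold cross-validation. I want to plot the feature importances for each fold. My input data is a numpy array, so I cannot get the feature names in the code below. How can I structure this code so that I can extract the feature names and plot the built-in feature importances?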
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, KFold, cross_validate, cross_val_score, StratifiedKFold, RandomizedSearchCV
from sklearn import metrics  # needed for the metrics.f1_score(...) style calls below
from sklearn.metrics import classification_report, confusion_matrix, f1_score, mean_squared_error
import matplotlib.pyplot as plt
y_downsample = downsampled[['dependent_variable']].values
X_downsample = downsampled[['Feature1'
,'Feature2'
,'Feature3'
,'Feature4'
,'Feature5'
,'Feature6'
,'Feature7'
,'Feature8'
,'Feature9'
,'Feature10']].values
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
f1_results = []
accuracy_results = []
precision_results = []
recall_results = []
feature_imp = []
for train_index, test_index in skf.split(X_downsample, y_downsample):
    X_train, X_test = X_downsample[train_index], X_downsample[test_index]
    y_train, y_test = y_downsample[train_index], y_downsample[test_index]
    model = RandomForestClassifier(n_estimators=100, random_state=24)
    model.fit(X_train, y_train.ravel())
    y_pred = model.predict(X_test)
    f1_results.append(metrics.f1_score(y_test, y_pred))
    accuracy_results.append(metrics.accuracy_score(y_test, y_pred))
    precision_results.append(metrics.precision_score(y_test, y_pred))
    recall_results.append(metrics.recall_score(y_test, y_pred))
    # plot
    importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
    importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
    importances.plot.bar()
    plt.show()
print("Accuracy: ", np.mean(accuracy_results))
print("Precision: ", np.mean(precision_results))
print("Recall: ", np.mean(recall_results))
print("F1-score: ", np.mean(f1_results))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
     21
     22 # plot
---> 23 importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
     24 importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
     25

AttributeError: 'numpy.ndarray' object has no attribute 'columns'
Solution
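The error comes from calling `.values` on the DataFrame: the result is a plain numpy array, which has no `.columns` attribute. One possible approach, sketched here under assumptions rather than as a definitive answer, is to save the column names in a list before converting and then use that list as the index for `model.feature_importances_`. The `downsampled` frame below is a hypothetical stand-in built with `make_classification`, since the original data is not shown; the `importance_table` name, the `plot_fold_importances` helper, and the choice of 5 folds are also illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the `downsampled` DataFrame from the question.
feature_names = [f"Feature{i}" for i in range(1, 11)]
X_raw, y_raw = make_classification(n_samples=200, n_features=10, random_state=0)
downsampled = pd.DataFrame(X_raw, columns=feature_names)
downsampled["dependent_variable"] = y_raw

# The names are kept in `feature_names`, so converting to numpy loses nothing.
X_downsample = downsampled[feature_names].values
y_downsample = downsampled["dependent_variable"].values

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
feature_imp = []

for train_index, test_index in skf.split(X_downsample, y_downsample):
    model = RandomForestClassifier(n_estimators=100, random_state=24)
    model.fit(X_downsample[train_index], y_downsample[train_index])
    # Pair each importance with its name using the saved list,
    # instead of X_downsample.columns (which does not exist on an ndarray).
    importances = (
        pd.Series(model.feature_importances_, index=feature_names)
        .round(3)
        .sort_values(ascending=False)
    )
    feature_imp.append(importances)

# One row per feature, one column per fold (aligned on feature name).
fold_labels = [f"fold_{i}" for i in range(1, len(feature_imp) + 1)]
importance_table = pd.concat(feature_imp, axis=1, keys=fold_labels)


def plot_fold_importances(table):
    """Draw one bar chart per fold; requires matplotlib and a display."""
    import matplotlib.pyplot as plt

    for fold in table.columns:
        table[fold].sort_values(ascending=False).plot.bar(title=fold)
        plt.tight_layout()
        plt.show()
```

Calling `plot_fold_importances(importance_table)` should then show one chart per fold with the feature names on the x-axis. An alternative is to keep `X` as a DataFrame throughout and select rows with `.iloc[train_index]`, in which case `X.columns` remains available inside the loop.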