Unable to get feature names from an sklearn model because the input is a NumPy array. How can I structure my code so that I can extract the feature names?

Problem description

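I am working on a random forest classification model that uses stratified k-fold cross-validation. I want to plot the feature importances for each fold. My input data is a NumPy array, but I cannot get the feature names in the code below. How can I structure this code so that I can extract the feature names and plot the built-in feature importances?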

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split, KFold, cross_validate, cross_val_score, StratifiedKFold, RandomizedSearchCV
        from sklearn import metrics  # needed for the metrics.f1_score(...) style calls below
        from sklearn.metrics import classification_report, confusion_matrix, f1_score, mean_squared_error
        import matplotlib.pyplot as plt

        y_downsample = downsampled[['dependent_variable']].values
        X_downsample = downsampled[['Feature1'
                                   ,'Feature2'
                                   ,'Feature3'
                                   ,'Feature4'
                                   ,'Feature5'
                                   ,'Feature6'
                                   ,'Feature7'
                                   ,'Feature8'
                                   ,'Feature9'
                                   ,'Feature10']].values
    
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
        f1_results = []
        accuracy_results = []
        precision_results = []
        recall_results = []
        feature_imp = []
        
        
        for train_index, test_index in skf.split(X_downsample,y_downsample):
                X_train, X_test = X_downsample[train_index], X_downsample[test_index]
                y_train, y_test = y_downsample[train_index], y_downsample[test_index]
        
                model = RandomForestClassifier(n_estimators = 100, random_state = 24)
                model.fit(X_train, y_train.ravel())
                y_pred = model.predict(X_test)
        
                f1_results.append(metrics.f1_score(y_test, y_pred))
                accuracy_results.append(metrics.accuracy_score(y_test, y_pred))
                precision_results.append(metrics.precision_score(y_test, y_pred))
                recall_results.append(metrics.recall_score(y_test, y_pred))
            
                # plot
                importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
                importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
            
                importances.plot.bar()
                plt.show()
           
            
        print("Accuracy: ", np.mean(accuracy_results))
        print("Precision: ", np.mean(precision_results))
        print("Recall: ", np.mean(recall_results))
        print("F1-score: ", np.mean(f1_results))

        ---------------------------------------------------------------------------
        AttributeError                            Traceback (most recent call last)
             21 
             22     # plot
        ---> 23     importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
             24     importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
             25 

        AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Tags: python, numpy, scikit-learn, random-forest, feature-selection

Solution
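
The error occurs because `.values` converts the DataFrame to a plain `numpy.ndarray`, which has no `columns` attribute. One possible approach is to save the column names in a list before converting, and then use that list when building the importance table. The sketch below is not a definitive implementation. It assumes a synthetic stand-in for the `downsampled` DataFrame (generated with `make_classification`), and the names `feature_names` and `fold_importances` are introduced only for illustration. It also uses 5 folds instead of 10 to keep the example quick.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Assumption: synthetic stand-in for the `downsampled` DataFrame in the question.
feature_names = [f"Feature{i}" for i in range(1, 11)]
X_raw, y_raw = make_classification(n_samples=200, n_features=10, random_state=0)
downsampled = pd.DataFrame(X_raw, columns=feature_names)
downsampled["dependent_variable"] = y_raw

# Keep the names in a plain list BEFORE converting to NumPy arrays.
X_downsample = downsampled[feature_names].values
y_downsample = downsampled["dependent_variable"].values

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
fold_importances = []  # one Series of importances per fold

for train_index, test_index in skf.split(X_downsample, y_downsample):
    model = RandomForestClassifier(n_estimators=100, random_state=24)
    model.fit(X_downsample[train_index], y_downsample[train_index])

    # Pair each importance with its name via the saved list,
    # instead of X_downsample.columns (which does not exist on an ndarray).
    importances = pd.Series(
        np.round(model.feature_importances_, 3),
        index=feature_names,
        name="IMPORTANCE",
    ).sort_values(ascending=False)
    fold_importances.append(importances)
```

Each entry of `fold_importances` is indexed by feature name, so inside or after the loop you can call `importances.plot.bar()` followed by `plt.show()` (with `matplotlib.pyplot` imported as `plt`, as in the question) to draw the per-fold chart.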

