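python - How to limit the number of features plotted on a decision tree classifier's feature importance plot?
Problem description
I am evaluating my decision tree classifier and trying to plot its feature importances. The plot renders correctly, but it shows all 80+ features, which makes it very cluttered. I am trying to figure out how to limit the plot to the important variables only, ordered by importance.
A link to the dataset, for you to download to your working directory under the name "file": https://github.com/Arsik36/Python
Minimal reproducible code: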
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
file = 'file.xlsx'
my_df = pd.read_excel(file)
# Determining response variable
my_df_target = my_df.loc[ :, 'Outcome']
# Determining explanatory variables
my_df_data = my_df.drop('Outcome', axis = 1)
# Declaring train_test_split with stratification
X_train, X_test, y_train, y_test = train_test_split(my_df_data,
                                                    my_df_target,
                                                    test_size = 0.25,
                                                    random_state = 331,
                                                    stratify = my_df_target)
# Declaring class weight
weight = {0: 455, 1:1831}
# Instantiating Decision Tree Classifier
decision_tree = DecisionTreeClassifier(max_depth = 5,
                                       min_samples_leaf = 25,
                                       class_weight = weight,
                                       random_state = 331)
# Fitting the training data
decision_tree_fit = decision_tree.fit(X_train, y_train)
# Predicting on the test data
decision_tree_pred = decision_tree_fit.predict(X_test)
# Declaring the number of features in the X_train data
n_features = X_train.shape[1]
# Setting the plot window
fig, ax = plt.subplots(figsize = (12, 9))
# Specifying the contents of the plot
plt.barh(range(n_features), decision_tree_fit.feature_importances_, align = 'center')
plt.yticks(range(n_features), X_train.columns)
plt.xlabel("The degree of importance")
plt.ylabel("Feature")
Solution
You need to modify the plotting code so that low-importance features are dropped. Try this (untested):
# Setting the plot window
fig, ax = plt.subplots(figsize = (12, 9))
# Keeping only the features whose importance exceeds a small threshold
importances = decision_tree_fit.feature_importances_
features_mask = importances > 0.005
n_selected = int(features_mask.sum())
# Specifying the contents of the plot
plt.barh(range(n_selected), importances[features_mask], align = 'center')
plt.yticks(range(n_selected), X_train.columns[features_mask])
plt.xlabel("The degree of importance")
plt.ylabel("Feature")
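The threshold mask above keeps the features in their original column order, while the question also asks for them to be ordered by importance. One possible way to get both is sketched below. The helper names `top_features` and `plot_top_features`, and the `n` and `min_importance` parameters, are hypothetical and not part of the original answer or of scikit-learn. It is a minimal sketch that assumes the importances and the feature names are two sequences of equal length.

```python
def top_features(importances, names, n=10, min_importance=0.0):
    """Return up to n (name, importance) pairs, most important first.

    Hypothetical helper: features whose importance is not strictly above
    min_importance are dropped before the top n are taken.
    """
    pairs = [(name, float(imp)) for name, imp in zip(names, importances)
             if imp > min_importance]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


def plot_top_features(importances, names, n=10):
    """Draw a horizontal bar chart of the n most important features."""
    # Imported here so that top_features stays usable without matplotlib
    import matplotlib.pyplot as plt

    selected = top_features(importances, names, n=n)
    # barh draws the first bar at the bottom, so reverse the order
    # to put the most important feature at the top of the chart
    labels = [name for name, _ in reversed(selected)]
    values = [imp for _, imp in reversed(selected)]

    fig, ax = plt.subplots(figsize=(12, 9))
    ax.barh(range(len(values)), values, align="center")
    ax.set_yticks(range(len(values)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("The degree of importance")
    ax.set_ylabel("Feature")
    return fig, ax
```

With the variables from the question, this could be called as `plot_top_features(decision_tree_fit.feature_importances_, X_train.columns, n=10)`, followed by `plt.show()`. Because the tree is limited to `max_depth = 5`, most of the 80+ features are likely to have an importance of exactly zero, so the default `min_importance=0.0` already removes them.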