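python - Feature importance for a scikit-learn pipeline (SVC)

Problem description

I have the following pipeline, and I want to get the features for each class. I have three classes ("Fiction", "Non-fiction", "None"). The classifier I am using is SVC.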

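from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# ItemSelector is a custom transformer that is not shown in the question;
# presumably it selects one field (here 'Book') from each input record.
Book_contents= Pipeline([('selector', ItemSelector(key='Book')),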
                         ('tfidf',CountVectorizer(analyzer='word',
                                                  binary=True,
                                                  ngram_range=(1,1))),
                        ])

Author_description= Pipeline([('selector', ItemSelector(key='Description')),
                              ('tfidf', CountVectorizer(analyzer='word',
                                                        binary=True,
                                                        ngram_range=(1,1))),
                             ])

ppl = Pipeline([('feats', FeatureUnion([('Contents',Book_contents),
                                        ('Desc',Author_description)])),
                ('clf', SVC(kernel='linear',class_weight='balanced'))
               ])

model = ppl.fit(training_data, Y_train)

I have tried eli5, but I get an error saying that the feature names and the classifier do not match.

f1=model.named_steps['feats'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()
f2=model.named_steps['feats'].transformer_list[1][1].named_steps['tfidf'].get_feature_names()
list_features=f1
list_features.append(f2)
explain_weights.explain_linear_classifier_weights(model.named_steps['clf'], 
                                              vec=None, top=20, 
                                              target_names=ppl.classes_, 
                                              feature_names=list_features)

I got this error:

feature_names has a wrong length: expected=47783, got=10528

How can I get a ranking of the feature weights for each class? Is there a way to do this without eli5?

Tags: python, scikit-learn, svm

Everything you did is correct except for this line:

list_features.append(f2)

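Here you are appending the whole f2 list to the f1 list as a single element, so the list grows by only one entry and ends up with far fewer names than the 47783 features the classifier expects. That is not what you want.

You want to add all the elements of f2 to f1. For that you need to use extend. Just do this: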

list_features.extend(f2)
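
To see the difference concretely, here is a small sketch. The two short lists are made-up stand-ins for the real vocabularies returned by the two vectorizers:

```python
# Made-up feature names standing in for the two vectorizer vocabularies.
f1 = ["book", "chapter"]
f2 = ["author", "born"]

appended = list(f1)      # copy, so f1 itself is left untouched
appended.append(f2)      # f2 is added as ONE nested element
print(len(appended))     # 3

extended = list(f1)
extended.extend(f2)      # every element of f2 is added individually
print(len(extended))     # 4
```

With append, the length is len(f1) + 1 no matter how long f2 is. With extend, it is len(f1) + len(f2), which is what the classifier's feature count requires.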

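For more details, see this question:

Difference between the append and extend list methods in Python

Apart from that, I think the way you are calling explain_weights.explain_linear_classifier_weights is wrong. You only need to call explain_weights(...), and it will automatically call explain_linear_classifier_weights internally.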
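
As for doing this without eli5, one possible approach is to read the weights straight from the fitted model's coef_ attribute. The sketch below rests on several assumptions that are not in the question: the corpus and labels are invented, a single CountVectorizer replaces the FeatureUnion, top_features is a hypothetical helper, and the classifier is swapped for LinearSVC. The swap matters because LinearSVC is trained one-vs-rest, so coef_ has one row per class, whereas a linear-kernel SVC is trained one-vs-one and its coef_ rows correspond to pairs of classes.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy corpus and labels, invented purely for illustration.
docs = [
    "dragon wizard castle", "wizard spell dragon",
    "history economics report", "economics statistics history",
    "misc note", "note misc scrap",
]
labels = ["Fiction", "Fiction", "Non-fiction", "Non-fiction", "None", "None"]

vec = CountVectorizer(analyzer="word", binary=True, ngram_range=(1, 1))
X = vec.fit_transform(docs)

# Assumption: LinearSVC (one-vs-rest) instead of SVC(kernel='linear').
clf = LinearSVC(class_weight="balanced").fit(X, labels)

# Newer scikit-learn exposes get_feature_names_out; older releases
# only have get_feature_names.
if hasattr(vec, "get_feature_names_out"):
    names = np.asarray(vec.get_feature_names_out())
else:
    names = np.asarray(vec.get_feature_names())


def top_features(clf, names, k=3):
    """Hypothetical helper: the k highest-weighted features per class."""
    ranking = {}
    for cls, weights in zip(clf.classes_, clf.coef_):
        order = np.argsort(weights)[::-1][:k]  # largest weights first
        ranking[cls] = [(names[i], float(weights[i])) for i in order]
    return ranking


ranking = top_features(clf, names)
for cls, feats in ranking.items():
    print(cls, feats)
```

With the pipeline from the question, names would be the concatenation of both vectorizers' feature names, in the same order as the FeatureUnion's transformer_list, so that its length matches the number of columns in coef_.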
