XGBClassifier ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,)

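Problem description

I am working on a text classification project.

While exploring different classifiers, I came across XGBClassifier.

My classification task is multiclass. I get the error above when I try to score the classifier. I suspect some reshaping is needed, but I don't understand why. What I find strange is that the other classifiers work fine (even this same classifier with default parameters).

Here is the relevant part of my code: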

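from sklearn import svm, linear_model
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

algorithms = [
    svm.LinearSVC(),  # <<<=== Works
    linear_model.RidgeClassifier(),  # <<<=== Works
    XGBClassifier(),  # <<<=== Works
    XGBClassifier(objective='multi:softprob', num_class=len(groups_count_dict), eval_metric='merror')  # <<<=== Not working
]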

def train(algorithm, X_train, y_train):
    model = Pipeline([       
        ('vect', transformer),
        ('classifier', OneVsRestClassifier(algorithm))
    ])
    model.fit(X_train, y_train)

    return model

score_dict = {}
algorithm_to_model_dict = {}
for algorithm in algorithms:
    print()
    print(f'trying {algorithm}')
    model = train(algorithm, X_train, y_train)
    score = model.score(X_test, y_test)
    score_dict[algorithm] = int(score * 100)
    algorithm_to_model_dict[algorithm] = model
    
sorted_score_dict = {k: v for k, v in sorted(score_dict.items(), key=lambda item: item[1])}
for classifier, score in sorted_score_dict.items():
    print(f'{classifier.__class__.__name__}: score is {score}%')

Again, the error is:

ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,)

I'm not sure whether it is relevant, but I'll mention it anyway: my transformer is created like this:

tuples = []
tfidf_kwargs = {'ngram_range': (1, 2), 'stop_words': 'english', 'sublinear_tf': True}
for col in list(features.columns):
    tuples.append((f'vec_{col}', TfidfVectorizer(**tfidf_kwargs), col))

transformer = ColumnTransformer(tuples, remainder='passthrough')

Thanks in advance.

Edit:

Adding the full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-576cd62f3df0> in <module>
     84     print(f'trying {algorithm}')
     85     model = train(algorithm, X_train, y_train)
---> 86     score = model.score(X_test, y_test)
     87     score_dict[algorithm] = int(score * 100)
     88     algorithm_to_model_dict[algorithm] = model

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/pipeline.py in score(self, X, y, sample_weight)
    620         if sample_weight is not None:
    621             score_params['sample_weight'] = sample_weight
--> 622         return self.steps[-1][-1].score(Xt, y, **score_params)
    623 
    624     @property

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    498         """
    499         from .metrics import accuracy_score
--> 500         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    501 
    502     def _more_tags(self):

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/multiclass.py in predict(self, X)
    365             for i, e in enumerate(self.estimators_):
    366                 pred = _predict_binary(e, X)
--> 367                 np.maximum(maxima, pred, out=maxima)
    368                 argmaxima[maxima == pred] = i
    369             return self.classes_[argmaxima]

ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,) 

Printing the shapes of X_test and y_test yields: (2557, 12) (2557,)

I can see where (8,) comes from: it is the length of groups_count_dict.
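
For reference, the shape mismatch in the last traceback frame can be reproduced with NumPy alone. This is only a minimal sketch: the array names mirror the ones in sklearn/multiclass.py, but the zero-filled pred array is a stand-in (an assumption) for whatever the wrapped estimator returned.

```python
import numpy as np

n_samples, n_classes = 2557, 8  # sizes taken from the error message

# Running best score per test sample, as kept by the one-vs-rest predict loop.
maxima = np.full(n_samples, -np.inf)

# Stand-in for a per-estimator result that has one entry per class
# instead of one entry per sample (hypothetical values).
pred = np.zeros(n_classes)

message = None
try:
    # Same call as in the traceback: (2557,) and (8,) cannot be broadcast.
    np.maximum(maxima, pred, out=maxima)
except ValueError as exc:
    message = str(exc)
    print(message)
```

NumPy can only broadcast two 1-D arrays when their lengths are equal or one of them is 1, so a length-8 result can never be combined with the length-2557 maxima array.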

Tags: python, machine-learning, scikit-learn, nlp

Solution


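It turned out the solution was to remove OneVsRestClassifier from the pipeline. XGBClassifier handles multiclass targets natively, so the wrapper is not needed. With the wrapper, each per-class estimator is fit on a binary problem while still configured with num_class=8, which appears to be why, as the traceback shows, its prediction comes back with shape (8,) instead of (2557,).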

