python - XGBClassifier ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,)
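Problem description
I'm working on a text classification project.
While exploring different classifiers, I came across XGBClassifier.
My classification task is multiclass. I get the error above when I try to score the classifier. I guess some reshaping is needed, but I don't understand why. What's strange to me is that the other classifiers work fine (even this one, with default parameters).
Here is the relevant part of my code: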
```python
from sklearn import linear_model, svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# X_train, y_train, X_test, y_test, transformer and groups_count_dict are defined elsewhere

algorithms = [
    svm.LinearSVC(),  # <<<=== Works
    linear_model.RidgeClassifier(),  # <<<=== Works
    XGBClassifier(),  # <<<=== Works
    XGBClassifier(objective='multi:softprob', num_class=len(groups_count_dict), eval_metric='merror')  # <<<=== Not working
]

def train(algorithm, X_train, y_train):
    model = Pipeline([
        ('vect', transformer),
        ('classifier', OneVsRestClassifier(algorithm))
    ])
    model.fit(X_train, y_train)
    return model

score_dict = {}
algorithm_to_model_dict = {}
for algorithm in algorithms:
    print()
    print(f'trying {algorithm}')
    model = train(algorithm, X_train, y_train)
    score = model.score(X_test, y_test)
    score_dict[algorithm] = int(score * 100)
    algorithm_to_model_dict[algorithm] = model

sorted_score_dict = {k: v for k, v in sorted(score_dict.items(), key=lambda item: item[1])}
for classifier, score in sorted_score_dict.items():
    print(f'{classifier.__class__.__name__}: score is {score}%')
```
Again, the error is:
```
ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,)
```
I'm not sure whether it's relevant, but I'll mention it anyway: my `transformer` is created like this:
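```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# One TF-IDF vectorizer per text column of the `features` DataFrame
tuples = []
tfidf_kwargs = {'ngram_range': (1, 2), 'stop_words': 'english', 'sublinear_tf': True}
for col in list(features.columns):
    tuples.append((f'vec_{col}', TfidfVectorizer(**tfidf_kwargs), col))
transformer = ColumnTransformer(tuples, remainder='passthrough')
```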
Thanks in advance.
EDIT:
Adding the full traceback:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-576cd62f3df0> in <module>
     84     print(f'trying {algorithm}')
     85     model = train(algorithm, X_train, y_train)
---> 86     score = model.score(X_test, y_test)
     87     score_dict[algorithm] = int(score * 100)
     88     algorithm_to_model_dict[algorithm] = model

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/pipeline.py in score(self, X, y, sample_weight)
    620         if sample_weight is not None:
    621             score_params['sample_weight'] = sample_weight
--> 622         return self.steps[-1][-1].score(Xt, y, **score_params)
    623 
    624     @property

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    498         """
    499         from .metrics import accuracy_score
--> 500         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    501 
    502     def _more_tags(self):

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/multiclass.py in predict(self, X)
    365         for i, e in enumerate(self.estimators_):
    366             pred = _predict_binary(e, X)
--> 367             np.maximum(maxima, pred, out=maxima)
    368             argmaxima[maxima == pred] = i
    369         return self.classes_[argmaxima]

ValueError: operands could not be broadcast together with shapes (2557,) (8,) (2557,)
```
Printing the shapes of `X_test` and `y_test` yields `(2557, 12) (2557,)`.
I do understand where the `(8,)` comes from: it is the length of `groups_count_dict`.
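The last frame of the traceback is where the shapes collide. Below is a simplified re-creation of that loop from `OneVsRestClassifier.predict` (a sketch, not the actual scikit-learn source), assuming 2557 test rows and 8 classes as in the question. Each wrapped estimator is expected to return one score per row; the hypothetical `bad_pred` array, sized by the number of classes instead, triggers the same `ValueError` as the traceback.

```python
import numpy as np

n_samples, n_classes = 2557, 8

# Running maximum score and winning class index, one slot per test row
maxima = np.full(n_samples, -np.inf)
argmaxima = np.zeros(n_samples, dtype=int)

# A well-behaved binary estimator returns one score per row: shape (2557,)
good_pred = np.random.default_rng(0).random(n_samples)
np.maximum(maxima, good_pred, out=maxima)
argmaxima[maxima == good_pred] = 0

# Hypothetical misbehaving estimator: output sized by n_classes, shape (8,)
bad_pred = np.zeros(n_classes)
message = ""
try:
    np.maximum(maxima, bad_pred, out=maxima)
except ValueError as exc:
    message = str(exc)

print(message)
```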
Solution
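It turned out the solution was to remove `OneVsRestClassifier` from the pipeline. `XGBClassifier` already handles multiclass targets natively, whereas `OneVsRestClassifier` fits one binary estimator per class and expects each of them to return one score per sample. With `objective='multi:softprob'` and `num_class=8`, the wrapped estimators apparently return output sized by the number of classes instead, which is what breaks the `np.maximum` call in the traceback.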