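python - SoftmaxMultiClassObj: label size and pred size does not match - XGBoost
Problem description
I am trying to perform multi-class text classification with XGBoost on ~9000 documents and 5 labels. I am also trying to use a 20/80 split for training and testing, but can't figure out how to do that. Here is the code after loading the data and libraries: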
def preprocess(text_column):
    new_review = []
    for sentence in text_column:
        # remove mentions, links, and non-alphanumeric characters
        text = re.sub(r"@\S+|https?:\S|[^A-Za-z0-9]+", ' ', str(sentence).lower()).strip()
        # lemmatize every token that is not a stop word
        text = [wnl.lemmatize(i) for i in text.split(' ') if i not in stop_words]
        new_review.append(' '.join(text))
    return new_review
train['sentence'] = preprocess(train['sentence'])
test['sentence'] = preprocess(test['sentence'])
from sklearn.feature_extraction.text import CountVectorizer
# vectorizing the sentences
cv = CountVectorizer(binary = True) # implies that it indicates whether the word is present or not.
cv.fit(train['sentence']) # find all the unique words from the training set
train_x = cv.transform(train)
test_x = cv.transform(test)
# importing the relevant modules
import xgboost as xgb
xgb_train_labels = []
accepted_strings_half1 = {'location', 'service', 'price'}
accepted_strings_half2 = {'food', 'time'}
for topic in train['topic']:
    if topic in accepted_strings_half1:
        xgb_train_labels.append(1)
    elif topic in accepted_strings_half2:
        xgb_train_labels.append(0)
    else:
        xgb_train_labels.append(None)
xgb_test_labels = []
for topic in test['topic']:
    if topic in accepted_strings_half1:
        xgb_test_labels.append(1)
    elif topic in accepted_strings_half2:
        xgb_test_labels.append(0)
    else:
        xgb_test_labels.append(None)
# creating a variable for the new train and test sets
xgb_train = xgb.DMatrix(train_x, xgb_train_labels)
xgb_test = xgb.DMatrix(test_x, xgb_test_labels)
# Setting the Parameters of the Model
param = {'objective':'multi:softmax', 'num_class': 5 , 'eta': 0.75,
'max_depth': 50,}
# Training the Model
xgb_model = xgb.train(param, xgb_train, num_boost_round = 30)
# Predicting using the Model
y_pred = xgb_model.predict(xgb_test)
y_pred = np.where(np.array(y_pred) > 0.5, 1, 0) # converting them to 1/0’s
# Evaluation of Model
accuracy_score(xgb_test_labels, y_pred)
f1_score(xgb_test_labels, y_pred)
I get this error when trying to run the last cell above:
XGBoostError                              Traceback (most recent call last)
in ()
      3          'max_depth': 50,}
      4 # Training the Model
----> 5 xgb_model = xgb.train(param, xgb_train, num_boost_round = 30)
      6 # Predicting using the Model
      7 y_pred = xgb_model.predict(xgb_test)

3 frames
/usr/local/lib/python3.7/dist-packages/xgboost/core.py in _check_call(ret)
    174     """
    175     if ret != 0:
--> 176         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    177
    178

XGBoostError: [23:04:45] /workspace/src/objective/multiclass_obj.cu:60: Check failed: preds.Size() == (static_cast<size_t>(param_.num_class) * info.labels_.Size()): SoftmaxMultiClassObj: label size and pred size does not match
Stack trace:
Please let me know if there is anything I can do! Thank you.
Solution
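Judging only from the code in the question, a likely cause is that `cv.transform(train)` receives the whole DataFrame rather than the text column. Iterating over a DataFrame yields its column names, so the feature matrix ends up with one row per column while the label list has one entry per document, which is the kind of mismatch the failed check reports. Separately, `multi:softmax` with `num_class: 5` expects integer labels from 0 to 4, not the 1/0/None values built in the question. The sketch below uses a small made-up DataFrame (the sentences, the `topic_to_id` mapping, and the `label` column are all assumptions for illustration) and reads the "20/80" split as 20% test, 80% train.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real ~9000-document data set.
data = pd.DataFrame({
    "sentence": ["great food", "slow service", "fair price",
                 "nice location", "long wait time"] * 10,
    "topic": ["food", "service", "price", "location", "time"] * 10,
})

# Give each of the five topics its own integer id in [0, num_class).
topic_to_id = {"location": 0, "service": 1, "price": 2, "food": 3, "time": 4}
data["label"] = data["topic"].map(topic_to_id)

# Hold out 20% of the rows for testing, keeping topic proportions equal.
train, test = train_test_split(
    data, test_size=0.2, stratify=data["label"], random_state=0
)

cv = CountVectorizer(binary=True)
cv.fit(train["sentence"])

# Passing the DataFrame iterates over column names: one row per column.
wrong_x = cv.transform(train)

# Passing the text column gives one row per document.
train_x = cv.transform(train["sentence"])
test_x = cv.transform(test["sentence"])

print(wrong_x.shape[0], train_x.shape[0], len(train))
```

With inputs shaped like this, `xgb.DMatrix(train_x, label=train["label"])` gets as many labels as rows. Since `multi:softmax` predictions are already class ids, the `np.where(... > 0.5, 1, 0)` step would no longer apply, and `f1_score` on five classes needs an explicit `average` argument such as `average='macro'`.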