ValueError: dimension mismatch - Python classification

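Problem description

I am running a classification model on text data, using CountVectorizer to create the features. After training, I try to predict a new instance, but I keep getting a dimension mismatch error. I understand this happens because the new instance does not have all the features that the training data has, but I am still not sure how to fix it. Here is my code: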

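from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# `data` is assumed to be a DataFrame with 'text' and 'class' columns (loading not shown)
x = data['text']
y = data['class']

# Transform data: bag-of-words counts for the text, integer codes for the labels
cv_transformer = CountVectorizer()
Encoder = LabelEncoder()

x = cv_transformer.fit_transform(x)
y = Encoder.fit_transform(y)

# Split data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4,
                                                    random_state=1)

Naive = MultinomialNB()
Naive.fit(x_train, y_train)
# Predict the labels on the validation dataset
predictions_NB = Naive.predict(x_test)
# Use accuracy_score to get the accuracy as a percentage
print("Naive Bayes Accuracy Score -> ", accuracy_score(y_test, predictions_NB) * 100)

# Testing a new instance
sample = ['my name is john doe']
sample = cv_transformer.transform(sample)

Naive.predict(sample)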

The last line raises the error. Any ideas on how to make the dimensions match?

The error message is as follows:

~\Anaconda3\lib\site-packages\scipy\sparse\base.py in __mul__(self, other)

            if other.shape[0] != self.shape[1]:
                raise ValueError('dimension mismatch')

            result = self._mul_multivector(np.asarray(other))

ValueError: dimension mismatch
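
One way to narrow this down is to compare the number of columns in the transformed sample with the number of columns the model was fitted on. The sketch below is only an illustration: the toy `texts` and `labels` are made up to stand in for `data`, and the "bad" path (fitting a fresh CountVectorizer on the sample) is an assumed way of reproducing the mismatch, not something confirmed by the code above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for data['text'] and data['class'].
texts = ["free money now", "meeting at noon", "win free prize", "lunch at noon"]
labels = [1, 0, 1, 0]

cv = CountVectorizer()
X = cv.fit_transform(texts)
model = MultinomialNB().fit(X, labels)
n_train_features = X.shape[1]

sample = ["my name is john doe"]

# Reusing the fitted vectorizer keeps the training vocabulary,
# so unseen words are dropped and the column count stays the same.
good = cv.transform(sample)

# Fitting a new vectorizer on the sample builds a different vocabulary,
# so the column count no longer matches what the model expects.
bad = CountVectorizer().fit_transform(sample)

print("training features:", n_train_features)
print("reused vectorizer:", good.shape[1])
print("re-fitted vectorizer:", bad.shape[1])

try:
    model.predict(bad)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("predict failed:", exc)
```

If the sample's column count differs from the training matrix, the vectorizer used for the sample is not the one the model was trained with (for example, it was re-created or re-fitted in between).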

Tags: python, machine-learning, nlp, classification

Solution
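
No answer text survives on this page, so what follows is only a hedged sketch of one common approach, not the original answer. It assumes the mismatch arises because the vectorizer and the classifier get out of sync (for example, the vectorizer is re-fitted after the model is trained); the question's snippet, as posted, already transforms the sample with the same fitted `cv_transformer`. Bundling both steps in a scikit-learn pipeline means raw text goes in and the same fitted vocabulary is always applied. The toy `texts` and `labels` are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for data['text'] and data['class'].
texts = [
    "free money now", "win a free prize", "claim your free reward",
    "cheap loans available", "exclusive offer just for you",
    "meeting at noon", "lunch at noon tomorrow", "project status update",
    "see you at the office", "notes from the meeting",
]
labels = ["spam"] * 5 + ["ham"] * 5

# Split the raw text first, so the vocabulary is learned from training data only.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.4, random_state=1, stratify=labels
)

# The pipeline fits the vectorizer and the classifier together
# and reuses the fitted vectorizer on every later predict call.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(x_train, y_train)

print("accuracy:", clf.score(x_test, y_test) * 100)

# New instances are passed as raw strings; words not seen in training are ignored.
sample = ["my name is john doe"]
prediction = clf.predict(sample)
print(prediction)
```

MultinomialNB accepts string labels directly, so the LabelEncoder step is optional in this variant. If the model is saved to disk, saving the whole pipeline object keeps the vectorizer and classifier together.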

