Bad input shape in sentiment-analysis logistic regression

Problem description

I want to predict the accuracy of a sentiment analysis model with logistic regression, but I get an error: bad input shape (edited to include the input).

Dataframe:

df
sentence                | polarity_label
new release!            | positive
buy                     | neutral
least good-looking      | negative
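
For reference, the table above corresponds to a small pandas DataFrame along these lines (a minimal sketch for reproduction only; the real dataset presumably has many more rows):

import pandas as pd

# Toy reconstruction of the DataFrame shown above
df = pd.DataFrame({
    "sentence": ["new release!", "buy", "least good-looking"],
    "polarity_label": ["positive", "neutral", "negative"],
})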

Code:

from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, ENGLISH_STOP_WORDS
# Define the set of stop words
my_stop_words = ENGLISH_STOP_WORDS
vect = CountVectorizer(max_features=5000,stop_words=my_stop_words)
vect.fit(df.sentence)
X = vect.transform(df.sentence)
y = df.polarity_label
encoder = OneHotEncoder()
encoder.fit_transform(y)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=123)
LogisticRegression(penalty='l2',C=1.0)

log_reg = LogisticRegression().fit(X_train, y_train)

Error message

ValueError: Expected 2D array, got 1D array instead:
array=['Neutral' 'Positive' 'Positive' ... 'Neutral' 'Neutral' 'Neutral'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

How can I fix this?

Tags: python, python-3.x, scikit-learn, logistic-regression

Solution


I think you need to convert your y labels to a one-hot encoding. Right now your label vector probably looks like [0, 1, 0, 0, 1, 0], but for logistic regression you want it in the form [[0, 1], [1, 0], [0, 1], [0, 1]], because logistic regression computes a probability/likelihood for every class.

You can do this with sklearn's OneHotEncoder. Note that OneHotEncoder expects a 2D array, which is exactly what the ValueError above is complaining about, so the 1D label column has to be reshaped first:

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
# OneHotEncoder expects a 2D array, so reshape the 1D label Series into a single column
y_onehot = encoder.fit_transform(y.values.reshape(-1, 1))
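
For completeness, a minimal end-to-end sketch of the corrected flow is below (variable names follow the question; y_onehot is just an illustrative name, and with only the three example rows the train/test split is of course tiny). Note that scikit-learn's LogisticRegression is fit on the original 1D labels, as in the question's own code, and the accuracy can then be read off with score():

from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Vectorize the sentences exactly as in the question
vect = CountVectorizer(max_features=5000, stop_words=ENGLISH_STOP_WORDS)
X = vect.fit_transform(df.sentence)
y = df.polarity_label

# One-hot encode the labels: OneHotEncoder needs a 2D input, so reshape first
encoder = OneHotEncoder()
y_onehot = encoder.fit_transform(y.values.reshape(-1, 1))  # sparse matrix, shape (n_samples, n_classes)

# Train and evaluate the classifier on the original 1D string labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
log_reg = LogisticRegression(penalty='l2', C=1.0).fit(X_train, y_train)
print(log_reg.score(X_test, y_test))  # accuracy on the held-out split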
