python - Text classification model built with logistic regression, bag of words (BoW), and CountVectorizer: ValueError on a new input point
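Problem description
I am trying to use a previously trained model to classify a new text input as positive, negative, or neutral, and I get a ValueError. I have a new review that needs to be vectorized with the CountVectorizer, but its shape does not change even after vectorization. Everything works fine on the training and test datasets.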
Techniques used: logistic regression, bag of words, count vectorization.
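import time
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Create an object of class CountVectorizer
bow = CountVectorizer()
## X_train.values.shape
# Call the fit_transform method on the training data
X_train = bow.fit_transform(X_train_raw.values)
# Call the transform method on the test dataset
X_test = bow.transform(X_test_raw.values)
# Perform column standardization (with_mean=False keeps the matrix sparse)
std = StandardScaler(with_mean=False)
X_train = std.fit_transform(X_train)
X_test = std.transform(X_test)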
start = time.time()
# Create the list of candidate C values
C_values = np.linspace(0.1, 1, 10)
cv_scores = []  # empty list that will hold the CV scores
# Try each value of C in the loop below
for c in C_values:
    # Create a LogisticRegression object with balanced class weights
    clf = LogisticRegression(C=c, class_weight='balanced')
    # Perform 5-fold cross-validation
    # It returns the CV accuracy for each fold in an array
    scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy')
    # Store the mean of the accuracies over the 5 folds
    cv_scores.append(scores.mean())
# Calculate the misclassification error from the accuracy (error = 1 - accuracy)
cv_error = [1 - x for x in cv_scores]
# The optimal (best) C is the one with minimum error (or maximum accuracy)
optimal_C = C_values[cv_error.index(min(cv_error))]
print('\nThe optimal C is', optimal_C)
end = time.time()
print("Total time in minutes = ", (end - start) / 60)
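# Refit on the full training set with the selected C
# (note: unlike the CV loop above, class_weight is not set here)
clf = LogisticRegression(C=optimal_C)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred) * 100
print("Accuracy =", acc)
print(confusion_matrix(y_test, y_pred))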
def predict_new(X):
    # X1 = [[X]]
    X2 = [X]
    X4 = pd.Series(X2)
    # data_cleaning is a helper defined elsewhere (not shown in the question)
    X3, los = data_cleaning(X4)
    #print(X3.values)
    X_t = bow.transform(X3)
    print(X_t.shape)
    #X_t = std.transform(X_t)
    #pred = clf.predict(X_t)
    pred = 0  # placeholder while the two prediction lines above are commented out
    if pred == 1:
        print("Positive")
    elif pred == -1:
        print("Negative")
    else:
        print("Neutral")

predict_new('the app is good')
I expect the output to be Positive, Negative, or Neutral.
Solution
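The question includes neither the traceback nor the body of data_cleaning, so the exact cause of the ValueError cannot be confirmed here. What follows is only a minimal sketch of one common way to avoid shape mismatches at prediction time: chain the vectorizer, the scaler, and the classifier in a single scikit-learn pipeline, and always pass new text as a list of strings. The training texts, the labels, the LABEL_NAMES mapping, and the C value are invented for illustration, and the cleaning step is left out because its behavior is unknown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data, invented purely for illustration (not from the question).
train_texts = [
    "the app is good", "great app, works well", "love this app",
    "the app is bad", "terrible app, crashes a lot", "hate this app",
    "the app is okay", "average app, nothing special", "it is an app",
]
train_labels = [1, 1, 1, -1, -1, -1, 0, 0, 0]

# One pipeline object guarantees that new text goes through exactly the
# same fitted vectorizer and scaler as the training data.
model = make_pipeline(
    CountVectorizer(),
    StandardScaler(with_mean=False),  # with_mean=False is required for sparse input
    LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000),
)
model.fit(train_texts, train_labels)

# Hypothetical mapping from the numeric labels to display names.
LABEL_NAMES = {1: "Positive", -1: "Negative", 0: "Neutral"}

def predict_new(text):
    """Classify one raw review string and return its label name."""
    # Wrap the string in a list: predict expects an iterable of documents,
    # not a bare string.
    pred = model.predict([text])[0]
    return LABEL_NAMES[int(pred)]

# A single new document is expected to have shape (1, n_features):
# one row, and as many columns as the fitted vocabulary has words.
vectorizer = model.named_steps["countvectorizer"]
new_vector = vectorizer.transform(["the app is good"])
n_features = len(vectorizer.vocabulary_)
print(new_vector.shape, n_features)
print(predict_new("the app is good"))
```

A shape of (1, n_features) for a single review is therefore what the fitted vectorizer should produce, not a sign that vectorization failed. The column count is fixed by the training vocabulary, and only the row count depends on how many documents are passed in. If the printed shape differs from that, the value returned by data_cleaning is worth inspecting first.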