Building a text classification model with logistic regression, bag of words (BoW), and CountVectorizer: ValueError on a new input point

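Problem description

I am trying to use a pre-built model to classify a new text input as positive, negative, or neutral, and I get a ValueError. I have a new review that needs to be vectorized with CountVectorizer, but the shape does not change even after vectorization. The same code works fine on the training and test datasets.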

The model uses logistic regression on bag-of-words features produced by count vectorization.

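import time

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Create an object of class CountVectorizer
bow = CountVectorizer()
# Call the fit_transform method on the training data
X_train = bow.fit_transform(X_train_raw.values)
# Call the transform method on the test dataset
X_test = bow.transform(X_test_raw.values)
# Perform column standardization (with_mean=False keeps the matrix sparse)
std = StandardScaler(with_mean=False)
X_train = std.fit_transform(X_train)
X_test = std.transform(X_test)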
start = time.time()
# creating list of C
C_values = np.linspace(0.1,1,10)

cv_scores = [] # empty list that will hold cv scores

# Try each value of C in the loop below
for c in C_values:
    # Create an object of the class Logistic Regression with balanced class weights
    clf = LogisticRegression(C = c, class_weight = 'balanced')
    # perform 5-fold cross validation
    # It returns the cv accuracy for each fold in a list
    scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy')
    # Store the mean of the accuracies from all the 5 folds
    cv_scores.append(scores.mean())

# calculate misclassification error from accuracy (error = 1 - accuracy)
cv_error = [1 - x for x in cv_scores]

# optimal (best) C is the one for which error is minimum (or accuracy is maximum)
optimal_C = C_values[cv_error.index(min(cv_error))]
print('\nThe optimal C is', optimal_C)

end = time.time()
print("Total time in minutes = ", (end-start)/60)
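# Refit with the selected C, using the same class weighting as in cross-validation
clf = LogisticRegression(C = optimal_C, class_weight = 'balanced')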
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred) * 100
print("Accuracy =", acc)
print(confusion_matrix(y_test, y_pred))
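
As a side note on the training code above: the vectorizer, scaler, and classifier can be bundled into a single scikit-learn Pipeline, so that any new raw text automatically goes through the same fitted steps. The following is only a minimal sketch, not the asker's actual setup. The toy reviews, their labels (1 = positive, -1 = negative, 0 = neutral), the step names "bow", "std", "clf", and the choice of cv=2 are all assumptions made to keep the example small.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration: 1 = positive, -1 = negative, 0 = neutral
texts = [
    "the app is good", "great app love it", "works really well", "very good and fast",
    "the app is bad", "terrible app hate it", "keeps crashing all the time", "very slow and buggy",
    "the app is okay", "it is an app", "nothing special here", "average experience overall",
]
labels = [1] * 4 + [-1] * 4 + [0] * 4

# One object that holds the vectorizer, the scaler, and the classifier
pipe = Pipeline([
    ("bow", CountVectorizer()),
    ("std", StandardScaler(with_mean=False)),  # with_mean=False works on sparse input
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Same grid of C values as in the manual loop
C_values = np.linspace(0.1, 1, 10)
search = GridSearchCV(pipe, {"clf__C": C_values}, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("Best C:", search.best_params_["clf__C"])

# Raw strings go in directly; the pipeline applies bow and std before predicting
pred = search.predict(["the app is good"])
print("Predicted label:", pred[0])
```

With this layout there is no separate transform call to forget or to apply in the wrong order when a new review arrives.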

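def predict_new(X):
    # Wrap the single review in a Series so it can go through the cleaning step
    X4 = pd.Series([X])
    # data_cleaning is the author's own helper (not shown here)
    X3, los = data_cleaning(X4)
    # Reuse the already-fitted vectorizer and scaler; do not fit them again
    X_t = bow.transform(X3)
    print(X_t.shape)
    X_t = std.transform(X_t)
    # predict returns an array with one label per row; take the single label
    pred = clf.predict(X_t)[0]
    if pred == 1:
        print("Positive")
    elif pred == -1:
        print("Negative")
    else:
        print("Neutral")

predict_new('the app is good')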

I expect the output to be Positive, Negative, or Neutral.
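
On the point about the shape: for a single new review, the vectorized matrix is expected to have one row and the same number of columns as the training matrix, so the column count not changing is normal. The question does not show the actual error message, so the sketch below only illustrates two things that are known to hold for CountVectorizer: a one-element list gives a one-row matrix, and a bare string passed to transform raises a ValueError. The toy corpus is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy training corpus (made up for illustration)
train_docs = ["the app is good", "the app is bad", "it works"]

bow = CountVectorizer()
X_train = bow.fit_transform(train_docs)

# A single new review must be wrapped in a list (or Series) of strings
X_new = bow.transform(["the app is good"])
print(X_train.shape, X_new.shape)  # same number of columns, one row for the new review

# Passing a bare string instead of an iterable of strings raises ValueError
try:
    bow.transform("the app is good")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```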

Tags: python, machine-learning

Solution
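
The exact ValueError is not shown in the question, so what follows is one possible approach rather than a confirmed fix. It assumes the new review should pass through the same fitted CountVectorizer and StandardScaler, in the same order as the training data, before clf.predict is called. The toy training texts and labels are made up, the cleaning step is left out because data_cleaning is not shown, and the LABEL_NAMES mapping is a hypothetical helper built from the labels used in the question (1, -1, 0).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data, made up for illustration: 1 = positive, -1 = negative, 0 = neutral
train_texts = [
    "the app is good", "great app love it",
    "the app is bad", "terrible app hate it",
    "the app is okay", "nothing special here",
]
train_labels = [1, 1, -1, -1, 0, 0]

# Fit the vectorizer, scaler, and classifier once, on the training data only
bow = CountVectorizer()
std = StandardScaler(with_mean=False)
clf = LogisticRegression(max_iter=1000)
X_train = std.fit_transform(bow.fit_transform(train_texts))
clf.fit(X_train, train_labels)

# Hypothetical mapping from numeric label to display name
LABEL_NAMES = {1: "Positive", -1: "Negative", 0: "Neutral"}

def predict_new(text):
    """Classify one raw review string with the already-fitted objects."""
    X_t = bow.transform([text])   # one row, same columns as the training matrix
    X_t = std.transform(X_t)      # same scaling as the training data
    pred = clf.predict(X_t)[0]    # predict returns an array; take the single label
    return LABEL_NAMES[int(pred)]

result = predict_new("the app is good")
print(result)
```

The key points are that transform (not fit_transform) is called on the new text, and that the input to bow.transform is an iterable of strings with one element per review.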

