python - Could not convert string to float in logistic regression
Question
I wrote the following code:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
Spam_model = LogisticRegression(solver='liblinear', penalty='l1')
print(X_train)
Spam_model.fit(X_train, Y_train)
pred = Spam_model.predict(X_test)
accuracy_score(Y_test,pred)
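It throws the following error. What could be the cause?
Answer
If your data is text, you need to extract features before applying the classifier. LogisticRegression expects a numeric feature matrix, so raw strings cannot be converted to float. Using an older example from sklearn: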
from sklearn.datasets import fetch_20newsgroups
cats = ['alt.atheism', 'sci.space']
newsgroups_train = fetch_20newsgroups(subset='train', categories=cats)
X_train = newsgroups_train.data
Y_train = newsgroups_train.target
newsgroups_test = fetch_20newsgroups(subset='test', categories=cats)
X_test = newsgroups_test.data
Y_test = newsgroups_test.target
The data looks like this:
Y_train
array([0, 1, 1, ..., 1, 1, 1])
X_train[0][:50]
'From: bil@okcforum.osrhe.edu (Bill Conner)\nSubject'
Apply a vectorizer to turn the text into basic numeric features, then train the model:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
model = LogisticRegression(solver='liblinear', penalty='l1')
model.fit(X_train_vec, Y_train)
pred = model.predict(X_test_vec)
accuracy_score(Y_test,pred)
0.906030855539972
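The vectorize-then-fit steps above can also be bundled into a single estimator with scikit-learn's make_pipeline, so the same transformation is applied at both training and prediction time. The following is only a minimal sketch: the toy texts, labels, and the name spam_pipeline are invented for illustration and do not come from the original question, and it uses the default L2 penalty rather than penalty='l1'.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only (not from the original question).
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# The pipeline fits the vectorizer on the training text, then fits the
# classifier on the resulting TF-IDF matrix.
spam_pipeline = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(solver="liblinear"),
)
spam_pipeline.fit(train_texts, train_labels)

# Raw strings can be passed directly; the fitted vectorizer's transform
# is applied automatically before the classifier predicts.
predictions = spam_pipeline.predict(
    ["claim your free money", "agenda for the team lunch"]
)
print(predictions)
```

With this arrangement there is no separate X_train_vec or X_test_vec to keep in sync, which makes it harder to pass untransformed text to the classifier by accident.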