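python - Transforming prediction targets
Problem description
I have a dataset in which each observation can belong to several labels (multi-label classification).
I have already run an SVM classification on it, and it works. (I am interested in the accuracy for each class, so I apply OneVsRestClassifier per class, as you will see in the code.)
I want to see the predicted value for each item in the test data. In other words, I want to see which label the model predicted for each observation in the test sample.
For example, this is the data passed to the model for prediction: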
,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,1
5,I have no idea when this will end.,1,0,0,0,0,0,1
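The model has predicted labels for these rows, and I want to see which labels it assigned to each row.
I know this can be done with Label Binarization in the scikit-learn library.
The problem is that the input parameter of fit_transform explained here is different from the target data I prepared and passed to the SVM classifier, so I don't know how to work it out.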
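Since the label columns in the CSV already form a 0/1 indicator matrix (one column per category), scikit-learn's MultiLabelBinarizer can turn rows of such a matrix back into label names with inverse_transform, provided its classes are fixed to the same column order. This is a minimal sketch, assuming the indicator rows are the six sample rows above; MultiLabelBinarizer is used here instead of LabelBinarizer because a sentence can carry several labels at once.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

# 0/1 label columns of the six sample rows above, in the same column order as categories
Y = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
])

# Passing classes= keeps the binarizer's column order identical to categories;
# fit() only needs to see the label set once.
mlb = MultiLabelBinarizer(classes=categories)
mlb.fit([categories])

# Each indicator row becomes a tuple of the category names whose flag is 1
labels = mlb.inverse_transform(Y)
for row in labels:
    print(row)
```

The same call works on a predicted 0/1 matrix, as long as its columns follow the order of categories.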
Here is my code:
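import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

stop_words = []  # placeholder: the actual stop-word list is not shown in the question

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']
train, test = train_test_split(df, random_state=42, test_size=0.3, shuffle=True)
X_train = train.sentences
X_test = test.sentences

SVC_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
])

# Fit one binary (0/1) classifier per category and report its test scores
for category in categories:
    print('... Processing {} '.format(category))
    SVC_pipeline.fit(X_train, train[category])
    prediction = SVC_pipeline.predict(X_test)
    print('SVM Linear Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('SVM Linear f1 measurement is {} '.format(f1_score(test[category], prediction, average='weighted')))
    print("\n")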
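For comparison, here is a sketch of a different setup from the per-category loop: a single OneVsRestClassifier fitted on all seven label columns at once. predict then returns a 0/1 matrix with one column per category, so per-category accuracy and the predicted label names for each sentence both come from one call. To stay self-contained, it assumes the six sample sentences above serve as both training and prediction data; with the real CSV you would fit on train[categories] and predict on X_test. Categories with no positive example (EF, INF, SSI and DI in this tiny sample) make scikit-learn emit a warning and are always predicted as 0.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

# The six sample rows from the question: sentences and their 0/1 label columns
sentences = [
    "extreme weight gain, short-term memory loss, hair loss.",
    "I am detoxing from Lexapro now.",
    "I slowly cut my dosage over several months and took vitamin supplements to help.",
    "I am now 10 days completely off and OMG is it rough.",
    "I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",
    "I have no idea when this will end.",
]
Y = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
])

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', OneVsRestClassifier(LinearSVC())),
])
pipeline.fit(sentences, Y)            # one binary LinearSVC per column of Y
Y_pred = pipeline.predict(sentences)  # 0/1 matrix, shape (n_sentences, n_categories)

# Per-category accuracy: compare each true column with the matching predicted column
for j, category in enumerate(categories):
    print(category, accuracy_score(Y[:, j], Y_pred[:, j]))

# Predicted label names for each sentence
predicted = [[c for c, flag in zip(categories, row) if flag == 1] for row in Y_pred]
for sentence, labels in zip(sentences, predicted):
    print(sentence, '->', labels)
```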
Thanks for your time.
Solution
Here is what you are after. What I did was map prediction, a NumPy array, onto indices of the class labels in your categories list. Here is the full code.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']
train, test = train_test_split(df, random_state=42, test_size=0.3, shuffle=True)
X_train = train.sentences
X_test = test.sentences

SVC_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=[])),
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
])

for category in categories:
    print('... Processing {} '.format(category))
    SVC_pipeline.fit(X_train, train[category])
    prediction = SVC_pipeline.predict(X_test)
    # prediction holds this category's 0/1 flags; each flag is used as an index into categories
    print([{X_test.iloc[i]: categories[prediction[i]]} for i in range(len(prediction))])
    print('SVM Linear Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('SVM Linear f1 measurement is {} '.format(f1_score(test[category], prediction, average='weighted')))
    print("\n")
Here is sample output:
... Processing ADR
[{'extreme weight gain, short-term memory loss, hair loss.': 'ADR'}, {'I am detoxing from Lexapro now.': 'ADR'}]
SVM Linear Test accuracy is 0.5
SVM Linear f1 measurement is 0.3333333333333333
... Processing WD
[{'extreme weight gain, short-term memory loss, hair loss.': 'ADR'}, {'I am detoxing from Lexapro now.': 'ADR'}]
SVM Linear Test accuracy is 1.0
SVM Linear f1 measurement is 1.0
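Each pipeline in the loop is trained on a single 0/1 column, so prediction contains that one category's 0/1 flags rather than indices into categories. The lookup categories[prediction[i]] therefore turns 0 into 'ADR' and 1 into 'WD' whichever category is being processed, which is why the WD run prints 'ADR' for both sentences. A minimal sketch that keeps every category's 0/1 output and lists, for each sentence, the categories whose classifier predicted 1 follows. The helper name labels_per_sentence and the example flags are hypothetical, for illustration only.

```python
import pandas as pd

categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']


def labels_per_sentence(sentences, predictions, categories):
    """Hypothetical helper: turn per-category 0/1 predictions into label names.

    predictions maps each category name to the 0/1 array that category's
    binary pipeline returned for sentences (same order).
    """
    pred_df = pd.DataFrame(predictions, index=sentences)[categories]
    return [
        (sentence, [c for c in categories if row[c] == 1])
        for sentence, row in pred_df.iterrows()
    ]


# Hypothetical 0/1 outputs for two test sentences, for illustration only.
# In the loop above they would be collected as
#     predictions[category] = SVC_pipeline.predict(X_test)
# and the helper called once after the loop with X_test.
sentences = ['extreme weight gain, short-term memory loss, hair loss.',
             'I am detoxing from Lexapro now.']
predictions = {c: [0, 0] for c in categories}
predictions['ADR'] = [1, 0]
predictions['others'] = [0, 1]

result = labels_per_sentence(sentences, predictions, categories)
for sentence, labels in result:
    print(sentence, '->', labels)
```

Returning a list of pairs instead of a dict keeps duplicate sentences in the test set from overwriting each other.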
I hope this helps.