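python - How do I correctly use sklearn's cross_validate with one-hot encoded classes?
Problem description
I built a model to classify my 8-class dataset and used an MLP to get some scores from it. To do this, I decided to use sklearn.model_selection.cross_validate with 10 folds.
The following code works fine: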
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import accuracy_score, make_scorer, f1_score
import pandas as pd

def MLPClasify(sample):
    df = pd.read_csv('my_path\\my_file.csv', header=None)
    # NumberOfFeatures is assumed to be defined elsewhere (index of the label column)
    y = df[NumberOfFeatures]
    x = df.drop([NumberOfFeatures], axis=1)
    clf = MLPClassifier(hidden_layer_sizes=(27,), activation='logistic', max_iter=500, alpha=0.0001,
                        solver='adam', verbose=10, random_state=21, tol=1e-9)
    clf.out_activation_ = 'softmax'
    # Report accuracy and weighted F1 for each of the 10 folds
    scoring = {'Accuracy': make_scorer(accuracy_score),
               'F1': make_scorer(f1_score, average='weighted')}
    scores = cross_validate(clf, x, y, cv=10, scoring=scoring)
    return scores
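A side note on the line clf.out_activation_ = 'softmax': as far as I know, MLPClassifier chooses its output activation itself during fit, based on the type of y, so a value assigned before fitting is overwritten. The minimal sketch below illustrates this on synthetic data from make_classification (an assumption standing in for the CSV file; the sizes and the 'identity' placeholder value are arbitrary).

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelBinarizer

# Synthetic 4-class data, used only for illustration
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# 1-D integer labels: a multiclass problem
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=50, random_state=0)
clf.out_activation_ = 'identity'  # assigned before fit, replaced during fit
clf.fit(X, y)
multiclass_activation = clf.out_activation_

# 2-D one-hot labels: treated as a multilabel problem
Y = LabelBinarizer().fit_transform(y)
clf_onehot = MLPClassifier(hidden_layer_sizes=(8,), max_iter=50, random_state=0)
clf_onehot.fit(X, Y)
multilabel_activation = clf_onehot.out_activation_

print(multiclass_activation, multilabel_activation)
```

With 1-D labels the fitted model should report softmax; with a one-hot matrix it should report logistic, which means each output column is predicted independently.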
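Everything went well, and I got an accuracy of about 60%. So I decided to try one-hot encoding to see whether I could get better results, and wrote the following code: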
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.metrics import accuracy_score, make_scorer, f1_score
import pandas as pd

def MLPClasify(sample):
    df = pd.read_csv('my_path\\my_file.csv', header=None)
    y = df[NumberOfFeatures]
    x = df.drop([NumberOfFeatures], axis=1)
    # Map the class labels to integers, then to one-hot rows
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(y)
    onehot_encoder = OneHotEncoder()
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
    # y is now a sparse one-hot matrix with one column per class
    y = onehot_encoded
    clf = MLPClassifier(hidden_layer_sizes=(27,), activation='logistic', max_iter=500, alpha=0.0001,
                        solver='adam', verbose=10, random_state=21, tol=1e-9)
    clf.out_activation_ = 'softmax'
    scoring = {'Accuracy': make_scorer(accuracy_score),
               'F1': make_scorer(f1_score, average='weighted')}
    scores = cross_validate(clf, x, y, cv=10, scoring=scoring)
    return scores
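Well, the code runs, but I get the following warning:
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  average, "true nor predicted", 'F-score is', len(true_sum)
In addition, my accuracy dropped to less than 2%.
Any ideas about what I might be doing wrong?
Thanks for your help.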
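The warning quoted above can be reproduced in isolation. The sketch below uses small hypothetical one-hot arrays (not the asker's data) in which the third class has neither true nor predicted samples, and shows how the zero_division parameter of f1_score sets the score for such a class.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical one-hot (multilabel-indicator) targets with 3 classes.
# The third class never appears in y_true or in y_pred.
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0]])

# Per-class F1; the empty class gets the value passed as zero_division
per_class_zero = f1_score(y_true, y_pred, average=None, zero_division=0)
per_class_one = f1_score(y_true, y_pred, average=None, zero_division=1)

print(per_class_zero)
print(per_class_one)
```

Passing zero_division only silences the warning and fixes the reported value; it does not change what the classifier predicts.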
Solution
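One likely explanation (the data is not visible here, so treat this as an assumption): scikit-learn classifiers, MLPClassifier included, expect y as a 1-D array of class labels and handle the one-hot encoding internally. Passing a one-hot matrix turns the task into a multilabel problem. The network then uses independent logistic outputs instead of softmax, accuracy_score becomes subset accuracy (all 8 columns of a row must match), and cv=10 falls back to plain KFold without stratification, so if the file is sorted by class some folds can lack whole classes. A minimal sketch that keeps the labels as integers, with synthetic data from make_classification standing in for the CSV file:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.utils.multiclass import type_of_target

# Synthetic 8-class data in place of the CSV (an assumption for illustration)
X, y_raw = make_classification(n_samples=400, n_features=20, n_informative=10,
                               n_classes=8, n_clusters_per_class=1,
                               random_state=21)

# Integer labels: what cross_validate and MLPClassifier expect
y_int = LabelEncoder().fit_transform(y_raw)

# One-hot labels, built only to show how scikit-learn interprets them
y_onehot = OneHotEncoder().fit_transform(y_int.reshape(-1, 1)).toarray()

target_kind_int = type_of_target(y_int)
target_kind_onehot = type_of_target(y_onehot)
print(target_kind_int, target_kind_onehot)

clf = MLPClassifier(hidden_layer_sizes=(27,), activation='logistic',
                    max_iter=500, alpha=0.0001, solver='adam', random_state=21)

# String scorer names are equivalent to the make_scorer calls in the question
scoring = {'Accuracy': 'accuracy', 'F1': 'f1_weighted'}

# With 1-D multiclass labels, cv=10 uses stratified folds
scores = cross_validate(clf, X, y_int, cv=10, scoring=scoring)
print(scores['test_Accuracy'].mean(), scores['test_F1'].mean())
```

In other words, the first version of the code in the question was already the right way to call cross_validate. If the labels in the file are strings, a LabelEncoder is at most what is needed, and the OneHotEncoder step can be dropped. One-hot encoding is meant for categorical input features, not for the target.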