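python - scikit-learn confusion matrix always looks almost the same
Problem description
I'm quite new to both machine learning and Python, but I've managed to classify my data with several classifiers and print a confusion matrix for each one, using the following code: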
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier


def classify_data(df, feature_cols, file):
    nbr_folds = 5
    attributes = df.loc[:, feature_cols]  # Also known as X
    class_label = df['task']  # Class label, also known as y
    file.write("\nFeatures used: ")
    for feature in feature_cols:
        file.write(feature + ",")
    print("Features used", feature_cols)

    # Multi-layer perceptron, evaluated with 5-fold cross-validation
    print("MLP")
    file.write("MLP")
    mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
    class_label_predicted = cross_val_predict(mlp, attributes, class_label, cv=nbr_folds)
    conf_mat = confusion_matrix(class_label, class_label_predicted)
    print(conf_mat)
    accuracy = accuracy_score(class_label, class_label_predicted)
    print("\nRows classified: " + str(len(class_label_predicted)))
    print("\nAccuracy: {0:.3f}%\n".format(accuracy * 100))
    file.write("\nClassifier settings:" + str(mlp) + "\n")
    file.write("\nRows classified: " + str(len(class_label_predicted)))
    file.write("\nAccuracy: {0:.3f}%\n".format(accuracy * 100))
    # Write the confusion matrix, one tab-separated line per row
    file.writelines('\t'.join(str(j) for j in i) + '\n' for i in conf_mat)

    # Random forest, same cross-validation setup
    print("RandomForest")
    file.write("\nRandomForest")
    # sv = svm.SVC(kernel="linear")
    clf = RandomForestClassifier(max_depth=2, random_state=0)
    class_label_predicted = cross_val_predict(clf, attributes, class_label, cv=nbr_folds)
    conf_mat = confusion_matrix(class_label, class_label_predicted)
    print(conf_mat)
    accuracy = accuracy_score(class_label, class_label_predicted)
    print("Rows classified: " + str(len(class_label_predicted)))
    print("Accuracy: {0:.3f}%\n".format(accuracy * 100))
    file.write("\nClassifier settings:" + str(clf) + "\n")
    file.write("\nRows classified: " + str(len(class_label_predicted)))
    file.write("\nAccuracy: {0:.3f}%\n".format(accuracy * 100))
    file.writelines('\t'.join(str(j) for j in i) + '\n' for i in conf_mat)
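The MLP and random-forest blocks above differ only in the estimator. Below is a minimal sketch of the same evaluation written as a loop, so that adding another classifier (for example the commented-out SVC) is just one more dictionary entry. `evaluate_classifiers` is a hypothetical helper name, file output is left out for brevity, and the toy data comes from `make_classification`, not from the question's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier


def evaluate_classifiers(X, y, classifiers, cv=5):
    """Run cross_val_predict for each named classifier and collect the results."""
    results = {}
    for name, clf in classifiers.items():
        y_pred = cross_val_predict(clf, X, y, cv=cv)
        results[name] = {
            "accuracy": accuracy_score(y, y_pred),
            "confusion_matrix": confusion_matrix(y, y_pred),
            "rows": len(y_pred),
        }
    return results


# Same settings as in the question
classifiers = {
    "MLP": MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1),
    "RandomForest": RandomForestClassifier(max_depth=2, random_state=0),
}

# Synthetic stand-in data (hypothetical, 3 classes)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)
results = evaluate_classifiers(X, y, classifiers)
for name, res in results.items():
    print(name, "accuracy: {0:.3f}%".format(res["accuracy"] * 100))
    print(res["confusion_matrix"])
```

With the question's data this would be called as `evaluate_classifiers(attributes, class_label, classifiers, cv=nbr_folds)`.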
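However, I'm starting to wonder whether I'm doing something wrong, because the confusion matrix almost always looks the same, with nearly everything predicted as my fifth class (the fifth column). When I run the exact same dataset with the same attributes in the Weka application, I get different results. Here is an example: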
scikit-learn:
MLP
Rows classified: 6881
Accuracy: 25.970%
0 0 0 0 412 12 0 0 25 1 0 0 0
0 0 0 0 540 50 0 0 8 0 0 0 0
0 0 0 0 111 3 0 0 6 2 0 0 0
0 0 0 0 139 19 0 0 4 2 0 0 0
0 0 0 0 1630 54 0 0 106 18 0 0 0
0 0 0 0 554 63 0 0 22 0 0 0 0
0 0 0 0 246 8 0 0 33 10 0 0 0
0 0 0 0 324 39 0 0 8 0 0 0 0
0 0 0 0 605 60 0 0 90 5 0 0 0
0 0 0 0 519 31 0 0 72 4 0 0 0
0 0 0 0 455 19 0 0 10 1 0 0 0
0 0 0 0 260 11 0 0 21 1 0 0 0
0 0 0 0 236 8 0 0 21 3 0 0 0
RandomForest:
Rows classified: 6881
Accuracy: 26.174%
0 0 0 0 440 0 0 0 10 0 0 0 0
0 0 0 0 597 0 0 0 0 1 0 0 0
0 0 0 0 119 0 0 0 3 0 0 0 0
0 0 0 0 164 0 0 0 0 0 0 0 0
0 0 0 0 1774 0 0 0 34 0 0 0 0
0 0 0 0 629 0 0 0 10 0 0 0 0
0 0 0 0 268 0 0 0 29 0 0 0 0
0 0 0 0 371 0 0 0 0 0 0 0 0
0 0 0 0 733 0 0 0 27 0 0 0 0
0 0 0 0 605 0 0 0 21 0 0 0 0
0 0 0 0 484 0 0 0 1 0 0 0 0
0 0 0 0 286 0 0 0 7 0 0 0 0
0 0 0 0 263 0 0 0 5 0 0 0 0
Weka
MLP
a b c d e f g h i j k l m <-- classified as
5 504 50 1 0 0 10 28 0 0 0 0 0 | a = t1
2 1511 56 1 4 1 83 135 0 2 12 0 1 | b = t12
4 467 88 0 1 3 30 45 0 0 0 1 0 | c = t2
1 227 15 2 2 0 36 13 0 1 0 0 0 | d = t3
4 369 18 2 1 0 25 31 0 0 0 0 0 | e = t0
3 306 43 0 1 2 10 6 0 0 0 0 0 | f = t4
5 463 36 2 4 0 178 69 0 0 2 0 1 | g = t5
3 371 23 1 0 0 49 176 0 0 2 1 0 | h = t6
4 398 14 1 1 0 28 33 0 0 5 1 0 | i = t7
1 252 13 0 0 0 16 8 0 1 2 0 0 | j = t8
1 213 9 0 0 0 20 24 0 1 0 0 0 | k = t9
1 96 3 0 0 0 4 16 0 0 2 0 0 | l = t10
1 133 7 0 0 0 7 15 0 0 1 0 0 | m = t11
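I'd also like to know whether it's possible to print the confusion matrix with class labels, the way Weka does. Here it looks as if column b roughly corresponds to the fifth column in scikit-learn, but it's hard to tell which class each column represents.
Solution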
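A minimal sketch for the labelling question, assuming `y_true` and `y_pred` stand for the `class_label` and `class_label_predicted` objects from the code above. When `labels` is not passed, `confusion_matrix` orders rows and columns by the sorted unique labels. For string labels such as `t0` to `t12` that sort is lexicographic (t0, t1, t10, t11, t12, t2, ...), which would put `t12` in the fifth column, matching Weka's `b = t12`. Wrapping the matrix in a pandas `DataFrame` attaches the names to both axes. The helper name `labeled_confusion_matrix` and the toy labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


def labeled_confusion_matrix(y_true, y_pred, labels=None):
    """Return the confusion matrix as a DataFrame with class names on both axes.

    Rows are the true classes and columns the predicted classes, the same
    orientation as scikit-learn's confusion_matrix.
    """
    if labels is None:
        # Same ordering confusion_matrix uses by default: sorted unique labels
        labels = np.unique(np.concatenate([np.asarray(y_true), np.asarray(y_pred)]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(
        cm,
        index=pd.Index(labels, name="true"),
        columns=pd.Index(labels, name="predicted"),
    )


# Hypothetical toy labels, just to show the lexicographic ordering
y_true = ["t1", "t12", "t2", "t12", "t10"]
y_pred = ["t12", "t12", "t12", "t12", "t1"]
print(labeled_confusion_matrix(y_true, y_pred))
```

To get Weka's row and column order instead, pass it explicitly as `labels`, e.g. `["t1", "t12", "t2", "t3", "t0", "t4", "t5", "t6", "t7", "t8", "t9", "t10", "t11"]`.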
It seems like your data set is heavily imbalanced: the 5th class is extremely dominant, and your models simply learn to predict this label most of the time.
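One way to check this is to compare against a classifier that always predicts the most frequent class. From the question's own matrices, the fifth row holds 1808 of the 6881 rows, so that baseline is about 26.3%, essentially the accuracy both models report. A minimal sketch, assuming `X` and `y` are the question's `attributes` and `class_label`: scikit-learn's `DummyClassifier(strategy="most_frequent")` gives this baseline under the same cross-validation. The helper name and the toy data are made up.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict


def majority_baseline_accuracy(X, y, cv=5):
    """Cross-validated accuracy of always predicting the most frequent class."""
    dummy = DummyClassifier(strategy="most_frequent")
    y_pred = cross_val_predict(dummy, X, y, cv=cv)
    return accuracy_score(y, y_pred)


# Made-up imbalanced data: 60 rows of "big", 20 each of "a" and "b"
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array(["big"] * 60 + ["a"] * 20 + ["b"] * 20)

print("class shares:", {c: float(np.mean(y == c)) for c in np.unique(y)})
print("majority baseline: {0:.3f}%".format(majority_baseline_accuracy(X, y) * 100))
```

On the question's data, `class_label.value_counts(normalize=True)` shows the class shares directly; a model whose accuracy barely exceeds the majority baseline has learned little beyond the class distribution.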
How to deal with this? Read for example this.
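One common option, not necessarily what the linked material recommends, is to weight classes inversely to their frequency and to score with a metric that does not reward always predicting the dominant class. A minimal sketch, assuming `X` and `y` are the question's `attributes` and `class_label`: it trains `RandomForestClassifier(class_weight="balanced")` and reports `balanced_accuracy_score`, the mean of per-class recall. Shuffled stratified folds, default hyperparameters (not the question's `max_depth=2`), the helper name and the synthetic data are all assumptions of this sketch. `MLPClassifier` has no `class_weight` parameter, so for the MLP resampling the training data is the usual alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def evaluate_balanced_forest(X, y, n_splits=5, random_state=0):
    """Cross-validate a class-weighted random forest on imbalanced labels.

    class_weight="balanced" weights each class inversely to its frequency,
    so errors on rare classes cost more during training.
    """
    clf = RandomForestClassifier(class_weight="balanced", random_state=random_state)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    # Balanced accuracy = mean per-class recall, so always predicting the
    # majority class scores 1 / n_classes instead of the majority share
    return balanced_accuracy_score(y, y_pred), confusion_matrix(y, y_pred)


rng = np.random.default_rng(0)
y = np.array([0] * 150 + [1] * 30 + [2] * 20)  # made-up, imbalanced labels
X = rng.normal(size=(200, 4)) + y[:, None]  # class-dependent shift so it is learnable
score, cm = evaluate_balanced_forest(X, y)
print("balanced accuracy: {0:.3f}".format(score))
print(cm)
```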