Confusion matrix only contains classes 0 and 1

Problem description

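I built the following LSTM network. It runs fine, but its accuracy is only about 60%. I think the cause is that it only ever predicts labels 0 and 1 and never 2 or 3: the confusion matrix columns for classes 2 and 3 are all zeros.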

import keras
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Input, Dense, Dropout, Embedding, LSTM, Flatten
from keras.models import Model, Sequential
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score,
                             confusion_matrix)

plt.style.use('ggplot')
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

data = pd.read_csv("dataset/train_set.csv", sep="\t")


data['num_words'] = data.Text.apply(lambda x : len(x.split()))


num_class = len(np.unique(data.Label.values)) # 4
y = data['Label'].values


MAX_LEN = 300
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data.Text.values)


post_seq = tokenizer.texts_to_sequences(data.Text.values)
post_seq_padded = pad_sequences(post_seq, maxlen=MAX_LEN)


X_train, X_test, y_train, y_test = train_test_split(post_seq_padded, y, test_size=0.25)


vocab_size = len(tokenizer.word_index) +1 


inputs = Input(shape=(MAX_LEN, ))
embedding_layer = Embedding(vocab_size,
                            128,
                            input_length=MAX_LEN)(inputs)

x = LSTM(64)(embedding_layer)
x = Dense(32, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])

model.summary()

filepath="weights.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.25, 
          shuffle=True, epochs=10, callbacks=[checkpointer])

df = pd.DataFrame({'epochs':history.epoch, 'accuracy': history.history['acc'], 'validation_accuracy': history.history['val_acc']})
# fit_reg is not a pointplot parameter (it belongs to regplot/lmplot), so it is dropped here
g = sns.pointplot(x="epochs", y="accuracy", data=df)
g = sns.pointplot(x="epochs", y="validation_accuracy", data=df, color='green')

model.load_weights('weights.hdf5')
predicted = model.predict(X_test)

predicted = np.argmax(predicted, axis=1)

# print the score itself, not the accuracy_score function object
print(accuracy_score(y_test, predicted))

# predict class probabilities for the test set (shape: n_samples x num_class)
yhat_probs = model.predict(X_test, verbose=0)
# reduce to a 1d array of predicted class indices
yhat_classes = np.argmax(yhat_probs, axis=1)

# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(y_test, yhat_classes)
print('Accuracy: %f' % accuracy)
# precision tp / (tp + fp)
precision = precision_score(y_test, yhat_classes, average='micro')
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(y_test, yhat_classes, average='micro')
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(y_test, yhat_classes, average='micro')
print('F1 score: %f' % f1)
matrix = confusion_matrix(y_test, yhat_classes) 
print(matrix)

Confusion matrix:

[[324 146   0   0]
 [109 221   0   0]
 [ 55  34   0   0]
 [ 50  16   0   0]]
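
Reading the matrix row by row makes the problem explicit. The following is a minimal sketch, assuming the scikit-learn convention that rows are true classes and columns are predicted classes. It computes per-class recall directly from the matrix above and compares the micro average (which equals accuracy for single-label data) with the macro average. The variable names are illustrative and not part of the original code.

```python
import numpy as np

# Confusion matrix from the question: rows = true classes, columns = predictions.
matrix = np.array([[324, 146, 0, 0],
                   [109, 221, 0, 0],
                   [ 55,  34, 0, 0],
                   [ 50,  16, 0, 0]])

true_counts = matrix.sum(axis=1)   # number of samples per true class
correct = np.diag(matrix)          # correctly classified samples per class

per_class_recall = correct / true_counts      # recall of each class
micro_recall = correct.sum() / matrix.sum()   # equals overall accuracy
macro_recall = per_class_recall.mean()        # unweighted mean over classes

print("per-class recall:", per_class_recall.round(3))
print("micro recall: %.3f, macro recall: %.3f" % (micro_recall, macro_recall))
```

Classes 2 and 3 have a recall of exactly zero, which a micro-averaged score hides because it is dominated by the two large classes. A per-class report or a macro average may be a more informative metric here.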

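The average is set to "micro", and the output layer has four nodes for the four classes. The accuracy, precision, and recall computed on train_set alone look like this (class 2 is sometimes predicted, but class 3 never is):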

Accuracy: 0.888539
Precision: 0.888539
Recall: 0.888539

Does anyone know why this happens?

Tags: python, classification, confusion-matrix

Solution


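The model has probably become stuck in a suboptimal solution. In your problem, classes 0 and 1 account for roughly 85% of all instances, so the data is heavily imbalanced. The model predicts only classes 0 and 1 because it has not fully converged, which is a classic failure mode for this kind of model. Informally, you can think of the model as being lazy. I suggest you: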
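
As a rough check of the imbalance, the class shares can be read off the confusion matrix in the question, since each row sum is the number of true samples of that class in the test split. This is only a sketch; the counts below are those row sums, and the variable names are illustrative.

```python
import numpy as np

# Row sums of the question's confusion matrix: true samples per class (test split).
class_counts = np.array([470, 330, 89, 66])

class_share = class_counts / class_counts.sum()
majority_share = class_share[:2].sum()   # combined share of classes 0 and 1

print("share per class:", class_share.round(3))
print("classes 0 and 1 together: %.3f" % majority_share)
```

On this split, classes 0 and 1 make up about 84% of the samples, in line with the estimate above. It also means that a model which never predicts classes 2 or 3 cannot exceed that accuracy.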

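  • Train for longer.
  • Check whether your model can overfit the training data. To do that, I would train for longer and measure the training error. If there is no major problem with your model or your data, the model will eventually predict classes 2 and 3, at least on the training set. At that point you can rule out problems in the data or the model.
  • Use batch normalization. In practice I have seen it help get out of this failure mode.
  • Always use a little dropout; it helps regularize the model.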
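
The suggestions above could be sketched as follows. This is one possible variant of the question's network, not a definitive fix. The layer sizes follow the question, but the placement of `BatchNormalization` and `Dropout`, the `dropout_rate` value, and the helper names `build_model` and `predicted_class_counts` are assumptions made for illustration.

```python
import numpy as np
from keras.layers import (BatchNormalization, Dense, Dropout, Embedding,
                          Input, LSTM)
from keras.models import Model


def build_model(vocab_size, max_len, num_class, dropout_rate=0.2):
    """Question's LSTM classifier plus BatchNormalization and Dropout.

    dropout_rate and the position of the two extra layers are
    illustrative assumptions, not values from the original answer.
    """
    inputs = Input(shape=(max_len,))
    x = Embedding(vocab_size, 128)(inputs)
    x = LSTM(64)(x)
    x = Dense(32, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    outputs = Dense(num_class, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['acc'])
    return model


def predicted_class_counts(model, X, num_class):
    """Count how often each class is predicted on X (e.g. the training set)."""
    predicted = np.argmax(model.predict(X, verbose=0), axis=1)
    return np.bincount(predicted, minlength=num_class)
```

For the overfitting check, one would train this model for more epochs than the original 10 and then call `predicted_class_counts(model, X_train, num_class)`. If the counts for classes 2 and 3 stay at zero even on the training data, the data or the model probably deserves a closer look.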
