CNN image classification: training accuracy reaches 95% while validation accuracy stays around 45%

Problem description

I have been learning some deep learning with TensorFlow and Keras, so I wanted to do some practical experiments.

I want to train a model on the CASIA V5 fingerprint dataset (20,000 fingerprint images in total), but during training, the training accuracy reaches 97% after 120 epochs while the validation accuracy stays around 45%. The results look like this:

Epoch 109/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6971 - accuracy: 0.9418 - val_loss: 4.1766 - val_accuracy: 0.4171
Epoch 110/200
150/150 [==============================] - 23s 155ms/step - loss: 0.6719 - accuracy: 0.9492 - val_loss: 4.1447 - val_accuracy: 0.4379
Epoch 111/200
150/150 [==============================] - 24s 162ms/step - loss: 0.7003 - accuracy: 0.9388 - val_loss: 4.1439 - val_accuracy: 0.4396
Epoch 112/200
150/150 [==============================] - 24s 157ms/step - loss: 0.7010 - accuracy: 0.9377 - val_loss: 4.1577 - val_accuracy: 0.4425
Epoch 113/200
150/150 [==============================] - 24s 160ms/step - loss: 0.6699 - accuracy: 0.9494 - val_loss: 4.1242 - val_accuracy: 0.4371
Epoch 114/200
150/150 [==============================] - 25s 167ms/step - loss: 0.6814 - accuracy: 0.9456 - val_loss: 4.1966 - val_accuracy: 0.4288
Epoch 115/200
150/150 [==============================] - 24s 160ms/step - loss: 0.6440 - accuracy: 0.9590 - val_loss: 4.1586 - val_accuracy: 0.4354
Epoch 116/200
150/150 [==============================] - 23s 157ms/step - loss: 0.7877 - accuracy: 0.9212 - val_loss: 4.0408 - val_accuracy: 0.4246
Epoch 117/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6728 - accuracy: 0.9504 - val_loss: 3.9317 - val_accuracy: 0.4567
Epoch 118/200
150/150 [==============================] - 25s 167ms/step - loss: 0.5710 - accuracy: 0.9874 - val_loss: 3.9505 - val_accuracy: 0.4483
Epoch 119/200
150/150 [==============================] - 24s 158ms/step - loss: 0.5616 - accuracy: 0.9873 - val_loss: 4.0607 - val_accuracy: 0.4542
Epoch 120/200
150/150 [==============================] - 23s 156ms/step - loss: 0.5948 - accuracy: 0.9716 - val_loss: 4.1531 - val_accuracy: 0.4238
Epoch 121/200
150/150 [==============================] - 23s 155ms/step - loss: 0.7453 - accuracy: 0.9150 - val_loss: 4.0798 - val_accuracy: 0.4154
Epoch 122/200
150/150 [==============================] - 26s 172ms/step - loss: 0.7232 - accuracy: 0.9256 - val_loss: 3.9307 - val_accuracy: 0.4425
Epoch 123/200
150/150 [==============================] - 24s 158ms/step - loss: 0.6277 - accuracy: 0.9632 - val_loss: 3.9988 - val_accuracy: 0.4408
Epoch 124/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6367 - accuracy: 0.9581 - val_loss: 4.0837 - val_accuracy: 0.4358

Searching the internet, I found that overfitting could explain this, so I tried simplifying the layers, adding dropout and regularizers, and using batch normalization, but these methods contributed very little to accuracy. I also normalized the data: it has been shuffled and converted to float values between 0.0 and 1.0. The original resolution of the images is 328 × 356, and they are resized to 400 × 400 before being fed into the autoencoder.
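For reference, a minimal sketch of that preprocessing, under my own assumptions (the dummy arrays, shapes, and class count are placeholders, not from the original code):

import numpy as np
import tensorflow as tf

def preprocess(images):
    # images: uint8 array of shape (n, 328, 356, 1), grayscale fingerprints
    x = tf.image.resize(images, (400, 400))   # 328x356 -> 400x400
    return (x / 255.0).numpy()                # floats in [0.0, 1.0]

# dummy stand-in arrays; the real shapes and class count come from CASIA V5
x_all = np.random.randint(0, 256, size=(100, 328, 356, 1), dtype=np.uint8)
y_all = np.random.randint(0, 10, size=(100,))

x_all = preprocess(x_all)
y_all = tf.keras.utils.to_categorical(y_all)  # one-hot for categorical_crossentropy

# shuffle images and labels together before splitting off validation data
order = np.random.default_rng(42).permutation(len(x_all))
x_all, y_all = x_all[order], y_all[order]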

Here is part of my code:

from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPooling2D
from tensorflow.keras.models import Model

def encoder(input_img):
    # encoder: three conv blocks, downsampling 400x400 inputs to 100x100 feature maps
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    conv2 = BatchNormalization()(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = BatchNormalization()(conv3)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
    conv3 = BatchNormalization()(conv3)
    return conv3

def fc(enco):
    # classification head; note that layers such as BatchNormalization()
    # and Dropout() only take effect when called on a tensor, so each one
    # is wired into the graph below
    pool = keras.layers.MaxPooling2D(pool_size=(2, 2))(enco)
    pool = keras.layers.BatchNormalization()(pool)
    den1 = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(1e-3))(pool)
    den1 = keras.layers.BatchNormalization()(den1)
    pool1 = keras.layers.MaxPooling2D(pool_size=(2, 2))(den1)
    pool1 = keras.layers.Dropout(0.4)(pool1)
    den2 = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(1e-3))(pool1)
    den2 = keras.layers.BatchNormalization()(den2)
    pool2 = keras.layers.MaxPooling2D(pool_size=(2, 2))(den2)
    pool2 = keras.layers.Dropout(0.4)(pool2)
    den3 = keras.layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(1e-4))(pool2)
    den3 = keras.layers.BatchNormalization()(den3)
    pool3 = keras.layers.AveragePooling2D(pool_size=(2, 2))(den3)
    pool3 = keras.layers.Dropout(0.4)(pool3)
    flat = keras.layers.Flatten()(pool3)
    flat = keras.layers.Dropout(0.4)(flat)
    flat = keras.layers.BatchNormalization()(flat)
    den4 = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(1e-3))(flat)
    den4 = keras.layers.Dropout(0.4)(den4)
    den4 = keras.layers.BatchNormalization()(den4)

    # num is the number of fingerprint classes, defined elsewhere
    out = keras.layers.Dense(num, activation='softmax', kernel_regularizer=regularizers.l2(1e-4))(den4)
    return out


# input_img is the Keras input for the 400 x 400 images, defined earlier in the script
encode = encoder(input_img)
full_model = Model(input_img, fc(encode))


# copy the pretrained encoder weights from the autoencoder
for l1, l2 in zip(full_model.layers[0:15], autoencoder_model.layers[0:15]):
    l1.set_weights(l2.get_weights())


# freeze the copied encoder layers so only the classification head trains
for layer in full_model.layers[0:15]:
    layer.trainable = False
full_model.summary()


full_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.Nadam(),
                   metrics=['accuracy'])

The batch size is 64.
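For completeness, a minimal sketch of a training call with these settings; the x_train, y_train, x_val, and y_val names are my assumptions, since the original post does not show its fit call:

# hypothetical fit call; variable names are assumptions, labels are one-hot
history = full_model.fit(x_train, y_train,
                         batch_size=64,
                         epochs=200,
                         validation_data=(x_val, y_val),
                         shuffle=True)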

The autoencoder_model has already been trained and performs well, with a loss below 3e-4.
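The post does not show how autoencoder_model itself was built; a common pattern, and purely an assumption on my part, is to mirror the encoder with an upsampling decoder and train it on a pixel-reconstruction loss:

from tensorflow.keras.layers import Input, UpSampling2D

def decoder(feat):
    # assumed mirror of the encoder: upsample 100x100 features back to 400x400
    up1 = Conv2D(128, (3, 3), activation='relu', padding='same')(feat)
    up1 = UpSampling2D((2, 2))(up1)
    up2 = Conv2D(64, (3, 3), activation='relu', padding='same')(up1)
    up2 = UpSampling2D((2, 2))(up2)
    # single-channel sigmoid output matches inputs scaled to [0.0, 1.0]
    return Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up2)

input_img = Input(shape=(400, 400, 1))
autoencoder_model = Model(input_img, decoder(encoder(input_img)))
autoencoder_model.compile(optimizer='adam', loss='mse')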

So I would like to know what is causing the low validation accuracy, and what I can do to improve it.

Tags: python, tensorflow, keras, conv-neural-network

Solution


The most obvious conclusion would be overfitting, but given that you have tried the standard remedies for it, such as model simplification, dropout, and regularization, without any improvement, this may be a different problem. For validation accuracy to be high, the probability distribution of the validation data must mirror that of the data the model was trained on. So the question is: how was the validation data selected? As a test, one thing I would try is to make the validation data an identical subset of the training data. In that case the validation accuracy should be near 100%. If it does not get high, that likely points to how you are processing the validation data.

I also noticed that you chose not to train some layers in the model. Try making all layers trainable and see if that helps. I have seen cases where freezing weights in a model led to lower validation accuracy. I am not sure why, but I believe that if the non-trainable layers include dropout, then with the weights frozen dropout has no effect, which leads to overfitting. I am not a fan of early stopping; it is a crutch that avoids actually solving the overfitting problem.
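To make those two experiments concrete, here is a minimal sketch, assuming x_train and y_train are the training arrays from the question (the variable names and subset size are mine, not the answerer's):

import numpy as np

# sanity check: validate on an identical subset of the training data;
# validation accuracy should approach 100% if the validation pipeline is sound
idx = np.random.default_rng(0).choice(len(x_train), size=2000, replace=False)
full_model.fit(x_train, y_train,
               batch_size=64, epochs=10,
               validation_data=(x_train[idx], y_train[idx]))

# second experiment: unfreeze every layer and retrain
for layer in full_model.layers:
    layer.trainable = True
full_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.Nadam(),
                   metrics=['accuracy'])   # recompile so the change takes effect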

