CNN autoencoder output image is white

Problem Description

I am having trouble training an autoencoding CNN. My goal is to cluster document images (receipts, letters, etc.) in an unsupervised way (by the way, do you know of algorithms other than autoencoders for this?).

So I tried to build an autoencoder, but I keep getting strange decoded outputs and I cannot figure out what the problem is. I started with a very simple model without much compression:

    Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_62 (Conv2D)           (None, 100, 76, 16)       448       
_________________________________________________________________
activation_62 (Activation)   (None, 100, 76, 16)       0         
_________________________________________________________________
conv2d_63 (Conv2D)           (None, 50, 38, 32)        4640      
_________________________________________________________________
activation_63 (Activation)   (None, 50, 38, 32)        0         
_________________________________________________________________
conv2d_64 (Conv2D)           (None, 50, 38, 32)        9248      
_________________________________________________________________
activation_64 (Activation)   (None, 50, 38, 32)        0         
_________________________________________________________________
up_sampling2d_26 (UpSampling (None, 100, 76, 32)       0         
_________________________________________________________________
conv2d_65 (Conv2D)           (None, 100, 76, 16)       4624      
_________________________________________________________________
activation_65 (Activation)   (None, 100, 76, 16)       0         
_________________________________________________________________
up_sampling2d_27 (UpSampling (None, 200, 152, 16)      0         
_________________________________________________________________
conv2d_66 (Conv2D)           (None, 200, 152, 3)       435       
_________________________________________________________________
activation_66 (Activation)   (None, 200, 152, 3)       0         
=================================================================
Total params: 19,395
Trainable params: 19,395
Non-trainable params: 0

I trained on a small number of inputs (~200) so that training is fast and I can debug more quickly.

The model seems to converge after 20 epochs with a batch size of 32:

Epoch 1/20
4/4 [==============================] - 5s 1s/step - loss: 0.4359
Epoch 2/20
4/4 [==============================] - 5s 1s/step - loss: 0.4290
Epoch 3/20
4/4 [==============================] - 4s 904ms/step - loss: 0.4192
Epoch 4/20
4/4 [==============================] - 5s 1s/step - loss: 0.4045
Epoch 5/20
4/4 [==============================] - 3s 783ms/step - loss: 0.3886
Epoch 6/20
4/4 [==============================] - 3s 797ms/step - loss: 0.3706
Epoch 7/20
4/4 [==============================] - 5s 1s/step - loss: 0.3393
Epoch 8/20
4/4 [==============================] - 3s 777ms/step - loss: 0.3165
Epoch 9/20
4/4 [==============================] - 3s 850ms/step - loss: 0.2786
Epoch 10/20
4/4 [==============================] - 3s 780ms/step - loss: 0.2436
Epoch 11/20
4/4 [==============================] - 3s 817ms/step - loss: 0.2036
Epoch 12/20
4/4 [==============================] - 3s 771ms/step - loss: 0.1745
Epoch 13/20
4/4 [==============================] - 5s 1s/step - loss: 0.1347
Epoch 14/20
4/4 [==============================] - 3s 820ms/step - loss: 0.1150
Epoch 15/20
4/4 [==============================] - 5s 1s/step - loss: 0.1017
Epoch 16/20
4/4 [==============================] - 3s 792ms/step - loss: 0.0886
Epoch 17/20
4/4 [==============================] - 3s 789ms/step - loss: 0.0868
Epoch 18/20
4/4 [==============================] - 3s 842ms/step - loss: 0.0844
Epoch 19/20
4/4 [==============================] - 3s 762ms/step - loss: 0.0797
Epoch 20/20
4/4 [==============================] - 3s 779ms/step - loss: 0.0768

But the output images look like this:

Output of the autoencoder (example)

For the loss I used mean absolute error with the SGD optimizer (the others did not converge as well).

I tried increasing the number of epochs, but the loss stalls around 0.07 and does not go down.

What am I doing wrong? Any ideas for improving it? Thanks in advance.
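A quick sanity check (an addition of mine, not from the original thread): scanned documents are mostly white background, so under mean absolute error a decoder that collapses to a constant white output is already near a low loss, which is consistent with the ~0.07 plateau described above. A self-contained numpy sketch, assuming an illustrative 10% "ink" fraction:

```python
import numpy as np

# Toy "document": 90% white background, 10% dark ink pixels (assumed ratio).
rng = np.random.default_rng(0)
img = np.ones(10_000)
img[:1_000] = rng.uniform(0.0, 0.2, size=1_000)

def mae(c):
    """MAE of a degenerate decoder that outputs the constant c for every pixel."""
    return float(np.abs(img - c).mean())

print(round(mae(1.0), 3))  # constant white: already a small loss (~0.09 here)
print(round(mae(0.5), 3))  # constant gray: much worse
```

With this pixel distribution the constant minimizing MAE is the pixel median, i.e. white, so "all-white output with a low, stagnating loss" is exactly the collapse one would expect.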

Edit: here is the code

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Values inferred from the post: 200x152 images, batch size 32, ~200 images.
image_dims = (200, 152)        # (height, width) fed to the generator
batch_size = 32
n_images = 200

# Mild augmentation only; zca_whitening is off and shifts/zoom are tiny.
datagen = ImageDataGenerator(
    rescale=1. / 255,
    zca_whitening=False,
    rotation_range=0.2,
    width_shift_range=0.005,
    height_shift_range=0.005,
    zoom_range=0.005)

# class_mode='input' makes the generator yield (image, image) pairs,
# which is what an autoencoder needs.
train_generator = datagen.flow_from_directory(
    'fp_img',
    class_mode='input',
    target_size=image_dims,
    batch_size=batch_size,
    shuffle=True)

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, UpSampling2D

# RGB input shape matching the model summary above (height, width, channels).
image_rgb_dims = (200, 152, 3)
input_shape = image_rgb_dims

# Define the model
model = Sequential()

model.add(Conv2D(16, (3, 3), strides=2, padding='same', input_shape=image_rgb_dims))
model.add(Activation('relu'))

model.add(Conv2D(32, (3, 3), strides=2, padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(32,(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

model.add(Conv2D(16,(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

model.add(Conv2D(3,(3, 3), padding='same'))
model.add(Activation('sigmoid'))

model.summary()

# Compile the model
model.compile(optimizer='adagrad', loss='mean_absolute_error')

# Train the model
model.fit(
        train_generator,
        steps_per_epoch= n_images // batch_size,
        epochs=20)
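One thing worth noting (my addition, not part of the original post): with only two stride-2 convolutions, the 50×38×32 "latent" tensor holds 60,800 values versus 91,200 input values, so the model barely compresses anything. A hedged sketch of one way to add the Dense bottleneck the poster mentions having commented out; the code size of 128 is an illustrative assumption:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, Dense, Flatten, Reshape,
                                     UpSampling2D)

# Sketch only (not the poster's code). Input size taken from the model
# summary above; the 128-dim bottleneck is an illustrative assumption.
h, w = 200, 152
model = Sequential([
    Conv2D(16, (3, 3), strides=2, padding='same', activation='relu',
           input_shape=(h, w, 3)),
    Conv2D(32, (3, 3), strides=2, padding='same', activation='relu'),
    Flatten(),
    Dense(128, activation='relu', name='code'),   # compact latent code
    Dense((h // 4) * (w // 4) * 32, activation='relu'),
    Reshape((h // 4, w // 4, 32)),
    UpSampling2D((2, 2)),
    Conv2D(16, (3, 3), padding='same', activation='relu'),
    UpSampling2D((2, 2)),
    Conv2D(3, (3, 3), padding='same', activation='sigmoid'),
])
model.summary()
```

The activations of the `code` layer (extracted with an intermediate `Model`) would then be the fixed-length feature vector to feed into a clustering algorithm such as k-means.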

Tags: tensorflow, machine-learning, keras, data-science, autoencoder

Solution


  • I know I should add a Dense layer in the middle, but I commented that part out: if it doesn't work with 2 conv layers, it has no chance of working with more convs plus a Dense.
  • I tried training on a single image and realized it overfits: after 1 epoch the output is fine, but with every new epoch it gets whiter and whiter while the loss keeps decreasing. In the end it is pure white.
  • I know, I am currently trying with 1k images. I picked 200 so I could debug quickly and not wait 10 minutes per update.
  • Yes, almost all my images contain text. OCR is actually being looked at by a colleague; my job is to try to cluster documents by their "layout".
  • OK, I will try resizing them smaller, but the problem is that they become very ugly and I sometimes lose information (the logo in the top-left corner could be a great clustering feature but becomes illegible).
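On the resizing worry in the last point (my addition, hedged): how you downscale matters. Block averaging (area interpolation) keeps thin dark strokes visible as gray bands, while naive subsampling can drop them entirely. A numpy toy illustrating the difference:

```python
import numpy as np

# Block-average ("area") downsampling: average each factor x factor tile.
def block_downsample(img, factor):
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    return (img[:h2 * factor, :w2 * factor]
            .reshape(h2, factor, w2, factor)
            .mean(axis=(1, 3)))

img = np.ones((8, 8))
img[3, :] = 0.0                 # one thin dark "text line"

small = block_downsample(img, 4)
print(small)                    # the dark row survives as a gray band

sub = img[::4, ::4]             # naive subsampling skips row 3 entirely
print(sub)                      # the text line is gone
```

For layout clustering this kind of averaged thumbnail may be enough: the logo in the corner stays as a dark blob even if it is no longer legible.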
