python - Keras model fails to reduce loss
Problem description
Below is an example in which a tf.keras model fails to learn from very simple data. I'm using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of the post I give the full Python code to reproduce the problem I observed.
- Data
Samples are Numpy arrays of shape (6, 16, 16, 16, 3). To make things very simple, I only consider arrays full of 1s and 0s. Arrays of 1s are given the label 1, arrays of 0s the label 0. I can generate some samples (below, n_samples = 240) with this code:
def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
To feed this data to the tf.keras model, I create a tf.data.Dataset instance with the code below. This essentially creates shuffled batches of BATCH_SIZE = 12 samples.
def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
- Model
I propose the following model to classify my samples:
def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])
The model is optimized with Adam (default parameters) and the categorical_crossentropy loss:
clf_model = create_model()
clf_model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])
The output of clf_model.summary() is:
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape                Param #
=================================================================
input_1 (InputLayer)         [(None, 6, 16, 16, 16, 3)]  0
_________________________________________________________________
lambda (Lambda)              (None, 16, 16, 16, 3)       0
_________________________________________________________________
conv3d (Conv3D)              (None, 8, 8, 8, 64)         98368
_________________________________________________________________
re_lu (ReLU)                 (None, 8, 8, 8, 64)         0
_________________________________________________________________
global_average_pooling3d (Gl (None, 64)                  0
_________________________________________________________________
lambda_1 (Lambda)            (None, 384)                 0
_________________________________________________________________
lambda_2 (Lambda)            (None, 1, 384)              0
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 1)                385
_________________________________________________________________
re_lu_1 (ReLU)               (None, 1, 1)                0
_________________________________________________________________
lambda_3 (Lambda)            (None, 1)                   0
_________________________________________________________________
dense (Dense)                (None, 2)                   4
=================================================================
Total params: 98,757
Trainable params: 98,757
Non-trainable params: 0
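As a quick sanity check (not part of the original post), the parameter counts in the summary can be verified by hand from the layer hyperparameters:

```python
# Hand-check of the parameter counts reported by clf_model.summary().
conv3d_params = (8 * 8 * 8) * 3 * 64 + 64  # 8x8x8 kernel, 3 in-channels, 64 filters + biases
conv1d_params = 1 * 384 * 1 + 1            # kernel_size=1, 384 in-channels, 1 filter + bias
dense_params = 1 * 2 + 2                   # 1 input, 2 units + biases

print(conv3d_params)  # 98368
print(conv1d_params)  # 385
print(dense_params)   # 4
print(conv3d_params + conv1d_params + dense_params)  # 98757
```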
- Training
The model is trained for 500 epochs as follows:
train_ds = make_tfdataset(for_training=True)
history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
- The problem!
Over the 500 epochs, the model's loss stays around 0.69 and never goes below 0.69. The same happens if I set the learning rate to 1e-2 instead of 1e-3. The data is extremely simple (just 0s and 1s). Naively, I would expect the model to reach a better accuracy than just 0.6; in fact, I would expect it to quickly reach 100% accuracy. What am I doing wrong?
- Full code...
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from math import ceil
from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

BATCH_SIZE = 12

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset

def create_model(in_shape=(6, 16, 16, 16, 3)):
    input_layer = Input(shape=in_shape)
    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)
    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)
    relu_layer_1 = ReLU()(conv3d_layer)
    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)
    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)
    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)
    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)
    relu_layer_2 = ReLU()(conv1d_layer)
    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)
    out = Dense(units=2, activation='softmax')(reshape_layer_2)
    return Model(inputs=[input_layer], outputs=[out])

train_ds = make_tfdataset(for_training=True)
clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
clf_model.summary()
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])
history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)
Solution
Your code has one critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension - by definition, it holds independent samples of your data. In your first reshape, you mix the feature dimensions with the batch dimension:
Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)
This is like feeding in 72 independent samples of shape (16, 16, 16, 3). The other layers suffer similar problems.
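To see the mix-up concretely, here is a miniature of the same reshape in plain NumPy (toy shapes standing in for the real ones):

```python
import numpy as np

# Toy stand-in for the real batch: 2 labeled samples of shape (6, 4),
# mimicking (12, 6, 16, 16, 16, 3) at a smaller scale.
batch = np.stack([np.zeros((6, 4)), np.ones((6, 4))])  # shape (2, 6, 4)

# The model's first Lambda layer does the equivalent of this reshape,
# absorbing the per-sample axis of size 6 into the batch axis:
flattened = batch.reshape(-1, 4)

print(batch.shape)      # (2, 6, 4): 2 labeled samples
print(flattened.shape)  # (12, 4): downstream layers now see 12 "samples"
```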
Solution:
- Instead of reshaping at every step of the way (for which you should use Reshape), shape your existing Conv and pooling layers so that everything works out directly.
- Aside from the input and output layers, it's better to give each layer a short, simple name - no clarity is lost, since each line is well-defined by the layer name.
- GlobalAveragePooling is intended to be the final layer, as it collapses the feature dimensions - in your case, like so: (12, 16, 16, 16, 3) --> (12, 3); a Conv afterwards serves little purpose.
- Per the above, I replaced Conv1D with Conv3D.
- Unless you're using variable batch sizes, always go for batch_shape= over shape=, since you can then inspect the layer dimensions in full (very helpful).
- Your true batch_size is 6, deducible from your comment reply.
- kernel_size=1 and (especially) filters=1 is a very weak convolution - I replaced it accordingly; you can revert if you wish.
- If your intended application has only 2 classes, I advise using Dense(1, 'sigmoid') with the binary_crossentropy loss.
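On that last bullet: a 2-unit softmax is redundant for binary classification, since softmax over the logits [z, 0] yields the same probability as a single sigmoid over z. A quick numeric check (plain NumPy, not the Keras layers themselves):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.3  # an arbitrary logit
p_two_way = softmax(np.array([z, 0.0]))[0]  # P(class) from a 2-unit softmax
p_one_way = sigmoid(z)                      # P(class) from Dense(1, 'sigmoid')

print(np.isclose(p_two_way, p_one_way))  # True
```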
As a last note: you can throw out everything above except the dimensionality-shuffling advice and still get perfect train-set performance; it was the root of the problem.
def create_model(batch_size, input_shape):
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
               activation='relu', padding='same')(ipt)
    x = Conv3D(filters=8, kernel_size=4, strides=(2, 2, 2),
               activation='relu', padding='same')(x)
    x = GlobalAveragePooling3D()(x)
    out = Dense(units=2, activation='softmax')(x)
    return Model(inputs=ipt, outputs=out)
BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
Result:
Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000