Gradients return None when trying to apply gradients to 2 keras models

Problem description

I want to combine a CNN and a Transformer, and apply the gradients to both models.

I created my CNN model:

from tensorflow.keras import layers, models

cnn_model = models.Sequential([
    layers.Conv1D(filters=32, kernel_size=3,
                  strides=1, padding="causal",
                  activation="relu",
                  input_shape=[None, 1024]),
    layers.MaxPooling1D(pool_size=2, strides=1),
    layers.Conv1D(filters=64, kernel_size=3,
                  strides=1, padding="causal",
                  activation="relu"),
    layers.MaxPooling1D(pool_size=2, strides=1),
    layers.Conv1D(filters=64, kernel_size=3,
                  strides=1, padding="causal",
                  activation="relu"),
    layers.BatchNormalization(),
])
cnn_model.compile(optimizer=optimizer)

transformer = Transformer(num_layers, d_model2, num_heads, dff,
                      input_vocab_size2, target_vocab_size2, 
                      pe_input=input_vocab_size2, 
                      pe_target=target_vocab_size2,
                      rate=dropout_rate)

My optimizer and loss function are:

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_sum(loss_)/tf.reduce_sum(mask)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
optimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9)

In my training step I have:

with tf.GradientTape() as tape:
    
    cnn_prediction = cnn_model(inp, training=True)
    
    predictions, _ = transformer(cnn_prediction, tar_inp, 
                             True, 
                             enc_padding_mask, 
                             combined_mask, 
                             dec_padding_mask)
    loss = loss_function(tar_real, predictions)

gradients = tape.gradient(loss, transformer.trainable_variables)
optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))

cnn_gradients = tape.gradient(loss, cnn_model.trainable_variables)
optimizer.apply_gradients(zip(cnn_gradients, cnn_model.trainable_variables))

train_loss(loss)
train_accuracy(accuracy_function(tar_real, predictions))

But when I apply the gradients for the CNN model, the cnn_gradients I get are:

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

and the error

No gradients provided for any variable: ['conv1d_10/kernel:0', 'conv1d_10/bias:0', 'conv1d_11/kernel:0', 'conv1d_11/bias:0', 'conv1d_12/kernel:0', 'conv1d_12/bias:0', 'conv1d_13/kernel:0', 'conv1d_13/bias:0', 'conv1d_14/kernel:0', 'conv1d_14/bias:0', 'conv1d_15/kernel:0', 'conv1d_15/bias:0', 'conv1d_16/kernel:0', 'conv1d_16/bias:0', 'conv1d_17/kernel:0', 'conv1d_17/bias:0', 'conv1d_18/kernel:0', 'conv1d_18/bias:0', 'conv1d_19/kernel:0', 'conv1d_19/bias:0', 'batch_normalization_3/gamma:0', 'batch_normalization_3/beta:0'].

Any ideas on how to make this work? What am I missing?

Thanks in advance

Tags: tensorflow, keras, deep-learning

Solution


From the tf.GradientTape documentation:

By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, as resources are released only when the tape object is garbage collected.
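That release is exactly what bites the code in the question: the first tape.gradient() call (for the transformer) consumes the tape, so there is nothing left to differentiate for the CNN. A minimal standalone sketch of the behavior, using hypothetical toy variables x and y:

import tensorflow as tf

x = tf.Variable(3.0)
y = tf.Variable(2.0)

with tf.GradientTape() as tape:  # non-persistent by default
    loss = x * y

dx = tape.gradient(loss, x)  # works: dx == y == 2.0
# A second call on the same non-persistent tape fails, because the
# tape's resources were already released by the first call:
# dy = tape.gradient(loss, y)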

You should either use two tapes, set persistent=True when creating the tape object, or update all the gradients at once:

Using persistent=True

with tf.GradientTape(persistent=True) as tape:
    intermediate_result = model1(inp)
    y_pred = model2(intermediate_result)
    loss = loss_func(y_true, y_pred)
model1_gradients = tape.gradient(loss, model1.trainable_variables)
model2_gradients = tape.gradient(loss, model2.trainable_variables)

Note that setting persistent=True may have some impact on performance.
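When going the persistent route, the gradients from the snippet above still need to be applied, and it is common to drop the tape explicitly once both calls are done so its resources are released. Continuing the snippet (a sketch, assuming the optimizer from the question):

optimizer.apply_gradients(zip(model1_gradients, model1.trainable_variables))
optimizer.apply_gradients(zip(model2_gradients, model2.trainable_variables))

del tape  # drop the reference so the persistent tape's resources are freed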

Using 2 tapes

with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
    intermediate_result = model1(inp)
    y_pred = model2(intermediate_result)
    loss = loss_func(y_true, y_pred)
model1_gradients = tape1.gradient(loss, model1.trainable_variables)
model2_gradients = tape2.gradient(loss, model2.trainable_variables)

Computing the gradients in one go:

with tf.GradientTape() as tape:
    intermediate_result = model1(inp)
    y_pred = model2(intermediate_result)
    loss = loss_func(y_true, y_pred)
all_gradients = tape.gradient(loss, model1.trainable_variables + model2.trainable_variables)
# applying the gradient in one pass as well
optimizer.apply_gradients(zip(all_gradients, model1.trainable_variables + model2.trainable_variables))

I would probably go with the last option, which looks the simplest to me.
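Applied to the training step from the question, that last option could look roughly like this (a sketch reusing the names from the question: cnn_model, transformer, loss_function, optimizer and the input/mask tensors):

with tf.GradientTape() as tape:
    cnn_prediction = cnn_model(inp, training=True)
    predictions, _ = transformer(cnn_prediction, tar_inp,
                                 True,
                                 enc_padding_mask,
                                 combined_mask,
                                 dec_padding_mask)
    loss = loss_function(tar_real, predictions)

# Collect the trainable variables of both models and update them in one pass.
all_variables = cnn_model.trainable_variables + transformer.trainable_variables
all_gradients = tape.gradient(loss, all_variables)
optimizer.apply_gradients(zip(all_gradients, all_variables))

train_loss(loss)
train_accuracy(accuracy_function(tar_real, predictions))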

