Tensorflow 2 second-order derivatives after applying gradients in eager execution

Problem description

I am trying to implement Model-Agnostic Meta-Learning (MAML) in TensorFlow 2. For this algorithm, second-order derivatives have to be computed in the following form:

for multiple sets, each with an individual model:
  1. Determine the loss for a training set
  2. Determine the gradients for this loss w.r.t. the model
  3. Apply the gradients (theta' <- theta-alpha*gradients)

4. Determine the loss for an evaluation set for each of the updated models
5. Determine the gradient of the sum of the losses (4) w.r.t. the global model
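
In equation form, this is the standard MAML meta-update (alpha is the inner learning rate, beta the meta learning rate; D_i is the training set and D'_i the evaluation set of task i):

\theta \leftarrow \theta - \beta \, \nabla_\theta \sum_i \mathcal{L}_{D'_i}\!\big(\theta - \alpha \, \nabla_\theta \mathcal{L}_{D_i}(\theta)\big)

The gradient in step 5 has to differentiate through the inner update of step 3, which is where the second-order terms come from.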

I tried to implement this in TensorFlow 2 with GradientTape, but the second gradient is always an array of Nones.

My implementation looks like this:

batch_losses = []
with tf.GradientTape() as meta_update_tape:
    # inner loop (for all tasks)
    for bi in range(batch_size):

        # the training and evaluation batches
        x_i, y_i, x_i_prime, y_i_prime = dataset.batch_with_eval()

        # reset the weights to the current global weights before training for this batch
        model_copy = copy_model_weights(source=model, target=model_copy)

        # Compute loss using theta_global for D_i
        with tf.GradientTape() as inner_update_tape:
            inner_loss, _, _ = compute_loss(model_copy, x_i, y_i)

        gradients_inner_update = inner_update_tape.gradient(inner_loss, model_copy.trainable_variables)

        # update model parameters (apply theta_i_prime)
        # inner_optimizer.apply_gradients(zip(gradients_inner_update, model_copy.trainable_variables))
        conv_layers = [i for i, var in enumerate(model_copy.trainable_variables) if var.name.startswith('conv2d')]
        for layer in conv_layers:
            model_copy.trainable_variables[layer].assign(tf.subtract(model.trainable_variables[layer], tf.multiply(0.4, gradients_inner_update[layer])))

        # calculate loss with theta_i_prime on the eval set
        loss_eval, accuracy_eval, _ = compute_loss(model_copy, x_i_prime, y_i_prime)

        batch_losses.append(loss_eval)

    sum_losses = tf.reduce_sum(batch_losses) / tf.cast(batch_size, dtype=tf.float32)

# calculate gradient over all losses w.r.t global theta (trainable variables for global model)
gradients_meta_update = meta_update_tape.gradient(sum_losses, model.trainable_variables)

# gradients_meta_update is [None, None, None, ...]

What I have tried:

I have run out of ideas, and the documentation has not helped me, so I would appreciate any suggestion. Thanks!

Tags: machine-learning, tensorflow2.0, gradient-descent

Solution


Unfortunately, there is a bug in TensorFlow 2 when you try to update the weights of a Keras model by accessing its trainable_variables attribute (it also happens with the trainable_weights attribute). After an update through this approach, the model's weights seem to lose gradient tracking.
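
A minimal standalone repro (my own sketch, not part of the original answer) shows the effect: tf.Variable.assign is an in-place write, so the tape records no differentiable path from the source variables to the assigned value:

import tensorflow as tf

v = tf.Variable(2.0)
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    v.assign(w * 4.0)  # in-place update: not differentiable w.r.t. w
    loss = v * v
print(tape.gradient(loss, w))  # None: no recorded path from w to loss

This is exactly what happens in the question's inner loop: after model_copy.trainable_variables[layer].assign(...), the evaluation loss no longer depends on model.trainable_variables as far as the outer tape is concerned.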

Instead of accessing model_copy.trainable_variables to update the model's weights, you should iterate directly over the model's layers and update their variables through the layers' own attributes. For example, assuming your model only has Conv2D layers, you can use the following code to update their weights:

grad_idx = 0
for idx, layer in enumerate(model_copy.layers):
    if hasattr(layer, 'kernel'):
        # reassign the attribute to a fresh tensor rather than calling assign()
        # (assumes gradients_inner_update holds one entry per kernel, i.e. no bias terms)
        model_copy.layers[idx].kernel = tf.subtract(model.layers[idx].kernel, tf.multiply(0.4, gradients_inner_update[grad_idx]))
        grad_idx += 1

This works because each layer attribute is reassigned to a new tf.Tensor computed from the global model's variables, so the subtraction is recorded on the outer tape and gradients can flow back to model.trainable_variables. Note that kernel is an attribute of the Conv2D layer that is listed in its trainable_weights attribute. You should therefore update every attribute whose name is listed in a layer's trainable_weights (for example, in the case of Dense layers you must also update the bias attribute). This strategy works for other Keras layers as well, such as Dense and BatchNormalization.
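
As a concrete illustration of that note, here is a minimal sketch of the same inner update for a model built only from Dense layers (my assumption: every layer has use_bias=True, so gradients_inner_update alternates kernel, bias in the order of trainable_variables):

grad_idx = 0
for idx, layer in enumerate(model_copy.layers):
    if isinstance(layer, tf.keras.layers.Dense):
        # kernel and bias gradients appear back to back in the gradient list
        model_copy.layers[idx].kernel = tf.subtract(
            model.layers[idx].kernel,
            tf.multiply(0.4, gradients_inner_update[grad_idx]))
        model_copy.layers[idx].bias = tf.subtract(
            model.layers[idx].bias,
            tf.multiply(0.4, gradients_inner_update[grad_idx + 1]))
        grad_idx += 2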

