Computing the derivative of a network with batch normalization with respect to its inputs: training vs. inference time

Problem description

When the network contains a batch normalization layer, I notice different behavior when computing the derivative of the network's output with respect to its input.

More specifically, with training = True the derivative equals 0, which I don't think should be the case. At inference time, the behavior is what I expect. See the code below:

import tensorflow as tf

x = tf.constant([[1.], [2.], [3.]])


class model_bn(tf.keras.Model):

    def __init__(self):
        super(model_bn, self).__init__()
        # Momentum is set to 0 so that the moving mean and variance end up
        # equal to the last batch mean and variance
        self.batchnorm0 = tf.keras.layers.BatchNormalization(
            input_shape=(1,), axis=1, momentum=0.00, center=False, scale=False)

    def call(self, inputs):
        x = inputs
        x = self.batchnorm0(x)
        return 10. * x


model = model_bn()

# Computing the derivative at training time
with tf.GradientTape() as tape:
    tape.watch(x)
    # Calling the model with training=True normalizes with the batch
    # statistics and updates the moving mean and variance
    y = model(x, training=True)
y_x = tape.gradient(y, x)

print(model.batchnorm0.moving_mean, model.batchnorm0.moving_variance)
print(y, y_x)   # y_x = [[0.], [0.], [0.]]

# Computing the derivative at inference time
with tf.GradientTape() as tape:
    tape.watch(x)
    # With training=False the moving mean and variance are used instead of
    # the batch statistics; with momentum=0.00 they should be the same here
    y = model(x, training=False)
y_x = tape.gradient(y, x)

print(y, y_x)

I would have expected the derivative in both cases to equal:

10./sqrt(var + epsilon)

where var is the moving variance or the batch variance (which I believe should be the same here), and epsilon is a constant that defaults to 0.001.
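
As a quick numerical check (a sketch, assuming the script above has already been run so that the moving statistics hold the last batch statistics), this expected value can be compared directly with the inference-time gradient:

import numpy as np

# epsilon defaults to 0.001 in tf.keras.layers.BatchNormalization
eps = model.batchnorm0.epsilon
var = model.batchnorm0.moving_variance.numpy()
print(10. / np.sqrt(var + eps))   # should match every entry of the inference-time y_x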

What am I missing here?

Tags: tensorflow

Solution
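
A likely explanation, sketched here: two things combine to produce the zeros. First, for a non-scalar target, tape.gradient(y, x) returns the gradient of the sum of all entries of y, not the per-sample derivatives dy_i/dx_i. Second, with training = True the layer normalizes with the batch mean and variance, which are themselves functions of x, and the sum of the normalized outputs is identically zero: sum_i (x_i - mean) / sqrt(var + epsilon) = 0 for every batch. The gradient of a constant is zero, hence y_x = [[0.], [0.], [0.]]. With training = False, the moving mean and variance are constants with respect to x, so the gradient of the sum is 10./sqrt(var + epsilon) in every entry, as expected.

A minimal sketch that makes this visible (it reuses the model and x defined above): computing the full Jacobian with tape.jacobian shows non-zero per-sample derivatives in training mode, while each column of the Jacobian sums to zero, which is exactly what tape.gradient reports:

with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x, training=True)

# Full Jacobian: shape (3, 1, 3, 1), where entry [i, 0, j, 0] is dy_i / dx_j
jac = tape.jacobian(y, x)
print(tf.squeeze(jac))

# Summing dy_i / dx_j over i reproduces tape.gradient(y, x): all zeros,
# because sum_i y_i is identically 0 when the batch statistics are used
print(tf.reduce_sum(jac, axis=[0, 1]))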

