python - Does batch normalization in tensorflow use running averages during training?
Problem description
I am using a tensorflow neural net to figure out how batch normalization works and replicate it in my own library. I've run into this strange issue:
When you initialize a neural net layer, all biases (or, in the case of batchnorm, the betas) are set to 0, so the layer should just multiply the input values by the weights, and that's about it. Now, from what I understand about batchnorm, during training it calculates the means and the variances of the layer inputs over the minibatch it is being fed, and then normalizes the input: output = (input - mean) / sqrt(variance + eps).
So, if all the input values of your minibatch are the same, then during training batchnorm will subtract the mean (equal to each value) from the input value, so the net should output 0, regardless of input, right?
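To make the expectation concrete, here is a minimal NumPy sketch of the training-time transform described above (the function name and shapes are illustrative, not TF internals):

```python
import numpy as np

def batchnorm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Training mode: normalize with the statistics of the current minibatch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A minibatch where every sample is identical:
x = np.full((4, 3), 5.0)
print(batchnorm_train(x))  # all zeros: the batch mean equals every value
```

With a constant minibatch, the batch mean equals every input value, so the normalized output is 0 everywhere, which is the behavior the question expects during training.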
And, it doesn't. In fact, it looks like all the means during calculation are 0, and the variances are 1 as if it is using the running averages of those values. So, either I don't understand how batchnorm works or batchnorm is just used incorrectly. Here is how it is initialized in the code I'm using:
layer = tflearn.fully_connected(layer, 10, weights_init=w_init)
layer = tflearn.layers.normalization.batch_normalization(layer)
layer = tflearn.activations.leaky_relu(layer)
The other option is that it is used incorrectly during training, but I would like to eliminate the other possible explanations first.
Solution
The TensorFlow batch norm implementation has some update ops that are not included in the training op's dependencies by default. You have to add the dependencies explicitly. Quoting the docs:
[W]hen training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly. For example:
x_norm = tf.layers.batch_normalization(x, training=training)
# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)