python - How to convert tf.layer_norm to Julia?
Problem description
I am trying to convert TensorFlow code to Julia, but I don't understand how the layer_norm function works. This is the part I am trying to convert:
output = tf.contrib.layers.layer_norm(attn_out + h, begin_norm_axis=-1,
                                      scope='LayerNorm')
The tensor attn_out + h has shape (qlen, bsz, d_model), and in the trained model I downloaded, the variables gamma and beta have shape (d_model,). Also, the output of nn.moments should have shape (qlen, bsz, 1). The layer_norm function, which internally calls batch_normalization, is defined as follows:
@add_arg_scope
def layer_norm(inputs,
               center=True,
               scale=True,
               activation_fn=None,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               begin_norm_axis=1,
               begin_params_axis=-1,
               scope=None):
  """Adds a Layer Normalization layer.

  Based on the paper:

    "Layer Normalization"
    Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
    https://arxiv.org/abs/1607.06450.

  Can be used as a normalizer function for conv2d and fully_connected.

  Given a tensor `inputs` of rank `R`, moments are calculated and normalization
  is performed over axes `begin_norm_axis ... R - 1`. Scaling and centering,
  if requested, is performed over axes `begin_params_axis .. R - 1`.

  By default, `begin_norm_axis = 1` and `begin_params_axis = -1`,
  meaning that normalization is performed over all but the first axis
  (the `HWC` if `inputs` is `NHWC`), while the `beta` and `gamma` trainable
  parameters are calculated for the rightmost axis (the `C` if `inputs` is
  `NHWC`). Scaling and recentering is performed via broadcast of the
  `beta` and `gamma` parameters with the normalized tensor.

  The shapes of `beta` and `gamma` are `inputs.shape[begin_params_axis:]`,
  and this part of the inputs' shape must be fully defined.

  Args:
    inputs: A tensor having rank `R`. The normalization is performed over axes
      `begin_norm_axis ... R - 1` and centering and scaling parameters are
      calculated over `begin_params_axis ... R - 1`.
    center: If True, add offset of `beta` to normalized tensor. If False, `beta`
      is ignored.
    scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the
      next layer is linear (also e.g. `nn.relu`), this can be disabled since the
      scaling can be done by the next layer.
    activation_fn: Activation function, default set to None to skip it and
      maintain a linear activation.
    reuse: Whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    variables_collections: Optional collections for the variables.
    outputs_collections: Collections to add the outputs.
    trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    begin_norm_axis: The first normalization dimension: normalization will be
      performed along dimensions `begin_norm_axis : rank(inputs)`
    begin_params_axis: The first parameter (beta, gamma) dimension: scale and
      centering parameters will have dimensions
      `begin_params_axis : rank(inputs)` and will be broadcast with the
      normalized inputs accordingly.
    scope: Optional scope for `variable_scope`.

  Returns:
    A `Tensor` representing the output of the operation, having the same
    shape and dtype as `inputs`.

  Raises:
    ValueError: If the rank of `inputs` is not known at graph build time,
      or if `inputs.shape[begin_params_axis:]` is not fully defined at
      graph build time.
  """
  with variable_scope.variable_scope(
      scope, 'LayerNorm', [inputs], reuse=reuse) as sc:
    inputs = ops.convert_to_tensor(inputs)
    inputs_shape = inputs.shape
    inputs_rank = inputs_shape.ndims
    if inputs_rank is None:
      raise ValueError('Inputs %s has undefined rank.' % inputs.name)
    dtype = inputs.dtype.base_dtype
    if begin_norm_axis < 0:
      begin_norm_axis = inputs_rank + begin_norm_axis
    if begin_params_axis >= inputs_rank or begin_norm_axis >= inputs_rank:
      raise ValueError('begin_params_axis (%d) and begin_norm_axis (%d) '
                       'must be < rank(inputs) (%d)' %
                       (begin_params_axis, begin_norm_axis, inputs_rank))
    params_shape = inputs_shape[begin_params_axis:]
    if not params_shape.is_fully_defined():
      raise ValueError(
          'Inputs %s: shape(inputs)[%s:] is not fully defined: %s' %
          (inputs.name, begin_params_axis, inputs_shape))
    # Allocate parameters for the beta and gamma of the normalization.
    beta, gamma = None, None
    if center:
      beta_collections = utils.get_variable_collections(variables_collections,
                                                        'beta')
      beta = variables.model_variable(
          'beta',
          shape=params_shape,
          dtype=dtype,
          initializer=init_ops.zeros_initializer(),
          collections=beta_collections,
          trainable=trainable)
    if scale:
      gamma_collections = utils.get_variable_collections(
          variables_collections, 'gamma')
      gamma = variables.model_variable(
          'gamma',
          shape=params_shape,
          dtype=dtype,
          initializer=init_ops.ones_initializer(),
          collections=gamma_collections,
          trainable=trainable)
    # By default, compute the moments across all the dimensions except the one
    # with index 0.
    norm_axes = list(range(begin_norm_axis, inputs_rank))
    mean, variance = nn.moments(inputs, norm_axes, keep_dims=True)
    # Compute layer normalization using the batch_normalization function.
    # Note that epsilon must be increased for float16 due to the limited
    # representable range.
    variance_epsilon = 1e-12 if dtype != dtypes.float16 else 1e-3
    outputs = nn.batch_normalization(
        inputs,
        mean,
        variance,
        offset=beta,
        scale=gamma,
        variance_epsilon=variance_epsilon)
    outputs.set_shape(inputs_shape)
    if activation_fn is not None:
      outputs = activation_fn(outputs)
    return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
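If I unpack this for my call, begin_norm_axis=-1 means norm_axes covers only the last axis, so mean and variance have shape (qlen, bsz, 1), and gamma and beta of shape (d_model,) are broadcast against the normalized tensor. Here is my (untested) sketch of the equivalent Julia, keeping the TF axis order; the epsilon matches the non-float16 path above:

using Statistics

# Sketch: equivalent of tf.contrib.layers.layer_norm(x, begin_norm_axis=-1)
# for x of shape (qlen, bsz, d_model), gamma and beta of shape (d_model,).
function layer_norm(x, gamma, beta; eps=1e-12)
    mu = mean(x; dims=3)                  # (qlen, bsz, 1), like nn.moments with keep_dims=True
    v  = var(x; dims=3, corrected=false)  # population variance, matching nn.moments
    xhat = (x .- mu) ./ sqrt.(v .+ eps)
    # Julia broadcasting aligns leading axes, so reshape the (d_model,)
    # parameters to (1, 1, d_model) to align them with the last axis.
    return reshape(gamma, 1, 1, :) .* xhat .+ reshape(beta, 1, 1, :)
end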
However, none of these tensors (gamma, beta, mean, variance) seem to satisfy the input shape requirements of batch_normalization:
@tf_export("nn.batch_normalization")
def batch_normalization(x,
                        mean,
                        variance,
                        offset,
                        scale,
                        variance_epsilon,
                        name=None):
  r"""Batch normalization.

  Normalizes a tensor by `mean` and `variance`, and applies (optionally) a
  `scale` \\(\gamma\\) to it, as well as an `offset` \\(\beta\\):

  \\(\frac{\gamma(x-\mu)}{\sigma}+\beta\\)

  `mean`, `variance`, `offset` and `scale` are all expected to be of one of two
  shapes:

  * In all generality, they can have the same number of dimensions as the
    input `x`, with identical sizes as `x` for the dimensions that are not
    normalized over (the 'depth' dimension(s)), and dimension 1 for the
    others which are being normalized over.
    `mean` and `variance` in this case would typically be the outputs of
    `tf.nn.moments(..., keep_dims=True)` during training, or running averages
    thereof during inference.
  * In the common case where the 'depth' dimension is the last dimension in
    the input tensor `x`, they may be one dimensional tensors of the same
    size as the 'depth' dimension.
    This is the case for example for the common `[batch, depth]` layout of
    fully-connected layers, and `[batch, height, width, depth]` for
    convolutions.
    `mean` and `variance` in this case would typically be the outputs of
    `tf.nn.moments(..., keep_dims=False)` during training, or running averages
    thereof during inference.

  See Source: [Batch Normalization: Accelerating Deep Network Training by
  Reducing Internal Covariate Shift; S. Ioffe, C. Szegedy]
  (http://arxiv.org/abs/1502.03167).

  Args:
    x: Input `Tensor` of arbitrary dimensionality.
    mean: A mean `Tensor`.
    variance: A variance `Tensor`.
    offset: An offset `Tensor`, often denoted \\(\beta\\) in equations, or
      None. If present, will be added to the normalized tensor.
    scale: A scale `Tensor`, often denoted \\(\gamma\\) in equations, or
      `None`. If present, the scale is applied to the normalized tensor.
    variance_epsilon: A small float number to avoid dividing by 0.
    name: A name for this operation (optional).

  Returns:
    the normalized, scaled, offset tensor.
  """
  with ops.name_scope(name, "batchnorm", [x, mean, variance, scale, offset]):
    inv = math_ops.rsqrt(variance + variance_epsilon)
    if scale is not None:
      inv *= scale
    # Note: tensorflow/contrib/quantize/python/fold_batch_norms.py depends on
    # the precise order of ops that are generated by the expression below.
    return x * math_ops.cast(inv, x.dtype) + math_ops.cast(
        offset - mean * inv if offset is not None else -mean * inv, x.dtype)
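If I read the source correctly, the two shapes in the docstring are only the common cases: nn.batch_normalization itself relies purely on broadcasting, and TF broadcasting aligns trailing axes, so variance of shape (qlen, bsz, 1) and gamma of shape (d_model,) combine into an inv of shape (qlen, bsz, d_model). A small (untested) Julia check of the same algebra, with explicit reshapes because Julia broadcasting aligns leading axes:

using Statistics

# Check that the batch_normalization algebra above matches the textbook
# formula gamma * (x - mu) / sigma + beta for my shapes (illustrative sizes).
qlen, bsz, d_model = 4, 2, 8
x     = randn(qlen, bsz, d_model)
gamma = randn(d_model)
beta  = randn(d_model)
mu    = mean(x; dims=3)                     # (qlen, bsz, 1)
v     = var(x; dims=3, corrected=false)     # (qlen, bsz, 1)
eps   = 1e-12
g, b  = reshape(gamma, 1, 1, :), reshape(beta, 1, 1, :)
inv_  = g ./ sqrt.(v .+ eps)                # broadcasts to (qlen, bsz, d_model)
y1 = x .* inv_ .+ (b .- mu .* inv_)         # what batch_normalization computes
y2 = g .* (x .- mu) ./ sqrt.(v .+ eps) .+ b # standard layer-norm formula
@assert y1 ≈ y2                             # identical up to rounding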
I'm not confident my reading above is correct, and I haven't managed to turn it into working Julia code. I keep the same order of tensor dimensions: if x has shape [a, b, c] in TF, it also has shape [a, b, c] in my Julia code.
Solution
Flux provides batch normalization and layer normalization (among others) in Julia, called LayerNorm and BatchNorm. You didn't specify in your question which Julia library you are using, but even if it isn't Flux, its code may help with your own implementation; see: https://github.com/FluxML/Flux.jl/blob/master/src/layers/normalise.jl

If you are using Knet (which I have never used myself), batch normalization at least is already implemented; I'm not sure about the other normalization layers, but they are probably there as well.
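If you do go with Flux, here is a minimal sketch (assuming a recent Flux, where LayerNorm(d) normalizes over the first dimension, i.e. expects a features-first layout, unlike your TF-ordered arrays):

using Flux

d_model = 8
ln = LayerNorm(d_model)              # its trainable scale/offset play the role of gamma/beta
x  = randn(Float32, 4, 2, d_model)   # TF order: (qlen, bsz, d_model)
xp = permutedims(x, (3, 1, 2))       # features first: (d_model, qlen, bsz)
y  = permutedims(ln(xp), (2, 3, 1))  # back to (qlen, bsz, d_model)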
Please also post what you have already tried in Julia.