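tensorflow - Keras BinaryCrossentropy loss gives NaN for the angular distance between two vectors
Question
I want to train a siamese LSTM so that the angular distance between the two outputs is 1 (low similarity) if the corresponding label is 0, and 0 (high similarity) if the label is 1.
I took the angular distance formula from here: https://en.wikipedia.org/wiki/Cosine_similarity
This is my model code: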
import math
import tensorflow as tf

# inputs are unicode-encoded int arrays built from strings
# similar strings should yield a low angular distance
left_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
right_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
lstm = tf.keras.layers.LSTM(10)
left_embedding = lstm(left_input)
right_embedding = lstm(right_input)
# cosine_layer is the operation to get cosine similarity
cosine_layer = tf.keras.layers.Dot(axes=1, normalize=True)
cosine_similarity = cosine_layer([left_embedding, right_embedding])
# next two lines calculate angular distance but with inversed labels
arccos = tf.math.acos(cosine_similarity)
angular_distance = arccos / math.pi # not 1. - (arccos / math.pi)
model = tf.keras.Model([left_input, right_input], [angular_distance])
model.compile(loss='binary_crossentropy', optimizer='sgd')
print(model.summary())
The model summary looks fine to me, and when I test with fixed input values I get the correct values for the cosine similarity and so on:
Model: "model_37"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_95 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
input_96 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
lstm_47 (LSTM) (None, 10) 480 input_95[0][0]
input_96[0][0]
__________________________________________________________________________________________________
dot_47 (Dot) (None, 1) 0 lstm_47[0][0]
lstm_47[1][0]
__________________________________________________________________________________________________
tf_op_layer_Acos_52 (TensorFlow [(None, 1)] 0 dot_47[0][0]
__________________________________________________________________________________________________
tf_op_layer_truediv_37 (TensorF [(None, 1)] 0 tf_op_layer_Acos_52[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_20 (TensorFlowO [(None, 1)] 0 tf_op_layer_truediv_37[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_21 (TensorFlowO [(None, 1)] 0 tf_op_layer_sub_20[0][0]
__________________________________________________________________________________________________
tf_op_layer_Abs (TensorFlowOpLa [(None, 1)] 0 tf_op_layer_sub_21[0][0]
==================================================================================================
Total params: 480
Trainable params: 480
Non-trainable params: 0
__________________________________________________________________________________________________
None
But during training I always get a NaN loss:
model.fit([np.array(x_left_train), np.array(x_right_train)], np.array(y_train).reshape((-1,1)), batch_size=1, epochs=2, validation_split=0.1)
Train on 14400 samples, validate on 1600 samples
Epoch 1/2
673/14400 [>.............................] - ETA: 5:42 - loss: nan
Isn't this the right way to get the similarity between two vectors and to train my network to produce those vectors?
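Answer
Binary cross-entropy computes log(output) and log(1-output). This means your output needs to be strictly greater than 0 and strictly less than 1; otherwise you compute the log of a negative number, which gives NaN. (Note: log(0) should give you -inf, which is not as bad as NaN, but is still undesirable.)
Mathematically, your output should lie in the correct interval, but given the inaccuracies of floating-point arithmetic I can well imagine that this is your problem. However, this is only a guess.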
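To make the domain issue concrete, here is a minimal plain-Python sketch of the textbook per-sample formula. The helper name `binary_crossentropy` is hypothetical and this is not Keras's actual implementation, which may apply its own safeguards. Note also that Python's `math.log` raises `ValueError` where a tensor library would typically return NaN or -inf, so the exception below stands in for those values.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Textbook per-sample binary cross-entropy, with no clipping at all."""
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))


# Inside the open interval (0, 1) the loss is finite.
print(binary_crossentropy(1.0, 0.25))  # equals -log(0.25)

# At the boundary, or slightly outside it (as rounding error might produce),
# one of the two logarithms has no finite real value.
for bad_output in (0.0, 1.0, -1e-7, 1.0 + 1e-7):
    try:
        binary_crossentropy(1.0, bad_output)
    except ValueError as error:
        print(bad_output, "->", error)
```

Both boundary values matter here because an angular distance of exactly 0 or exactly 1 is a legitimate model output, not only a rounding artifact.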
So try forcing your output to be greater than 0 and less than 1, for example by clipping it with a small epsilon:
angular_distance = tf.keras.backend.clip(angular_distance, 1e-6, 1 - 1e-6)
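The effect of that clip can be checked without TensorFlow. The following is a rough plain-Python sketch under stated assumptions: `clip`, `angular_distance` and `binary_crossentropy` are hypothetical stand-ins for the tensor operations, and `EPSILON` simply reuses the 1e-6 margin from the line above.

```python
import math

EPSILON = 1e-6  # assumed margin, same value as in the clip call above


def clip(value, low, high):
    """Scalar stand-in for an element-wise clip."""
    return max(low, min(value, high))


def angular_distance(cosine_similarity):
    """Angular distance in [0, 1] for a cosine similarity in [-1, 1]."""
    return math.acos(cosine_similarity) / math.pi


def binary_crossentropy(y_true, y_pred):
    """Textbook per-sample binary cross-entropy, with no internal clipping."""
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))


# Cosine similarities of 1 and -1 map to distances of exactly 0 and 1,
# which would break the logarithms without the clip.
for cosine in (1.0, 0.0, -1.0):
    safe = clip(angular_distance(cosine), EPSILON, 1.0 - EPSILON)
    for label in (0.0, 1.0):
        print(cosine, label, binary_crossentropy(label, safe))
```

The clip only guards the loss input. If the cosine similarity itself drifts outside [-1, 1], the arccosine is already undefined before the clip is reached, so clamping the cosine similarity first may also be worth trying; that is an assumption on top of the answer, not something it states.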