Siamese network: using a Dense layer instead of the Euclidean distance layer at the bottom

Problem description

This is a rather interesting question about Siamese networks.

I am following the example at https://keras.io/examples/mnist_siamese/. My modified version of the code is in this Google Colab.

The Siamese network takes 2 inputs (2 handwritten digits) and outputs whether they are the same digit (1) or different digits (0).

Each of the two inputs is first processed by a shared base_network (3 Dense layers with 2 Dropout layers in between). input_a is extracted into processed_a, and input_b into processed_b.
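For reference, the shared base_network and the two-branch wiring in the Keras example look roughly like this (reproduced from memory, so details such as the Flatten layer and the 128-unit / 0.1-dropout sizes should be checked against the linked example):

def create_base_network(input_shape):
    '''Shared feature extractor: 3 Dense layers with 2 Dropout layers in between.'''
    inp = Input(shape=input_shape)
    x = Flatten()(inp)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    return Model(inp, x)

base_network = create_base_network(input_shape)
input_a = Input(shape=input_shape)
input_b = Input(shape=input_shape)
processed_a = base_network(input_a)   # same weights for both branches
processed_b = base_network(input_b)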

The last layer of the Siamese network is a Euclidean distance layer between the two extracted tensors:

distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])

model = Model([input_a, input_b], distance)
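For completeness, euclidean_distance and eucl_dist_output_shape are helpers defined in the example; they compute the L2 distance between the two processed tensors, roughly as follows:

def euclidean_distance(vects):
    # L2 distance between the two feature vectors, kept strictly positive
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    # the distance layer outputs a single scalar per pair
    shape1, shape2 = shapes
    return (shape1[0], 1)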

I understand the reason for using a Euclidean distance layer at the bottom of the network: if the features are extracted well, then similar inputs should have similar features.

I was wondering why not use an ordinary Dense layer at the bottom instead:

# distance = Lambda(euclidean_distance,
#                   output_shape=eucl_dist_output_shape)([processed_a, processed_b])

# model = Model([input_a, input_b], distance)

# my model: replace the distance layer with Subtract + Dense
# (Subtract must be imported: from keras.layers import Subtract)
subtracted = Subtract()([processed_a, processed_b])
out = Dense(1, activation="sigmoid")(subtracted)
model = Model([input_a, input_b], out)

My reasoning was that if the extracted features are similar, the Subtract layer should produce a small tensor as the difference between the extracted features. The next layer, the Dense layer, can then learn to output 1 when its input is small and 0 otherwise.

Because the Euclidean distance layer outputs values close to 0 when the two inputs are similar and close to 1 otherwise, I also need to flip the loss and accuracy functions, like so:

# the version of loss and accuracy for Euclidean distance layer
# def contrastive_loss(y_true, y_pred):
#     '''Contrastive loss from Hadsell-et-al.'06
#     http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
#     '''
#     margin = 1
#     square_pred = K.square(y_pred)
#     margin_square = K.square(K.maximum(margin - y_pred, 0))
#     return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

# def compute_accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     pred = y_pred.ravel() < 0.5
#     return np.mean(pred == y_true)

# def accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

### my version, loss and accuracy
def contrastive_loss(y_true, y_pred):
  margin = 1
  square_pred = K.square(y_pred)
  margin_square = K.square(K.maximum(margin - y_pred, 0))
#   return K.mean(y_true * square_pred + (1-y_true) * margin_square)
  return K.mean(y_true * margin_square + (1-y_true) * square_pred)

def compute_accuracy(y_true, y_pred):
  '''Compute classification accuracy with a fixed threshold on distances.
  '''
  pred = y_pred.ravel() > 0.5
  return np.mean(pred == y_true)

def accuracy(y_true, y_pred):
  '''Compute classification accuracy with a fixed threshold on distances.
  '''
  return K.mean(K.equal(y_true, K.cast(y_pred > 0.5, y_true.dtype)))
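Apart from swapping these functions, the training setup is unchanged from the example. A minimal sketch of how they are wired in (tr_pairs, tr_y, te_pairs, te_y are the pair arrays built in the example; the optimizer and epoch count are reproduced from memory):

from keras.optimizers import RMSprop

# train with the custom loss and the in-graph accuracy metric
model.compile(loss=contrastive_loss, optimizer=RMSprop(), metrics=[accuracy])
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          batch_size=128,
          epochs=20,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))

# compute_accuracy is applied to raw predictions after training
y_pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
tr_acc = compute_accuracy(tr_y, y_pred)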

Accuracy of the old model:

  • accuracy on the training set: 99.55%
  • accuracy on the test set: 97.42%

This small change causes the model to learn nothing:

  • accuracy on the training set: 48.64%
  • accuracy on the test set: 48.29%

So my questions are:

1. What is wrong with my reasoning for using Subtract + Dense at the bottom of the Siamese network?

2. Can we fix this? I have two potential solutions in mind, but I am not confident in them: (1) a convolutional neural network for feature extraction, (2) more Dense layers at the bottom of the Siamese network.

Tags: python, tensorflow, machine-learning, keras, neural-network

Solution


In the case of two similar examples, after subtracting the two n-dimensional feature vectors (extracted with the shared base feature-extraction model), you get values at or near zero in most positions of the resulting n-dimensional vector, and this is what the next/output Dense layer has to work with. On the other hand, we all know that in an ANN the weights are learned so that less important features produce very small responses, while the prominent/interesting features that contribute to the decision produce high responses. Now you can see that the subtracted feature vector behaves in exactly the opposite way: it produces high responses when the two examples come from different classes, and low responses when they come from the same class.

Furthermore, with a single node in the output layer (and no additional hidden layer before it), it is quite difficult for the model to learn to generate a high response from near-zero values when the two samples are from the same class. This may be the key point for solving your problem.

Based on the above discussion, you may want to try the following ideas:

  • transforming the subtracted feature vector so that similarity yields high responses, for example by subtracting it from 1 or taking its reciprocal (multiplicative inverse), followed by normalization;
  • adding more Dense layers before the output layer (see the sketch after this list).
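As a rough illustration of both ideas (this is not the answer's exact code; the 1 - |difference| transform and the 64-unit hidden layer are assumptions), the head of the network could be rewired like this:

from keras.layers import Subtract, Dense, Lambda
from keras import backend as K

diff = Subtract()([processed_a, processed_b])

# idea 1: flip the signal so that similar pairs produce HIGH responses
# (here via 1 - |difference|; the exact transform is an assumption)
flipped = Lambda(lambda d: 1.0 - K.abs(d))(diff)

# idea 2: add a hidden Dense layer before the single-node output
# (64 units is an arbitrary choice)
hidden = Dense(64, activation="relu")(flipped)
out = Dense(1, activation="sigmoid")(hidden)

model = Model([input_a, input_b], out)

With such a head, similar pairs feed the hidden layer with values near 1 rather than near 0, which is the response pattern the discussion above says Dense layers handle more naturally.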

I won't be surprised if a convolutional neural network instead of stacked Dense layers for feature extraction (as you are considering) does not improve your accuracy much, as it is just another way of doing the same thing (feature extraction).

