Training a TensorFlow model with multiple common outputs

Problem description

I am trying to train a TensorFlow model that has multiple identical outputs. A recurrent network refines an image-label prediction over iterative time steps: every time step predicts the image class, and deeper time steps are expected to perform better. You can picture it as a deep model with intermediate outputs that all solve the same common task, so every output is compared against the same labels.

The problem is the sheer number of outputs. Everything compiles and trains successfully, but I don't think feeding tiled (duplicated) label data to the model is the best solution. How can I improve this, both in memory use and in code cleanliness?

# Build the decoder and unroll it for the desired number of refinement steps;
# hidden is assumed to be initialized (e.g., to zeros) before the loop
CHO_RNN = CustomLayers.RNN_Decoder(units, ...)
prediction_list = []

for x in range(settings['refinement_t']):
    # each step refines the prediction from the previous hidden state
    # (the decoder below returns logits, state, and attention weights)
    pred, hidden, _ = CHO_RNN(features, hidden)
    prediction_list.append(pred)

# one model with settings['refinement_t'] outputs
model = tf.keras.models.Model(input_image, prediction_list)
...

# every output is trained against the same labels, tiled into a dict
data = {'output1': labels, 'output2': labels, 'output3': labels, ... 'output15': labels}
model.fit(x=images, y=data, ...)
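If the main irritation is the hand-written dictionary, note that Keras also accepts y as a list ordered like the model's outputs, and a list of T references to the same array copies no label data. A minimal sketch, reusing the names from the snippet above:

# same labels for every output, without spelling out the output names;
# [labels] * T holds T references to one array, so nothing is duplicated
T = settings['refinement_t']
model.fit(x=images, y=[labels] * T)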

Edit: Another reason I want this is that at inference time the model output becomes very unwieldy, and functions used for evaluation, e.g. model.evaluate, no longer work well. The verbose training printout and the TensorBoard graph also get very long.
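One possible workaround for the evaluation complaint, sketched here under assumptions: Keras functional models share layers, so a second model exposing only the deepest prediction can be used for model.evaluate. input_image and prediction_list are from the snippet above; eval_model is an illustrative name, and the loss assumes integer labels with raw logits from the final Dense layer.

# Sketch: a single-output view of the same graph for evaluation.
# Shares all weights with `model`; only the deepest time step is kept.
eval_model = tf.keras.models.Model(input_image, prediction_list[-1])
eval_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
eval_model.evaluate(images, labels)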

Edit 2: RNN decoder code

class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, features, hidden):
    # features (CNN_encoder output) shape == (batch_size, 36, embedding_dim)

    # hidden shape == (batch_size, hidden_size)
    # hidden_with_time_axis shape == (batch_size, 1, hidden_size)
    hidden_with_time_axis = tf.expand_dims(hidden, 1)

    # attention_hidden_layer shape == (batch_size, 36, units)
    attention_hidden_layer = (tf.nn.tanh(self.W1(features) +
                                         self.W2(hidden_with_time_axis)))

    # score shape == (batch_size, 36, 1)
    # This gives you an unnormalized score for each image feature.
    score = self.V(attention_hidden_layer)

    # attention_weights shape == (batch_size, 36, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, embedding_dim)
    context_vector = attention_weights * features
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights
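For reference, a quick shape check of the attention block with dummy tensors; the batch size and dimensions below are arbitrary assumptions:

import tensorflow as tf

# Sketch: verify the shapes documented in the comments above
attention = BahdanauAttention(units=512)
features = tf.random.normal((8, 36, 256))   # (batch, 36, embedding_dim)
hidden = tf.zeros((8, 512))                 # (batch, hidden_size)
context, weights = attention(features, hidden)
print(context.shape)   # (8, 256) -- embedding_dim, not hidden_size
print(weights.shape)   # (8, 36, 1)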

class RNN_Decoder(tf.keras.Model):
    def __init__(self, units, class_types):
        # units: GRU hidden width; class_types: number of output classes
        super(RNN_Decoder, self).__init__()
        self.units = units
        self.gru = tf.keras.layers.GRU(self.units,
                                        return_sequences=True,
                                        return_state=True,
                                        recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.fc2 = tf.keras.layers.Dense(class_types)

        self.attention = BahdanauAttention(self.units)

    def call(self, features, hidden):
        # hidden: previous states   features: feature map(conv output)
        # defining attention as a separate model

        context_vector, attention_weights = self.attention(features, hidden)

        # x shape == (batch_size, 1, embedding_dim): the context vector
        # with a time axis added for the GRU
        x = tf.expand_dims(context_vector, 1)

        # run one GRU step; note that no initial_state is passed, so the
        # previous hidden state only enters through the attention above
        output, state = self.gru(x)

        # output shape == (batch_size, 1, units)
        x = self.fc1(output)

        # x shape == (batch_size, units)
        x = tf.reshape(x, (-1, x.shape[2]))

        # x shape == (batch_size, class_types)
        x = self.fc2(x)

        return x, state, attention_weights
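And one refinement step through the decoder, again with arbitrary dummy dimensions; note that units here is the GRU width, not the class count, and the example assumes the classes above plus import tensorflow as tf:

# Sketch: a single decoder step with dummy tensors
decoder = RNN_Decoder(units=512, class_types=10)
features = tf.random.normal((8, 36, 256))   # CNN feature map
hidden = tf.zeros((8, 512))                 # initial hidden state
logits, hidden, attn = decoder(features, hidden)
print(logits.shape)   # (8, 10)
print(hidden.shape)   # (8, 512)
print(attn.shape)     # (8, 36, 1)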

Edit 3: I managed to partially solve this by defining a custom loss function, so the model still outputs many vectors, but it can be trained without tiling the label data (a sketch of the idea appears under Solution below).

Tags: python, tensorflow, keras, deep-learning

Solution
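The question leaves the custom loss from Edit 3 unspecified. A minimal sketch of that idea, assuming the per-step logit vectors are stacked into a single output of shape (batch, T, classes) and integer class labels; refinement_loss and stacked are illustrative names, not the author's code:

# Stack the T per-step predictions into one output: (batch, T, classes)
stacked = tf.keras.layers.Lambda(
    lambda t: tf.stack(t, axis=1))(prediction_list)
model = tf.keras.models.Model(input_image, stacked)

def refinement_loss(y_true, y_pred):
    # y_true: integer class ids; y_pred: (batch, T, classes) stacked logits
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
    steps = tf.shape(y_pred)[1]
    # broadcast the single label across all T refinement steps
    y_tiled = tf.tile(tf.expand_dims(y_true, 1), [1, steps])
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        y_tiled, y_pred, from_logits=True)          # shape (batch, T)
    return tf.reduce_mean(ce, axis=-1)

model.compile(optimizer='adam', loss=refinement_loss)
model.fit(x=images, y=labels)   # labels passed once, no tiling in memory

Averaging over the time axis weights every refinement step equally; if deeper steps should count more, a per-step weight vector can be folded into the final reduce_mean.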

