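python - Training a TensorFlow model with multiple common outputs
Problem description
I am trying to train a TensorFlow model with several identical outputs. A recurrent neural network predicts the image label over iterated time steps. Every time step should predict the image class, and later (deeper) time steps are expected to perform better. You can think of it simply as a deep model with intermediate outputs that all predict one common task. All outputs are compared against the same labels.
The problem is that there are too many outputs. Everything compiles successfully, but I don't think feeding tiled data to the model is the best solution. How can I improve this in terms of memory use and clean code?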
CHO_RNN = CustomLayers.RNN_Decoder(units,...)
prediction_list = []
for x in range(settings['refinement_t']):
    pred, hidden = CHO_RNN(features, hidden)
    prediction_list.append(pred)
model = tf.keras.models.Model(input_image, prediction_list)
...
data = {'output1': labels, 'output2': labels, 'output3': labels, ...'output15': labels}
model.fit(x=images, y=data,...)
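As a small cleanup of the label dictionary, it can be built with a comprehension rather than written out key by key. The sketch below uses a plain list as a stand-in for `labels` and generates the 'output1' ... 'output15' names from the snippet above; with a real model the keys would have to match the model's own output names (for example those listed in `model.output_names`). Every key refers to the same object, so the dictionary itself does not copy the labels. Whether Keras duplicates data further down the pipeline is not covered here.

```python
# Hypothetical stand-ins: in the real code `labels` would be the label array,
# and the names would come from the model (e.g. `model.output_names`).
labels = [3, 0, 2, 1]
output_names = [f"output{i}" for i in range(1, 16)]  # 'output1' ... 'output15'

# Every key maps to the same object; no label data is duplicated here.
data = {name: labels for name in output_names}

# Sanity check: all values are the very same object, not copies.
assert all(value is labels for value in data.values())
```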
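Edit: Another reason I want this is that, at inference time, the model's output is very complicated, and evaluation functions (e.g. model.evaluate) do not work well with it. The verbose printout and the TensorBoard graph are also very long.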
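For the evaluation side, one option that sidesteps `model.evaluate` is to score the steps after prediction. The following is a minimal NumPy sketch, assuming `model.predict` returns a list with one `(batch_size, num_classes)` array per time step and that the labels are integer class ids. The function name `per_step_accuracy` and the toy data are hypothetical.

```python
import numpy as np


def per_step_accuracy(step_predictions, labels):
    """Accuracy of each refinement step.

    step_predictions: list of arrays, each (batch_size, num_classes),
        in the layout a multi-output `model.predict` is assumed to return.
    labels: (batch_size,) integer class ids.
    """
    stacked = np.stack(step_predictions, axis=1)   # (batch, steps, classes)
    predicted = stacked.argmax(axis=-1)            # (batch, steps)
    # Compare every step against the same label column, then average per step.
    return (predicted == np.asarray(labels)[:, None]).mean(axis=0)  # (steps,)


# Toy data: 2 steps, 3 samples, 2 classes.
step_predictions = [
    np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]),  # step 1
    np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]),  # step 2
]
labels = np.array([0, 1, 1])
accuracy = per_step_accuracy(step_predictions, labels)
```

The result is one short vector with an entry per step, which makes it easy to see whether later steps actually improve on earlier ones.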
Edit 2: RNN decoder code
import tensorflow as tf


class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features(CNN_encoder output) shape == (batch_size, 36, embedding_dim)
        # hidden shape == (batch_size, hidden_size)
        # hidden_with_time_axis shape == (batch_size, 1, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        # attention_hidden_layer shape == (batch_size, 36, units)
        attention_hidden_layer = (tf.nn.tanh(self.W1(features) +
                                             self.W2(hidden_with_time_axis)))
        # score shape == (batch_size, 36, 1)
        # This gives you an unnormalized score for each image feature.
        score = self.V(attention_hidden_layer)
        # attention_weights shape == (batch_size, 36, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
        # context_vector shape after sum == (batch_size, embedding_dim)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
class RNN_Decoder(tf.keras.Model):
    def __init__(self, units, class_types):
        # units: # classes
        super(RNN_Decoder, self).__init__()
        self.units = units
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.fc2 = tf.keras.layers.Dense(class_types)
        self.attention = BahdanauAttention(self.units)

    def call(self, features, hidden):
        # hidden: previous state; features: feature map (conv output)
        # defining attention as a separate model
        context_vector, attention_weights = self.attention(features, hidden)
        # x shape == (batch_size, 1, embedding_dim); nothing is concatenated here
        x = tf.expand_dims(context_vector, 1)
        # passing the context vector to the GRU
        # (hidden is not passed as initial_state, so the GRU starts from zeros)
        output, state = self.gru(x)
        # x shape == (batch_size, 1, units)
        x = self.fc1(output)
        # x shape == (batch_size, units)
        x = tf.reshape(x, (-1, x.shape[2]))
        # output shape == (batch_size, class_types)
        x = self.fc2(x)
        return x, state, attention_weights
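Edit 3: I managed to partly solve this by defining a custom loss function. The model still outputs many vectors, but it can now be trained without tiling the data.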
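The edit does not show the loss function itself. One possible variant, which differs from the edit in that it collapses the outputs into a single tensor, is sketched below. It assumes the per-step logits are stacked into shape `(batch, steps, num_classes)` and that the labels are integer class ids. The name `multi_step_loss` and the toy model are hypothetical; three Dense heads stand in for the decoder loop.

```python
import numpy as np
import tensorflow as tf


def multi_step_loss(y_true, y_pred):
    """Cross-entropy averaged over all refinement steps.

    y_true: (batch,) or (batch, 1) integer class ids (sparse labels assumed).
    y_pred: (batch, steps, num_classes) logits, one slice per step.
    """
    y_true = tf.reshape(tf.cast(y_true, tf.int32), (-1, 1))   # (batch, 1)
    # Broadcast the single label column across the step axis inside the graph.
    y_true = tf.broadcast_to(y_true, tf.shape(y_pred)[:2])    # (batch, steps)
    per_step = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=True)                     # (batch, steps)
    return tf.reduce_mean(per_step, axis=-1)                  # (batch,)


# Toy stand-in for the decoder loop: three Dense heads act as three steps.
num_steps, num_classes = 3, 4
inputs = tf.keras.Input(shape=(8,))
step_logits = [
    tf.keras.layers.Reshape((1, num_classes))(
        tf.keras.layers.Dense(num_classes)(inputs))
    for _ in range(num_steps)
]
stacked = tf.keras.layers.Concatenate(axis=1)(step_logits)    # (batch, steps, classes)
model = tf.keras.Model(inputs, stacked)
model.compile(optimizer="adam", loss=multi_step_loss)

# One label array is enough; nothing is repeated per output.
images = np.random.rand(16, 8).astype("float32")
labels = np.random.randint(0, num_classes, size=(16,))
history = model.fit(images, labels, epochs=1, batch_size=8, verbose=0)
```

With a single stacked output, `model.fit` and `model.evaluate` report one loss value instead of one per step, and the final prediction is the last slice along the step axis.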
Solution