Could not assign a device for operation when using GPU with multiple embedding inputs in TensorFlow 2.3 with Keras

Problem description

I found some questions related to this issue for TensorFlow 1.X, but none for TensorFlow 2.X with Keras.

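With a single embedding feature everything works fine, but as soon as I add more than one, I start getting a device placement (colocation) error when running on the GPU. With only a CPU, everything works fine.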
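To confirm that the failure is specific to the GPU without switching machines, one option is to hide the GPU from TensorFlow before building the model, so every op falls back to the CPU. This is a minimal sketch assuming TensorFlow 2.3, where `tf.config.set_visible_devices` is available; it is a diagnostic that reproduces the CPU-only run, not a fix for the GPU placement error.

```python
import tensorflow as tf

# Hide every GPU from TensorFlow so all ops, including the embedding lookups,
# are placed on the CPU. This must run before any op or model is created,
# because visible devices cannot be changed once the runtime is initialized.
tf.config.set_visible_devices([], "GPU")

print(tf.config.get_visible_devices("GPU"))  # []
# After this, call create_model() and model.fit() from the example as usual.
```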

Any ideas for a workaround? I reduced my code to the following minimal example:

import tensorflow as tf

def create_model():
  test_input = tf.keras.Input(shape=(None,), dtype='string', name='test')
  test2_input = tf.keras.Input(shape=(None,), dtype='string', name='test2')
  feature_layer_inputs = {}
  feature_layer_inputs['test'] = test_input
  feature_layer_inputs['test2'] = test2_input

  vocab_list = ['This', 'That', 'Thing']
  feature_col = tf.feature_column.categorical_column_with_vocabulary_list(
      key='test', vocabulary_list=vocab_list,
      num_oov_buckets=0)
  embed_col = tf.feature_column.embedding_column(
      categorical_column=feature_col, dimension=4, combiner='mean')
  first_embed_layer = tf.keras.layers.DenseFeatures(
      feature_columns=[embed_col], name="first_embed_layer")

  second_vocab_list = ['a', 'b', 'c']
  feature_col_two = tf.feature_column.categorical_column_with_vocabulary_list(
      key='test2', vocabulary_list=second_vocab_list,
      num_oov_buckets=0)
  embed_col_two = tf.feature_column.embedding_column(
      categorical_column=feature_col_two, dimension=4, combiner='mean')
  second_embed_layer = tf.keras.layers.DenseFeatures(
      feature_columns=[embed_col_two], name="second_embed_layer")
  
  x = first_embed_layer(feature_layer_inputs)
  y = second_embed_layer(feature_layer_inputs)
  x = tf.keras.layers.concatenate([x, y])
  
  hidden_layer = tf.keras.layers.Dense(units=35, use_bias=False,
      name="user-embeddings-layer")(x)

  model = tf.keras.Model(
    inputs=[v for v in feature_layer_inputs.values()],
    outputs=[hidden_layer]
  )

  model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=.01),
                # loss=loss_func,
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  return model

in_tensor = tf.constant(['This', 'That'])
other_tensor = tf.constant(['a', 'b'])

features = {
  'test': in_tensor,
  'test2': other_tensor,
}
y = tf.constant([1, 2])

model = create_model()
history = model.fit(x=features, y=y,
                    epochs=10, shuffle=False, 
                    batch_size=1,
                    verbose=1,
                    callbacks=[])

The full error is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation functional_1/first_embed_layer/test_embedding/test_embedding_weights/embedding_lookup_sparse/embedding_lookup: Could not satisfy explicit device specification '' because the node {{colocation_node functional_1/first_embed_layer/test_embedding/test_embedding_weights/embedding_lookup_sparse/embedding_lookup}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Tags: python, tensorflow, machine-learning, keras, deep-learning

Solution


I found a workaround: use the Sequential model API instead of the functional API, and put both embedding columns into a single DenseFeatures layer.

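This is less than ideal, because I now have to search by dimension size to reference the weights of one of the embedding columns (they do not seem to stay reliably at the same position in the layer.weights list as you add more embedding columns). But at least it runs.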
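Rather than searching by dimension size, a more stable option may be to select the embedding variable by name. The sketch below is hypothetical: `find_embedding_weight` is an illustrative helper, and it assumes that each embedding variable inside a `DenseFeatures` layer has a name like `first_embed_layer/test_embedding/embedding_weights:0`. Print `[w.name for w in layer.weights]` to check the actual names in your TensorFlow version. Plain stand-in objects replace `layer.weights` so the example runs on its own.

```python
from types import SimpleNamespace


def find_embedding_weight(weights, column_key):
    """Return the single weight whose name contains '/<column_key>_embedding/'.

    ASSUMPTION: embedding variables are named like
    'first_embed_layer/<column_key>_embedding/embedding_weights:0'.
    Anchoring on '/' keeps key 'test' from matching 'test2_embedding'.
    """
    marker = f"/{column_key}_embedding/"
    matches = [w for w in weights if marker in "/" + w.name]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one weight matching {marker!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-ins for layer.weights (hypothetical names, shapes match the workaround:
# vocabulary size 3, dimensions 4 and 8).
fake_weights = [
    SimpleNamespace(name="first_embed_layer/test2_embedding/embedding_weights:0", shape=(3, 8)),
    SimpleNamespace(name="first_embed_layer/test_embedding/embedding_weights:0", shape=(3, 4)),
]

print(find_embedding_weight(fake_weights, "test").shape)   # (3, 4)
print(find_embedding_weight(fake_weights, "test2").shape)  # (3, 8)
```

With the workaround model, the real call would look like `find_embedding_weight(model.get_layer("first_embed_layer").weights, "test2")`, once the layer has been built (for example after `model.fit`).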

I also filed the following GitHub issue: https://github.com/tensorflow/tensorflow/issues/42590

import tensorflow as tf

def create_model():
  # Workaround: use the Sequential API with a single DenseFeatures layer.
  model = tf.keras.models.Sequential()

  vocab_list = ['This', 'That', 'Thing']
  feature_col = tf.feature_column.categorical_column_with_vocabulary_list(
      key='test', vocabulary_list=vocab_list,
      num_oov_buckets=0)
  embed_col = tf.feature_column.embedding_column(
      categorical_column=feature_col, dimension=4, combiner='mean')

  second_vocab_list = ['a', 'b', 'c']
  feature_col_two = tf.feature_column.categorical_column_with_vocabulary_list(
      key='test2', vocabulary_list=second_vocab_list,
      num_oov_buckets=0)
  embed_col_two = tf.feature_column.embedding_column(
      categorical_column=feature_col_two, dimension=8, combiner='mean')
  first_embed_layer = tf.keras.layers.DenseFeatures(
      feature_columns=[embed_col, embed_col_two], name="first_embed_layer")

  model.add(first_embed_layer)
  model.add(tf.keras.layers.Dense(units=35, use_bias=False,
      name="user-embeddings-layer"))
                           
  # Construct the layers into a model that TensorFlow can execute.  
  # Notice that the loss function for multi-class classification
  # is different than the loss function for binary classification.  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=.01),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  
  return model

in_tensor = tf.constant(['This', 'That'])
other_tensor = tf.constant(['a', 'b'])

features = {
  'test': in_tensor,
  'test2': other_tensor,
}
y = tf.constant([1, 2])

model = create_model()

history = model.fit(x=features, y=y,
                    epochs=10, shuffle=False, 
                    batch_size=1,
                    verbose=1,
                    callbacks=[])
