Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[]

Problem description

When I try to use a custom CRF loss function, I get the following error on a Google Colab TPU. I checked https://cloud.google.com/tpu/docs/tensorflow-ops for the FakeParam op, and it appears to be available on Cloud TPU.

InvalidArgumentError: 9 root error(s) found.
(0) Invalid argument: {{function_node __inference_train_function_104228}} Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[] on XLA_TPU_JIT: FakeParam (No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node {{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_VARIANT, shape=[]){{node get_loss/cond_1/FakeParam_15}} [[get_loss/cond_1]] TPU compilation failed [[tpu_compile_succeeded_assert/_12238515605435969423/_6]] [[tpu_compile_succeeded_assert/_12238515605435969423/_6/_279]]
(1) Invalid argument: {{function_node __inference_train_function_104228}} Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[] on XLA_TPU_JIT: FakeParam (No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node {{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_VARIANT, shape=[]){{node get_loss/cond_1/FakeParam_15}} [[get_loss/cond_1]] TPU compilation failed [[tpu_compile_succeeded_assert/_12238515605435969423/_6]] [[tpu_compile_succeeded_assert/_12238515605435969423/_6/_223]]
(2) Invalid argument: {{function_node __inference_train_function_104228}} Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[] on XLA_TPU_JIT: FakeParam (No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node {{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_VARIANT, shape=[]){{node get_loss/cond_1/FakeParam_15}} [[get_loss/cond_1]] TPU compilation failed [[tpu_compile_succeeded_assert/_12238515605435969423/_6]] [[tpu_compile_succeeded_assert/_12238515605435969423/_6/_265]]
(3) Invalid argument: {{function_node __inference_train_function_104228}} Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[] on XLA_TPU_JIT: FakeParam (No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node {{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_VARIANT, shape=[]){{node get_loss/cond_1/FakeParam_15}} [[get_loss/cond_1]] TPU compilation failed [[tpu_compile_succeeded_assert/_12238515605435969423/_6]] [[tpu_compile_succeeded_assert/_12238515605435969423/_6/_251]]
(4) Invalid argument: {{function_node __inference_train_function_104228}} Compilation failure: Detected unsupported operations when trying to compile graph get_loss_cond_1_true_88089_rewritten[] on XLA_TPU_JIT: FakeParam (No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node {{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_VARIANT, shape=[ ... [truncated]

Here is my code:

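import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from transformers import TFAutoModel
# CRF is the layer from the forked tensorflow_addons crf.py module;
# this import path is assumed from that file's location in the fork.
from tensorflow_addons.layers.crf import CRF

# labels_ner, x_tr, y_tr and strategy (the TPU strategy) are defined elsewhere.

def make_model():
  # Token ids and attention mask for sequences of length 100.
  input_ids_in = tf.keras.layers.Input(shape=(100,), name='input_token', dtype=tf.int32)
  input_mask_in = tf.keras.layers.Input(shape=(100,), name='input_mask', dtype=tf.int32)
  bert_model = TFAutoModel.from_pretrained("dbmdz/bert-base-turkish-cased")
  # [0] selects the per-token hidden states from the BERT output.
  embedding_layer = bert_model(input_ids_in, attention_mask=input_mask_in)[0]
  model = tf.keras.layers.Bidirectional(
      tf.keras.layers.LSTM(50, trainable=False, return_sequences=True))(embedding_layer)
  model = tf.keras.layers.TimeDistributed(
      tf.keras.layers.Dense(len(labels_ner), activation="relu"))(model)

  crf = CRF(len(labels_ner))  # CRF layer
  out = crf(model)  # output
  model = Model([input_ids_in, input_mask_in], out)
  model.compile('adam', loss=crf.get_loss)

  print("Baseline/LSTM-CRF model built: ")
  return model

with strategy.scope():
  model = make_model()
  model.fit(x_tr, np.argmax(y_tr, axis=-1), batch_size=32, epochs=5, verbose=1, validation_split=0.1)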

I used this tensorflow_addons crf.py module: https://github.com/howl-anderson/addons/blob/feature/crf_layers/tensorflow_addons/layers/crf.py

Thanks.

Tags: tensorflow, tf.keras, tpu

Solution


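It looks like FakeParam on XLA_TPU_JIT only supports these dtypes: {bfloat16, bool, complex64, float, int32, int64, uint32, uint64}, and not dtype=DT_VARIANT, which is what the failing node get_loss/cond_1/FakeParam_15 requests.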
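
To make that comparison concrete, here is a small sketch that pulls the requested dtype out of an error message like the one in the question and checks it against the list above. The helper names are hypothetical, and writing the list as DT_* enum names (for example float as DT_FLOAT) is my assumption about how the names correspond.

```python
import re

# Dtypes listed above as supported by FakeParam on XLA_TPU_JIT, written as
# TensorFlow DataType enum names (assumed mapping, e.g. float -> DT_FLOAT).
SUPPORTED_FAKEPARAM_DTYPES = {
    "DT_BFLOAT16", "DT_BOOL", "DT_COMPLEX64", "DT_FLOAT",
    "DT_INT32", "DT_INT64", "DT_UINT32", "DT_UINT64",
}

def requested_dtypes(error_text):
    """Collect every dtype named in a 'Requested Attributes: dtype=...' clause."""
    return set(re.findall(r"Requested Attributes: dtype=(DT_\w+)", error_text))

def unsupported_dtypes(error_text):
    """Return the requested dtypes that are not in the supported list."""
    return requested_dtypes(error_text) - SUPPORTED_FAKEPARAM_DTYPES

# Excerpt of the error message from the question.
message = (
    "No registered 'FakeParam' OpKernel for XLA_TPU_JIT devices compatible with node "
    "{{node get_loss/cond_1/FakeParam_15}} (OpKernel was found, but attributes didn't match) "
    "Requested Attributes: dtype=DT_VARIANT, shape=[]"
)

print(unsupported_dtypes(message))  # {'DT_VARIANT'}
```

This only diagnoses the mismatch; it does not change how the graph is compiled.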

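Enabling automatic outside compilation in TF2 should resolve this. Add the following line, for example near the start of your program before the model is built: tf.config.set_soft_device_placement(True).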
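
A minimal sketch of where that call could go, assuming a recent TF2 release and the usual Colab TPU setup. The make_tpu_strategy helper and its tpu_name parameter are hypothetical names for illustration, and older TF2 releases expose the strategy as tf.distribute.experimental.TPUStrategy instead.

```python
import tensorflow as tf

# Turn on soft device placement before any model code runs, so that ops the
# TPU compiler cannot handle may be placed outside the compiled TPU program.
tf.config.set_soft_device_placement(True)

def make_tpu_strategy(tpu_name=None):
    """Connect to the TPU and return a TPUStrategy (hypothetical helper).

    tpu_name=None lets the resolver pick up the TPU from the environment,
    which is the assumed setup on Colab.
    """
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)

# Usage on a TPU runtime:
#   strategy = make_tpu_strategy()
#   with strategy.scope():
#       model = make_model()
```

The setting is global, so it only needs to be called once per process.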

