ResourceExhaustedError in Keras

Problem description

I have been trying to run an encoder-decoder text-summarization model in Keras on an Amazon dataset of 400,000 reviews.

from keras.layers import Input, Embedding, BatchNormalization, GRU, Dense
from keras.models import Model

# Encoder: embed the padded input sequences and keep only the final GRU state
encoder_inputs = Input(shape=(summaries.shape[1], ), name='Encoder-Input')
inc_emb = Embedding(nb_words, embedding_dim, weights=[word_embedding_matrix],
                    mask_zero=False, name='Body-Word-Embedding')(encoder_inputs)

x = BatchNormalization(name='Encoder-Batchnorm-1')(inc_emb)

_, state_h = GRU(embedding_dim, return_state=True, name='Encoder-Last-GRU')(x)
encoder_model = Model(inputs=encoder_inputs, outputs=state_h,
                      name='Encoder-Model')
seq2seq_encoder_out = encoder_model(encoder_inputs)

# Decoder: variable-length input, initialised with the encoder's final state
decoder_inputs = Input(shape=(None,), name="Decoder-Input")
dec_emb = Embedding(nb_words, embedding_dim, mask_zero=False,
                    weights=[word_embedding_matrix],
                    name="Decoder-Word-Embedding")(decoder_inputs)

dec_bn = BatchNormalization(name='Decoder-Batchnorm-1')(dec_emb)
decoder_gru = GRU(embedding_dim, return_state=True, return_sequences=True,
                  name="Decoder-GRU")
decoder_gru_output, _ = decoder_gru(dec_bn, initial_state=seq2seq_encoder_out)

# Project every decoder timestep onto the vocabulary (nb_words-way softmax)
x = BatchNormalization(name='Decoder-Batchnorm-2')(decoder_gru_output)
decoder_dense = Dense(nb_words, activation='softmax', name='Final-Output-Dense')
decoder_outputs = decoder_dense(x)

I get the following resource-exhausted error:

ResourceExhaustedError: OOM when allocating tensor with shape[190000,59301] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: Final-Output-Dense/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Final-Output-Dense/Reshape, Final-Output-Dense/Reshape_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: loss/mul/_271 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4239_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Can anyone suggest where my approach is going wrong?

Tags: keras, nlp

Solution


This looks like a batch that is too large for your GPU, so try training with smaller batches. The tensor that fails to allocate, shape [190000, 59301], is the output of the final Dense softmax, roughly (batch size × decoder timesteps) × vocabulary size, so it shrinks in direct proportion to the batch size. If your batch size is already 1, you will have to reduce the model size instead (for example a smaller vocabulary nb_words or shorter padded sequences).
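A minimal sketch of the batch-size adjustment, assuming the layers from the question are wrapped into a training model and trained with fit; the names seq2seq_model, encoder_input_data, decoder_input_data and decoder_target_data are placeholders, not from the original post:

from keras.models import Model

# Wrap the encoder/decoder graph from the question into a single trainable model
seq2seq_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
seq2seq_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Keras moves the data to the GPU in mini-batches of batch_size samples, so a
# smaller value directly shrinks the [batch * timesteps, nb_words] activation
# that failed to allocate; halve it until the OOM disappears.
seq2seq_model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
                  batch_size=32, epochs=5)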

