Tensorflow "index out of range" error when trying to use the huggingface TF Longformer transformer in a custom TF network

Problem description

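I am trying to adapt the Longformer Transformer TF model from huggingface into a larger three-class classification model. The model compiles, but I cannot run a test example through it. The model and the attempted output are below: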

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GaussianNoise,LocallyConnected2D,LocallyConnected1D,Input,MaxPooling1D,Dense,Dropout,BatchNormalization,LSTM,GRU,ConvLSTM2D,Flatten,LayerNormalization,TimeDistributed,Conv1D,Reshape,Masking
from tensorflow.keras import backend as K
import pathlib 
from tensorflow.keras.callbacks import Callback
from tensorflow.keras import regularizers,callbacks
import numpy as np
from tensorflow.keras.layers import Concatenate
from transformers import TFLongformerModel, LongformerTokenizer




if __name__ == "__main__":
    
    model_longformer = TFLongformerModel.from_pretrained('longformer-base-4096',output_hidden_states=True)
    print(model_longformer.summary())
    
    
    input_ids = tf.keras.Input(shape=(4096),dtype='int32')
    attention_mask = tf.keras.Input(shape=(4096), dtype='int32')
    opt=Adam()
    transformer = model_longformer([input_ids, attention_mask])   
    transformer_outputs = transformer[1] #sequence output
    print("Transformer output shape:")
    print(transformer_outputs.shape)
    #Grab the last 64 sequence entries, out of allegedly (,768). This is the bit 
    #that causes the error to mention the number '-63'
    hidden_states_size =  64 
    hiddes_states_ind = list(range(-hidden_states_size, 0, 1))
    selected_hidden_states = tf.keras.layers.concatenate(tuple([transformer_outputs[i] for i in hiddes_states_ind]))
    print(selected_hidden_states.shape)
    #array_hidden = np.asarray(selected_hiddes_states)
    #flatter_longformer_1 = Flatten(array_hidden)
    reshape_longformer_1 = Reshape((1,1,),input_shape=(49152,))(selected_hidden_states) #49152 = 64*768

    rnn_cells = [tf.keras.layers.GRUCell(64,dropout=0.5,recurrent_dropout=0.25,kernel_regularizer=regularizers.l2(0.005)),tf.keras.layers.GRUCell(64,kernel_regularizer=regularizers.l2(0.005),dropout=0,recurrent_dropout=0)]
    stacked_gru = tf.keras.layers.StackedRNNCells(rnn_cells)
    gru_layer = tf.keras.layers.RNN(stacked_gru)(reshape_longformer_1)
    bn_merge = BatchNormalization()(gru_layer)
    drop_merge = Dropout(0.1)(bn_merge)
    dense_1 = Dense(25,kernel_regularizer=regularizers.l2(0.0))(drop_merge) #0.015
    bn_dense_1 = BatchNormalization()(dense_1)
    drop_dense_1 = Dropout(0.1)(bn_dense_1)
    dense_final = Dense(3, activation = "softmax")(drop_dense_1)

    model = Model(inputs=[input_ids, attention_mask], outputs=dense_final)
    model.compile(loss="categorical_crossentropy", optimizer=opt)
    print(model.summary())
    text_input = "Queensland detectives are investigating the death of a man after he died in hospital yesterday.  9News understands an altercation took place between the man - who lives at a unit complex in the Brisbane suburb of Stafford - and a group of friends while they were drinking last week.  The altercation resulted in the man being stuck in the back of the head a number of times, with him then being rushed to hospital.  The man died from the injuries in hospital yesterday."



    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    encoded_input = tokenizer(text_input, return_tensors='tf',padding='max_length',max_length=4096)

    model([encoded_input['input_ids'],encoded_input['attention_mask']])

Which outputs:

Model: "tf_longformer_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
longformer (TFLongformerMain multiple                  148659456
=================================================================
Total params: 148,659,456
Trainable params: 148,659,456
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py:5041: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
Transformer output shape:
(None, 768)
(49152,)
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 4096)]       0
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 4096)]       0
__________________________________________________________________________________________________
tf_longformer_model (TFLongform TFLongformerBaseMode 148659456   input_1[0][0]
                                                                input_2[0][0]
__________________________________________________________________________________________________
tf.__operators__.getitem (Slici (768,)               0           tf_longformer_model[0][14]
__________________________________________________________________________________________________
tf.__operators__.getitem_1 (Sli (768,)               0           tf_longformer_model[0][14]
__________________________________________________________________________________________________
EDITED OUT ANOTHER 62 SIMILAR LAYERS
__________________________________________________________________________________________________
tf.__operators__.getitem_63 (Sl (768,)               0           tf_longformer_model[0][14]
__________________________________________________________________________________________________
concatenate (Concatenate)       (49152,)             0           tf.__operators__.getitem[0][0]
                                                                tf.__operators__.getitem_1[0][0]
EDITED ANOTHER 62 SIMILAR LINES
                                                                tf.__operators__.getitem_63[0][0]
__________________________________________________________________________________________________
reshape (Reshape)               (49152, 1, 1)        0           concatenate[0][0]
__________________________________________________________________________________________________
rnn (RNN)                       (49152, 64)          37824       reshape[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (49152, 64)          256         rnn[0][0]
__________________________________________________________________________________________________
dropout_49 (Dropout)            (49152, 64)          0           batch_normalization[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (49152, 25)          1625        dropout_49[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (49152, 25)          100         dense[0][0]
__________________________________________________________________________________________________
dropout_50 (Dropout)            (49152, 25)          0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (49152, 3)           78          dropout_50[0][0]
==================================================================================================
Total params: 148,699,339
Trainable params: 148,699,161
Non-trainable params: 178
__________________________________________________________________________________________________
None
2021-04-29 08:53:45.368311: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index -63 of dimension 0 out of bounds.
Traceback (most recent call last):
 File "c:\Automator_alpha\Just_longformer.py", line 60, in <module>
   model([encoded_input['input_ids'],encoded_input['attention_mask']])
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1014, in __call__
   outputs = call_fn(inputs, *args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 426, in call
   return self._run_internal_graph(
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 562, in _run_internal_graph
   outputs = node.layer(*args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1014, in __call__
   outputs = call_fn(inputs, *args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1520, in _call_wrapper
   return original_call(*new_args, **new_kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1326, in _call_wrapper
   return self._call_wrapper(*args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1358, in _call_wrapper
   result = self.function(*args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
   return target(*args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1037, in _slice_helper
   return strided_slice(
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
   return target(*args, **kwargs)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1210, in strided_slice
   op = gen_array_ops.strided_slice(
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10484, in strided_slice
   _ops.raise_from_not_ok_status(e, name)
 File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 6868, in raise_from_not_ok_status
   six.raise_from(core._status_to_exception(e.code, message), None)
 File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index -63 of dimension 0 out of bounds. [Op:StridedSlice] name: model/tf.__operators__.getitem/strided_slice/

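I used 4096 for the input layers because that is the input length specified in the Longformer paper. I have tried values other than 64, and I have tried iterating over the values without specifying indices (using a for statement, in which case the error says that a tensor with an unknown first dimension cannot be iterated over).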

I am new to this and feel that I am missing something basic.

Tags: tensorflow, neural-network, nlp, huggingface-transformers, huggingface-tokenizers

Solution
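No answer was posted with the question, so the following is only a reading of the output above, not a confirmed fix. The printed shape of transformer[1] is (None, 768), which looks like one pooled vector per example rather than a per-token sequence. Indexing it with transformer_outputs[i] therefore selects along dimension 0, the batch dimension, and with a single test example any index other than 0 or -1 is out of bounds. That would also explain why the summary shows 49152 where the batch dimension should be. The sketch below reproduces the same indexing behaviour with NumPy arrays as a stand-in for the TensorFlow tensors (an assumption made so the example runs without downloading the model); the shapes are taken from the question.

```python
import numpy as np

batch, seq_len, hidden = 1, 4096, 768  # shapes taken from the question

# Stand-in for transformer[1]: one pooled vector per example (assumed).
pooled = np.zeros((batch, hidden), dtype=np.float32)

# A single integer index selects along axis 0, the batch axis.
# With a batch of one, only indices 0 and -1 are valid.
try:
    pooled[-63]
    failed = False
except IndexError as exc:
    failed = True
    print("indexing the batch axis failed:", exc)

# Stand-in for transformer[0]: the per-token sequence output (assumed).
sequence = np.zeros((batch, seq_len, hidden), dtype=np.float32)

# Keep the batch axis and take the last 64 positions along axis 1.
last_tokens = sequence[:, -64:, :]
print(last_tokens.shape)

# Flatten the 64 x 768 block per example while keeping the batch axis.
flattened = last_tokens.reshape(batch, -1)
print(flattened.shape)
```

If this reading is right, the equivalent change in the model would be to take transformer[0] and slice it as [:, -64:, :] instead of building a list of 64 separate index operations. This keeps the batch dimension intact, so the layers that follow would see (None, ...) shapes rather than 49152 as the leading dimension.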

