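tensorflow - TensorFlow "index out of bounds" error when trying to use the huggingface TF Longformer transformer in a custom TF network
Problem description
I am trying to adapt the Longformer transformer TF model from huggingface into a larger three-class classification model. The model compiles, but I cannot run a test example through it. The model and the attempted output are as follows: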
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GaussianNoise,LocallyConnected2D,LocallyConnected1D,Input,MaxPooling1D,Dense,Dropout,BatchNormalization,LSTM,GRU,ConvLSTM2D,Flatten,LayerNormalization,TimeDistributed,Conv1D,Reshape,Masking
from tensorflow.keras import backend as K
import pathlib
from tensorflow.keras.callbacks import Callback
from tensorflow.keras import regularizers,callbacks
import numpy as np
from tensorflow.keras.layers import Concatenate
from transformers import TFLongformerModel, LongformerTokenizer
if __name__ == "__main__":
    model_longformer = TFLongformerModel.from_pretrained('longformer-base-4096',output_hidden_states=True)
    print(model_longformer.summary())
    input_ids = tf.keras.Input(shape=(4096),dtype='int32')
    attention_mask = tf.keras.Input(shape=(4096), dtype='int32')
    opt=Adam()
    transformer = model_longformer([input_ids, attention_mask])
    transformer_outputs = transformer[1] #sequence output
    print("Transformer output shape:")
    print(transformer_outputs.shape)
    #Grab the last 64 sequence entries, out of allegedly (,768). This is the bit
    #that causes the error to mention the number '-63'
    hidden_states_size = 64
    hiddes_states_ind = list(range(-hidden_states_size, 0, 1))
    selected_hidden_states = tf.keras.layers.concatenate(tuple([transformer_outputs[i] for i in hiddes_states_ind]))
    print(selected_hidden_states.shape)
    #array_hidden = np.asarray(selected_hiddes_states)
    #flatter_longformer_1 = Flatten(array_hidden)
    reshape_longformer_1 = Reshape((1,1,),input_shape=(49152,))(selected_hidden_states) #49152 = 64*768
    rnn_cells = [tf.keras.layers.GRUCell(64,dropout=0.5,recurrent_dropout=0.25,kernel_regularizer=regularizers.l2(0.005)),tf.keras.layers.GRUCell(64,kernel_regularizer=regularizers.l2(0.005),dropout=0,recurrent_dropout=0)]
    stacked_gru = tf.keras.layers.StackedRNNCells(rnn_cells)
    gru_layer = tf.keras.layers.RNN(stacked_gru)(reshape_longformer_1)
    bn_merge = BatchNormalization()(gru_layer)
    drop_merge = Dropout(0.1)(bn_merge)
    dense_1 = Dense(25,kernel_regularizer=regularizers.l2(0.0))(drop_merge) #0.015
    bn_dense_1 = BatchNormalization()(dense_1)
    drop_dense_1 = Dropout(0.1)(bn_dense_1)
    dense_final = Dense(3, activation = "softmax")(drop_dense_1)
    model = Model(inputs=[input_ids, attention_mask], outputs=dense_final)
    model.compile(loss="categorical_crossentropy", optimizer=opt)
    print(model.summary())
    text_input = "Queensland detectives are investigating the death of a man after he died in hospital yesterday. 9News understands an altercation took place between the man - who lives at a unit complex in the Brisbane suburb of Stafford - and a group of friends while they were drinking last week. The altercation resulted in the man being stuck in the back of the head a number of times, with him then being rushed to hospital. The man died from the injuries in hospital yesterday."
    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    encoded_input = tokenizer(text_input, return_tensors='tf',padding='max_length',max_length=4096)
    model([encoded_input['input_ids'],encoded_input['attention_mask']])
Which outputs:
Model: "tf_longformer_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
longformer (TFLongformerMain multiple 148659456
=================================================================
Total params: 148,659,456
Trainable params: 148,659,456
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py:5041: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
Transformer output shape:
(None, 768)
(49152,)
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 4096)] 0
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, 4096)] 0
__________________________________________________________________________________________________
tf_longformer_model (TFLongform TFLongformerBaseMode 148659456 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
tf.__operators__.getitem (Slici (768,) 0 tf_longformer_model[0][14]
__________________________________________________________________________________________________
tf.__operators__.getitem_1 (Sli (768,) 0 tf_longformer_model[0][14]
__________________________________________________________________________________________________
EDITED OUT ANOTHER 62 SIMILAR LAYERS
__________________________________________________________________________________________________
tf.__operators__.getitem_63 (Sl (768,) 0 tf_longformer_model[0][14]
__________________________________________________________________________________________________
concatenate (Concatenate) (49152,) 0 tf.__operators__.getitem[0][0]
tf.__operators__.getitem_1[0][0]
EDITED ANOTHER 62 SIMILAR LINES
tf.__operators__.getitem_63[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (49152, 1, 1) 0 concatenate[0][0]
__________________________________________________________________________________________________
rnn (RNN) (49152, 64) 37824 reshape[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (49152, 64) 256 rnn[0][0]
__________________________________________________________________________________________________
dropout_49 (Dropout) (49152, 64) 0 batch_normalization[0][0]
__________________________________________________________________________________________________
dense (Dense) (49152, 25) 1625 dropout_49[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (49152, 25) 100 dense[0][0]
__________________________________________________________________________________________________
dropout_50 (Dropout) (49152, 25) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (49152, 3) 78 dropout_50[0][0]
==================================================================================================
Total params: 148,699,339
Trainable params: 148,699,161
Non-trainable params: 178
__________________________________________________________________________________________________
None
2021-04-29 08:53:45.368311: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index -63 of dimension 0 out of bounds.
Traceback (most recent call last):
File "c:\Automator_alpha\Just_longformer.py", line 60, in <module>
model([encoded_input['input_ids'],encoded_input['attention_mask']])
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 426, in call
return self._run_internal_graph(
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 562, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1520, in _call_wrapper
return original_call(*new_args, **new_kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1326, in _call_wrapper
return self._call_wrapper(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1358, in _call_wrapper
result = self.function(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1037, in _slice_helper
return strided_slice(
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1210, in strided_slice
op = gen_array_ops.strided_slice(
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10484, in strided_slice
_ops.raise_from_not_ok_status(e, name)
File "C:\ProgramData\Anaconda3\envs\tf2\lib\site-packages\tensorflow\python\framework\ops.py", line 6868, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index -63 of dimension 0 out of bounds. [Op:StridedSlice] name: model/tf.__operators__.getitem/strided_slice/
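I used 4096 for the input layers because that is the input length specified in the Longformer paper. I tried values other than 64, and I also tried iterating over the values without specifying an index (using a for statement, in which case the error says it cannot iterate without knowing the first dimension).
I am very new to this and feel like I am missing something basic.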
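The shapes in the log above hint at the likely cause. `transformer[1]` prints as `(None, 768)`, which matches the pooled output (one vector per example) rather than the per-token sequence output, so `transformer_outputs[i]` indexes the batch axis. With a single example, index -64 does not exist on that axis; the reported -63 is consistent with TensorFlow adding the axis size 1 to -64 before the bounds check, though that detail is an inference from the message. Below is a minimal NumPy sketch of the same shape logic. The zero arrays are hypothetical stand-ins for the model outputs, named after the huggingface output fields `last_hidden_state` and `pooler_output`.

```python
import numpy as np

# One padded example, as in the question.
batch_size, seq_len, hidden = 1, 4096, 768

# Stand-ins for the first two model outputs (zeros, not real activations).
last_hidden_state = np.zeros((batch_size, seq_len, hidden), dtype=np.float32)  # one vector per token
pooler_output = np.zeros((batch_size, hidden), dtype=np.float32)               # one vector per example

# A single integer index selects along axis 0, which is the batch axis here.
try:
    pooler_output[-64]
except IndexError as err:
    print("indexing the batch axis fails:", err)

# Slicing the sequence axis (axis 1) of the per-token output keeps the batch axis
# and returns the last 64 token vectors.
last_64 = last_hidden_state[:, -64:, :]
print(last_64.shape)

# Flattening per example gives 64 * 768 features while the batch axis stays first.
flat = last_64.reshape(batch_size, -1)
print(flat.shape)
```

The same distinction explains the model summary: after the concatenation, 49152 sits in the first (batch) position of every downstream layer instead of being a feature dimension.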
Solution
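A minimal sketch of one possible fix, not a confirmed answer from the original thread: take the per-token output (`transformer[0]`, expected shape `(None, 4096, 768)`) and crop the sequence axis instead of indexing axis 0. The resulting `(None, 64, 768)` tensor can then go to the recurrent layer directly, without the `Reshape((1,1,))` step. To stay runnable without downloading the model, the sketch uses a `tf.keras.Input` as a stand-in for the Longformer output; `SEQ_LEN`, `HIDDEN`, `LAST_N` and `head` are illustrative names, and the single `GRU` layer is a simplification of the stacked cells in the question.

```python
import tensorflow as tf

SEQ_LEN, HIDDEN, LAST_N = 4096, 768, 64

# Stand-in for model_longformer([input_ids, attention_mask])[0]. In the real
# model this tensor would come from the Longformer call, not an Input layer.
sequence_output = tf.keras.Input(shape=(SEQ_LEN, HIDDEN), dtype="float32")

# Keep only the last LAST_N positions of the sequence axis. This matches
# sequence_output[:, -LAST_N:, :] and leaves the batch axis untouched.
last_tokens = tf.keras.layers.Cropping1D(cropping=(SEQ_LEN - LAST_N, 0))(sequence_output)

# The recurrent layer consumes (batch, timesteps, features) directly.
gru_out = tf.keras.layers.GRU(64)(last_tokens)
probs = tf.keras.layers.Dense(3, activation="softmax")(gru_out)

head = tf.keras.Model(inputs=sequence_output, outputs=probs)

# Run one dummy example through the head to check the shapes end to end.
out = head(tf.zeros((1, SEQ_LEN, HIDDEN)))
print(out.shape)
```

In the original script this would correspond to replacing `transformer[1]` with `transformer[0]` and passing that tensor through the same cropping step. Because the input is padded to 4096 tokens, the last 64 positions of a short text are likely padding, so whether those are the right positions to keep is a separate modelling question.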