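python - from_logits=True vs. from_logits=False when fine-tuning RoBERTa for a new task
Problem description
I am training/fine-tuning a Spanish RoBERTa model that was recently pre-trained on a variety of NLP tasks other than text classification.
Since the baseline model looks promising, I want to fine-tune it for a different task: text classification, more precisely sentiment analysis of Spanish tweets.
I have a good selection of labeled Spanish tweets that I can use for fine-tuning.
The preprocessing runs without any problems. However, when I train the model, it does not improve, i.e. the accuracy does not go up.
Code:
I will omit the preprocessing part, since I don't think the problem is there.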
# Training with native TensorFlow
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification
## Model Definition
model = TFAutoModelForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)
## Model Compilation
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
metric = tf.metrics.SparseCategoricalAccuracy()
model.compile(optimizer=optimizer,
loss=loss,
metrics=metric)
## Fitting the data
# The dataset is already batched with .batch(64), so batch_size is not passed to fit() again.
history = model.fit(train_dataset.shuffle(1000).batch(64), epochs=3)
Output:
With loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), I get:
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
"Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5
16/16 [==============================] - 34s 1s/step - loss: 1.0986 - sparse_categorical_accuracy: 0.2734
Epoch 2/5
16/16 [==============================] - 17s 1s/step - loss: 1.0986 - sparse_categorical_accuracy: 0.2734
Epoch 3/5
16/16 [==============================] - 17s 1s/step - loss: 1.0986 - sparse_categorical_accuracy: 0.2734
Epoch 4/5
16/16 [==============================] - 17s 1s/step - loss: 1.0986 - sparse_categorical_accuracy: 0.2734
Epoch 5/5
16/16 [==============================] - 17s 1s/step - loss: 1.0986 - sparse_categorical_accuracy: 0.2734
However, when I use loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), I get:
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
"Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5
16/16 [==============================] - 35s 1s/step - loss: 1.0184 - sparse_categorical_accuracy: 0.4778
Epoch 2/5
16/16 [==============================] - 18s 1s/step - loss: 0.6688 - sparse_categorical_accuracy: 0.7394
Epoch 3/5
16/16 [==============================] - 18s 1s/step - loss: 0.3270 - sparse_categorical_accuracy: 0.8845
Epoch 4/5
16/16 [==============================] - 18s 1s/step - loss: 0.1200 - sparse_categorical_accuracy: 0.9654
Epoch 5/5
16/16 [==============================] - 18s 1s/step - loss: 0.0500 - sparse_categorical_accuracy: 0.9872
Question:
Which one should I use, i.e. which one is correct, and why?
Also, I would like to know whether I am using the right metric and loss.
Solution
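from_logits=True is the setting that matches this model. The sequence-classification heads in transformers return raw, unnormalized scores (logits) and apply no softmax, so the loss has to be told to do the softmax itself. The stuck value in the first run is also telling: 1.0986 is ln(3), the loss of a uniform guess over 3 classes.

The sketch below illustrates this in plain Python, without TensorFlow. The logits are made-up values, and loss_assuming_probs is my assumption about what the from_logits=False path roughly does in TF 2.x Keras (clip to [eps, 1 - eps], renormalize, take the negative log), not a copy of the library code.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_logits(logits, label):
    """Sparse categorical cross-entropy when the inputs are logits."""
    return -math.log(softmax(logits)[label])

def loss_assuming_probs(values, label, eps=1e-7):
    """Assumed approximation of the from_logits=False path:
    clip to [eps, 1 - eps], renormalize, then take -log of the true class."""
    clipped = [min(max(v, eps), 1.0 - eps) for v in values]
    return -math.log(clipped[label] / sum(clipped))

# Hypothetical logits for one tweet with 3 classes; the true class is index 2.
logits = [-0.3, -1.2, -0.1]
label = 2

correct_loss = loss_from_logits(logits, label)
misread_loss = loss_assuming_probs(logits, label)

print(f"ln(3)                 = {math.log(3):.4f}")
print(f"logits read as logits = {correct_loss:.4f}")
print(f"logits read as probs  = {misread_loss:.4f}")
```

Under that assumption, negative logits are all clipped to the same tiny value, the renormalized prediction becomes uniform, and the loss sits at ln(3) with no useful gradient. That is one plausible reason the first run never moves. So keep SparseCategoricalCrossentropy(from_logits=True). The loss and metric are otherwise a reasonable pair for integer labels 0, 1, 2: SparseCategoricalAccuracy only compares the argmax with the label, so it works on logits as well. The 0.98 accuracy is measured on the training data, so it is worth checking a held-out validation set before trusting it.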