distilbert ktrain 'too many values to unpack'

Problem description

I'm trying to run DistilBERT with ktrain in Colab, but I get the error 'too many values to unpack'. I'm doing toxic-comment classification: I uploaded 'train.csv' from CivilComments, and I can run BERT but not DistilBERT.

# Prerequisites:
!pip install ktrain
import ktrain
from ktrain import text as txt

DATA_PATH = '/content/train.csv'   # CSV uploaded to Colab
NUM_WORDS = 50000                  # maximum vocabulary size for preprocessing
MAXLEN = 150                       # maximum sequence length (in tokens)
label_columns = ["toxic", "severe_toxic", "obscene",
                 "threat", "insult", "identity_hate"]

Preprocessing works fine if I just use 'bert', but then I can't use the DistilBERT model. When I preprocess with distilbert I get an error:

(x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN, preprocess_mode='distilbert')

The error is 'too many values to unpack, expected 2'. If I replace distilbert with bert it works fine (code below), but then I'm forced to use bert as the model. Preprocessing with bert works fine:

(x_train, y_train), (x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN, preprocess_mode='bert')

This one runs without errors, but then I can't use distilbert, see below:

For example:

model = txt.text_classifier('distilbert', train_data=(x_train, y_train), preproc=preproc)

gives the error message: if 'bert' is selected model, then preprocess_mode='bert' should be used and vice versa

I want (x_test, y_test), preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN, preprocess_mode='distilbert') to work with the DistilBERT model. How can I avoid the error 'too many values to unpack'?
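
(For reference, the error itself is ordinary Python tuple unpacking: the right-hand side yields three values while the left-hand side provides only two targets. A minimal illustration, with made-up values:)

(a, b), c = 1, 2, 3      # ValueError: too many values to unpack (expected 2)
(a, b), c = (1, 2), 3    # works: a=1, b=2, c=3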

The code is based on: Arun Maiya (2019). ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks. https://towardsdatascience.com/ktrain-a-lightweight-wrapper-for-keras-to-help-train-neural-networks-82851ba889c

Tags: python, nlp, multilabel-classification, distilbert, ktrain

Solution


As shown in the example notebook, the texts_from_* functions return TransformerDataset objects (not Numpy arrays) when preprocess_mode='distilbert' is specified. So, you need to do something like this:

trn, val, preproc = txt.texts_from_csv(DATA_PATH, 'comment_text', label_columns=label_columns, val_filepath=None, max_features=NUM_WORDS, maxlen=MAXLEN, preprocess_mode='distilbert')
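
Here trn and val are TransformerDataset objects rather than (x, y) tuples, so the three return values need three targets on the left-hand side, and the datasets are then passed directly to the model and learner. Below is a minimal sketch of the rest of the workflow, following ktrain's standard Transformer usage; the batch size, learning rate, and number of epochs are illustrative assumptions, not values from the question:

# Build a DistilBERT classifier; pass the TransformerDataset directly
# instead of an (x, y) tuple.
model = txt.text_classifier('distilbert', train_data=trn, preproc=preproc)

# Wrap the model and the datasets in a Learner (batch_size=6 is an
# illustrative choice).
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)

# Fine-tune with the 1cycle policy; 3e-5 for one epoch is a common
# starting point for transformer fine-tuning.
learner.fit_onecycle(3e-5, 1)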


