NLP: How to train a spaCy NER model using GoldParse objects

Problem description

I am trying to train a spaCy NER model using GoldParse objects. Here is what I did:

Adding extra labels to the NER model

add_ents = ['A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1'] # sample labels

# Create a pipe if it does not exist
if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner") 
    nlp.add_pipe(ner)
else:
    ner = nlp.get_pipe("ner")

for e in add_ents:
    ner.add_label(e)

Training the NER model

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
model = None  # We are training a fresh model, not resuming from a saved one
with nlp.disable_pipes(*other_pipes):  # only train ner
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    for i in range(20):
        loss = {}
        nlp.update(X, y,  sgd=optimizer, drop=0.0, losses=loss)
        print("Loss: ", loss)

Here X is a list of Doc objects and y is the list of corresponding GoldParse objects. Running this raises the following error:

nn_parser.pyx in spacy.syntax.nn_parser.Parser.update()

nn_parser.pyx in spacy.syntax.nn_parser.Parser._init_gold_batch()

ner.pyx in spacy.syntax.ner.BiluoPushDown.preprocess_gold()

ner.pyx in spacy.syntax.ner.BiluoPushDown.lookup_transition()

ValueError: 'A1' is not in list

I searched for a solution but could not find anything relevant. Is there a way to fix this?

Tags: pandas, nlp, spacy, named-entity-recognition

Solution
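
One plausible cause, offered as an assumption rather than a confirmed diagnosis: in spaCy v2 the `entities` argument of `GoldParse` is expected to hold either one BILUO tag per token (for example `U-A1` or `B-A1`) or `(start_char, end_char, label)` offsets. A bare label such as `A1` is not a BILUO tag, which would be consistent with the `'A1' is not in list` error raised from `lookup_transition`. The sketch below is plain Python with no spaCy calls; it only illustrates what per-token BILUO tags look like for a set of character-offset spans. The helper names `whitespace_token_offsets` and `spans_to_biluo`, the sample sentence, and the whitespace tokenization are all hypothetical and chosen for illustration; spaCy's own tokenizer may split text differently.

```python
def whitespace_token_offsets(text):
    """Return (start, end) character offsets for whitespace-separated tokens.

    Hypothetical stand-in for a real tokenizer, used only for illustration.
    """
    offsets, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


def spans_to_biluo(token_offsets, spans):
    """Convert (start_char, end_char, label) spans into one BILUO tag per token."""
    tags = ["O"] * len(token_offsets)
    for span_start, span_end, label in spans:
        # Indices of tokens that lie fully inside the span.
        covered = [i for i, (s, e) in enumerate(token_offsets)
                   if s >= span_start and e <= span_end]
        if not covered:
            continue  # span does not align with any token; leave those tags as "O"
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label  # single-token entity
        else:
            tags[covered[0]] = "B-" + label  # first token of the entity
            for i in covered[1:-1]:
                tags[i] = "I-" + label       # tokens inside the entity
            tags[covered[-1]] = "L-" + label  # last token of the entity
    return tags


text = "Acme Corp hired Bob"
spans = [(0, 9, "A1"), (16, 19, "B1")]  # sample labels from the question
tags = spans_to_biluo(whitespace_token_offsets(text), spans)
print(tags)  # ['B-A1', 'L-A1', 'O', 'U-B1']
```

If this is indeed the cause, tags of this shape (or the raw offset tuples) could be passed as `GoldParse(doc, entities=...)` in place of bare labels, provided they line up with the tokens of the `Doc`. spaCy v2 also provides `spacy.gold.biluo_tags_from_offsets(doc, entities)`, which performs this conversion against a real `Doc`.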

