Custom NER training model with spaCy will not train

Problem description

I followed this tutorial on YouTube.

Here is the complete code from my Jupyter notebook:

import spacy
import fitz
import pickle
import pandas as pd
import random

train_data = pickle.load(open('train_data.pkl', 'rb'))
train_data[0]

The output of train_data[0] is shown here.
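
Each entry is a (text, annotations) tuple whose annotations['entities'] list holds (start, end, label) spans; a made-up entry of that shape would look like this (placeholder text and labels, not my real data):

# hypothetical example of one train_data entry, in the same shape the loop below expects
sample = (
    "Alice Smith, Software Engineer at Acme Corp",
    {'entities': [(0, 11, 'Name'), (13, 30, 'Designation'), (34, 43, 'Companies worked at')]},
)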

nlp = spacy.blank('en')

def train_model(train_data):
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last = True)
        
    for _, annotation in train_data:
        for ent in annotation['entities']:
            ner.add_label(ent[2])

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(10):
            print('Starting iteration' + str(itn))
            random.shuffle(train_data)
            losses = {}
            index = 0
            # batch up the examples using spaCy's minibatch
            #batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for text, annotations in train_data:
                try:
                    nlp.update(
                        [texts],  # batch of texts
                        [annotations],# batch of annotations
                        sgd=optimizer,
                        drop=0.5,  # dropout - make it harder to memorise data
                        losses=losses)
                except Exception as e:
                    pass
            print("Losses", losses)

train_model(train_data)

Strangely, the output of the function is:

Starting iteration0
Losses {}
Starting iteration1
Losses {}
Starting iteration2
Losses {}
Starting iteration3
Losses {}
Starting iteration4
Losses {}
Starting iteration5
Losses {}
Starting iteration6
Losses {}
Starting iteration7
Losses {}
Starting iteration8
Losses {}
Starting iteration9
Losses {}

Even though I can run train_data and get output from it, it seems that no data is going into the model at all!

spaCy version 2.3.0
Python version 3.7.3

Tags: spacy, named-entity-recognition

Solution


In the training loop the variable is named text, but nlp.update() is passed texts, which is never defined:

for text, annotations in train_data:
    try:
        nlp.update(
            [texts],  # <-- 'texts' is undefined; the loop variable is 'text'
            [annotations],  # batch of annotations
            sgd=optimizer,
            drop=0.5,  # dropout - make it harder to memorise data
            losses=losses)
    except Exception as e:
        pass

Each call therefore raises a NameError, which the bare except/pass silently swallows, so no update ever runs and losses stays empty. Replace texts with text (and consider removing the try/except so errors like this surface).
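
For completeness, here is a minimal, self-contained sketch of the corrected spaCy 2.x loop; the two training examples and their labels are made-up stand-ins for the contents of train_data.pkl:

import random
import spacy

# hypothetical training data in the same (text, {'entities': [(start, end, label)]}) shape
TRAIN_DATA = [
    ("Alice works at Acme Corp", {'entities': [(0, 5, 'Name'), (15, 24, 'Companies worked at')]}),
    ("Bob is a Software Engineer", {'entities': [(0, 3, 'Name'), (9, 26, 'Designation')]}),
]

nlp = spacy.blank('en')
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner, last=True)

# register every label that appears in the annotations
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations['entities']:
        ner.add_label(label)

optimizer = nlp.begin_training()
for itn in range(10):
    print('Starting iteration' + str(itn))
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        # pass the loop variable 'text' (not the undefined 'texts')
        nlp.update([text], [annotations], sgd=optimizer, drop=0.5, losses=losses)
    print('Losses', losses)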

