Getting the start and end positions of found named entities

Problem description

I am new to both ML and spaCy. I am trying to display the named entities found in an input text.

Here is my method:

import spacy
from collections import defaultdict
from pprint import pprint

def run():

    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    # Beam search over the NER component (nlp.entity is the spaCy 2.x accessor).
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    entity_scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

    # Create a dict to store the output.
    ners = defaultdict(list)
    ners['text'] = str(sentence)

    for key in entity_scores:
        start, end, label = key
        score = entity_scores[key]
        if (score > threshold):
            ners['extractions'].append({
                "label": str(label),
                "text": str(doc[start:end]),
                "confidence": round(score, 2)
            })

    pprint(ners)

The method above works fine and prints something like the following:

'extractions': [{'confidence': 1.0,
                'label': 'PERSON',
                'text': 'Oliver'}],
'text': 'Hi my name is Oliver'})

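So far, so good. Now I am trying to get the actual position of the named entity that was found, in this case "Oliver".

Looking at the documentation, ent.start_char and ent.end_char are available, but if I use them like this: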

"start_position": doc.start_char,
"end_position": doc.end_char

I get the following error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'start_char'

Can someone point me in the right direction?

Tags: python-3.x, nlp, spacy, named-entity-recognition

Solution


If anyone lands here looking for a simple answer to the question, I believe the following should do it:

import spacy

nlp = spacy.load('en_core_web_sm')
sentence = "Hi my name is Oliver!"
doc = nlp(sentence)

for ent in doc.ents:
    print(f"Entity {ent} found with start at {ent.start_char} and end at {ent.end_char}")
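
For the beam-parse loop in the question, the error comes from asking the whole Doc for start_char. That attribute lives on a Span, and slicing a Doc with token indices (doc[start:end]) returns a Span. Here is a minimal sketch of that idea. It assumes a blank English pipeline so that no model download is needed, and the (start, end, label) values are hard-coded stand-ins for what the beam parse would yield, not real model output:

```python
import spacy

# Blank English pipeline: tokenizer only, no trained NER model.
# This is an assumption for illustration; the question uses en_core_web_sm.
nlp = spacy.blank("en")
sentence = "Hi my name is Oliver!"
doc = nlp(sentence)

# Hypothetical stand-in for one (start, end, label) tuple from the beam parse.
# These are token indices, and end is exclusive.
start, end, label = 4, 5, "PERSON"

# Slicing a Doc returns a Span, which carries character offsets.
span = doc[start:end]
extraction = {
    "label": label,
    "text": span.text,
    "start_position": span.start_char,  # offset of the span's first character
    "end_position": span.end_char,      # offset just past the span's last character
}
print(extraction)

# The offsets index straight back into the original string.
print(sentence[span.start_char:span.end_char])
```

In the question's own loop, the same two attributes could be read from doc[start:end] inside the dict that is appended to ners['extractions'].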
