Return a match only when python spaCy PhraseMatcher finds one match from each of two patterns

Problem description

I have several text fragments stored in a list, for example:

text = ['mary had a little lamb', 'julie had a little goat',
'julie enjoys eating pizza', 'mary went to the market', 
'in the market there was a lamb', 'my goat likes to drink coffee', 
'tara throws a ball for her goat', 'a goat and a kangaroo can often be friends',
'tara and mary like to drink beer']

I want to return a match only when a fragment contains both an animal name and a girl's name. For the text above, I want only these fragments returned:

['mary had a little lamb', 'julie had a little goat',
'tara throws a ball for her goat']

I feel I should be able to do this with spaCy by defining multiple patterns like this:

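import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)

# PhraseMatcher patterns must be Doc objects, not plain strings
girls_names = [nlp.make_doc(name) for name in ['mary', 'tara', 'julie']]
animals = [nlp.make_doc(name) for name in ['lamb', 'goat']]

phrase_matcher.add('GIRLS_NAMES', None, *girls_names)
phrase_matcher.add('ANIMALS', None, *animals)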

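I have gotten spaCy to match the keywords (code below), but I can't figure out how to flag a fragment only when a word from each pattern matches, or even how to print which pattern a match came from.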

for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    print('MATCHED KEYWORDS:')
    for match_id, start, end in matches:
        span = doc[start:end]
        print(span.text)
    print('FRAGMENT')
    print(fragment)

Output:

MATCHED KEYWORDS:
mary
lamb
FRAGMENT
mary had a little lamb
MATCHED KEYWORDS:
julie
goat
FRAGMENT
julie had a little goat
MATCHED KEYWORDS:
julie
FRAGMENT
julie enjoys eating pizza
MATCHED KEYWORDS:
mary
FRAGMENT
mary went to the market
MATCHED KEYWORDS:
lamb
FRAGMENT
in the market there was a lamb
MATCHED KEYWORDS:
goat
FRAGMENT
my goat likes to drink coffee
MATCHED KEYWORDS:
tara
goat
FRAGMENT
tara throws a ball for her goat
MATCHED KEYWORDS:
goat
kangaroo
FRAGMENT
a goat and a kangaroo can often be friends
MATCHED KEYWORDS:
tara
mary
FRAGMENT
tara and mary like to drink beer

Tags: python, spacy

Solution


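Use each match's match_id to look up the name of the pattern it came from, and keep a fragment only if both GIRLS_NAMES and ANIMALS are among the matched patterns.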

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)

girls_names = [nlp.make_doc(text) for text in ['mary', 'tara', 'julie']]
animals = [nlp.make_doc(text) for text in ['lamb', 'goat']]

phrase_matcher.add('GIRLS_NAMES', None, *girls_names)
phrase_matcher.add('ANIMALS', None, *animals)

text = ['mary had a little lamb', 'julie had a little goat',
'julie enjoys eating pizza', 'mary went to the market',
'in the market there was a lamb', 'my goat likes to drink coffee',
'tara throws a ball for her goat', 'a goat and a kangaroo can often be friends',
'tara and mary like to drink beer']

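for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    # match[0] is the match_id; the vocab's string store maps it back to the pattern name
    rule_ids = {nlp.vocab.strings[match[0]] for match in matches}
    # keep the fragment only if both patterns matched at least once
    if {'GIRLS_NAMES', 'ANIMALS'}.issubset(rule_ids):
        print(fragment)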

Output:

mary had a little lamb
julie had a little goat
tara throws a ball for her goat
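
The question also asks how to print which pattern each keyword matched. The following is a minimal sketch of one way to do that, not the only one. It assumes spaCy v3 (the add() call takes a list of Doc patterns) and uses spacy.blank("en") instead of en_core_web_sm, on the assumption that tokenization alone is enough for PhraseMatcher. The names patterns, matches_by_pattern and has_all_patterns are hypothetical helpers introduced here for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline (tokenizer only) is sufficient,
# because PhraseMatcher compares token text by default.
nlp = spacy.blank("en")
phrase_matcher = PhraseMatcher(nlp.vocab)

# Hypothetical lookup table: pattern name -> keywords (same data as above).
patterns = {
    "GIRLS_NAMES": ["mary", "tara", "julie"],
    "ANIMALS": ["lamb", "goat"],
}
for label, terms in patterns.items():
    # spaCy v3 call style: the second argument is a list of Doc patterns.
    phrase_matcher.add(label, [nlp.make_doc(term) for term in terms])


def matches_by_pattern(fragment):
    """Return {pattern name: [matched keywords]} for one text fragment."""
    doc = nlp(fragment)
    found = {}
    for match_id, start, end in phrase_matcher(doc):
        # Map the integer match_id back to the pattern name.
        label = nlp.vocab.strings[match_id]
        found.setdefault(label, []).append(doc[start:end].text)
    return found


def has_all_patterns(fragment):
    """True if every pattern in `patterns` matched at least once."""
    return set(patterns) <= set(matches_by_pattern(fragment))


if __name__ == "__main__":
    for fragment in ["mary had a little lamb", "julie enjoys eating pizza"]:
        print(fragment, matches_by_pattern(fragment), has_all_patterns(fragment))
```

Grouping the keywords by pattern name keeps the "which pattern matched" question and the "did every pattern match" question in one pass over the matches, and adding a third pattern only requires another entry in the dictionary.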
