python - Return a match only if spaCy's PhraseMatcher finds a match from each of two patterns
Question
I have several text fragments stored in a list, for example:
text = ['mary had a little lamb', 'julie had a little goat',
        'julie enjoys eating pizza', 'mary went to the market',
        'in the market there was a lamb', 'my goat likes to drink coffee',
        'tara throws a ball for her goat', 'a goat and a kangaroo can often be friends',
        'tara and mary like to drink beer']
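I want to return a match only when a fragment contains both an animal name and a girl's name. For the text above, I want it to return only these fragments: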
['mary had a little lamb', 'julie had a little goat',
 'tara throws a ball for her goat']
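I feel I should be able to do this with spaCy by defining multiple patterns like this:
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)
# PhraseMatcher patterns must be Doc objects, not plain strings
girls_names = [nlp.make_doc(name) for name in ['mary', 'tara', 'julie']]
animals = [nlp.make_doc(name) for name in ['lamb', 'goat']]
phrase_matcher.add('GIRLS_NAMES', None, *girls_names)
phrase_matcher.add('ANIMALS', None, *animals)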
I have already got spaCy to match the keywords (code below), but I don't know how to make it flag a fragment only when one word from each pattern matches, or even how to print which pattern a match belongs to.
for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    print('MATCHED KEYWORDS:')
    for match_id, start, end in matches:
        span = doc[start:end]
        print(span.text)
    print('FRAGMENT')
    print(fragment)
Output:
MATCHED KEYWORDS:
mary
lamb
FRAGMENT
mary had a little lamb
MATCHED KEYWORDS:
julie
goat
FRAGMENT
julie had a little goat
MATCHED KEYWORDS:
julie
FRAGMENT
julie enjoys eating pizza
MATCHED KEYWORDS:
mary
FRAGMENT
mary went to the market
MATCHED KEYWORDS:
lamb
FRAGMENT
in the market there was a lamb
MATCHED KEYWORDS:
goat
FRAGMENT
my goat likes to drink coffee
MATCHED KEYWORDS:
tara
goat
FRAGMENT
tara throws a ball for her goat
MATCHED KEYWORDS:
goat
kangaroo
FRAGMENT
a goat and a kangaroo can often be friends
MATCHED KEYWORDS:
tara
mary
FRAGMENT
tara and mary like to drink beer
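Solution
Use the match_id of each match. It is the hash of the rule name passed to add(), and nlp.vocab.strings maps it back to that name, so you can check whether both GIRLS_NAMES and ANIMALS matched in a fragment.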
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)

# Patterns are Doc objects; make_doc only runs the tokenizer
girls_names = [nlp.make_doc(name) for name in ['mary', 'tara', 'julie']]
animals = [nlp.make_doc(name) for name in ['lamb', 'goat']]

# spaCy v2 call style; in spaCy v3 the patterns are passed as a list,
# e.g. phrase_matcher.add('GIRLS_NAMES', girls_names)
phrase_matcher.add('GIRLS_NAMES', None, *girls_names)
phrase_matcher.add('ANIMALS', None, *animals)

text = ['mary had a little lamb', 'julie had a little goat',
        'julie enjoys eating pizza', 'mary went to the market',
        'in the market there was a lamb', 'my goat likes to drink coffee',
        'tara throws a ball for her goat', 'a goat and a kangaroo can often be friends',
        'tara and mary like to drink beer']

for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    # match[0] is the match_id; look up the rule name it was registered under
    rule_ids = {nlp.vocab.strings[match[0]] for match in matches}
    # Keep the fragment only if both rules produced at least one match
    if {'GIRLS_NAMES', 'ANIMALS'}.issubset(rule_ids):
        print(fragment)
Output:
mary had a little lamb
julie had a little goat
tara throws a ball for her goat
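The question also asked how to print which pattern each matched keyword belongs to. The same match_id lookup covers that. The following is only a sketch under a few assumptions: it uses spacy.blank("en") instead of en_core_web_sm (PhraseMatcher only needs tokenization here), it uses the list form of add() accepted by spaCy v3 and later v2 releases, and the names patterns, keywords_by_pattern and matches_every_pattern are hypothetical helpers that are not part of spaCy.

```python
from collections import defaultdict

import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline (tokenizer only) is enough for this task.
nlp = spacy.blank("en")
phrase_matcher = PhraseMatcher(nlp.vocab)

# Hypothetical mapping from rule name to keywords, taken from the question.
patterns = {
    "GIRLS_NAMES": ["mary", "tara", "julie"],
    "ANIMALS": ["lamb", "goat"],
}
for label, terms in patterns.items():
    # List form of add(): rule name followed by a list of Doc patterns.
    phrase_matcher.add(label, [nlp.make_doc(term) for term in terms])


def keywords_by_pattern(fragment):
    """Return {rule name: [matched keywords]} for one text fragment."""
    doc = nlp.make_doc(fragment)
    found = defaultdict(list)
    for match_id, start, end in phrase_matcher(doc):
        label = nlp.vocab.strings[match_id]  # hash -> rule name
        found[label].append(doc[start:end].text)
    return dict(found)


def matches_every_pattern(fragment):
    """True if every rule in `patterns` matched at least once."""
    return set(patterns) <= set(keywords_by_pattern(fragment))


if __name__ == "__main__":
    for fragment in ["mary had a little lamb", "julie enjoys eating pizza"]:
        print(fragment, keywords_by_pattern(fragment), matches_every_pattern(fragment))
```

Keeping the rule names in one dictionary means the "all patterns present" check does not need to be edited when a third category is added.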