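python - A better way to parse sentences with SpaCy?
Problem description
I'm using SpaCy to find sentences that contain "is" or "was" with a pronoun as the subject, and to return the object of the sentence. My code works, but I feel like there must be a better way to do this.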
import spacy
nlp = spacy.load('en_core_web_sm')
ex_phrase = nlp("He was a genius. I really liked working with him. He is a dog owner. She is very kind to animals.")
#create an empty list to hold any instance of this particular construction
list_of_responses = []
#split into sentences
for sent in ex_phrase.sents:
    for token in sent:
        #check to see if the word 'was' or 'is' is in each sentence, if so, make a list of the verb's constituents
        if token.text == 'was' or token.text == 'is':
            dependency = [child for child in token.children]
            #if the first constituent is a pronoun, make sent_object equal to the item at index 1 in the list of constituents
            if dependency[0].pos_ == 'PRON':
                sent_object = dependency[1]
    #create a string of the entire object of the verb. For instance, if sent_object = 'genius', this would create a string 'a genius'
    for token in sent:
        if token == sent_object:
            whole_constituent = [t.text for t in token.subtree]
            whole_constituent = " ".join(whole_constituent)
    #check to see what the pronoun was, and depending on if it was 'he' or 'she', construct a coherent followup sentence
    if dependency[0].text.lower() == 'he':
        returning_phrase = f"Why do you think him being {whole_constituent} helped the two of you get along?"
    elif dependency[0].text.lower() == 'she':
        returning_phrase = f"Why do you think her being {whole_constituent} helped the two of you get along?"
    #add each followup sentence to the list. For some reason it creates a lot of duplicates, so I have to use set
    list_of_responses.append(returning_phrase)
    list_of_responses = list(set(list_of_responses))
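If duplicates ever do need removing, note that `list(set(...))` throws away the order in which the phrases were found. A minimal order-preserving alternative relies on dict keys being unique and keeping insertion order (guaranteed since Python 3.7); the helper name `dedupe_keep_order` is illustrative, not part of the original code:

```python
def dedupe_keep_order(items):
    """Drop repeated items, keeping the first occurrence of each in original order."""
    # dict keys are unique and, since Python 3.7, preserve insertion order
    return list(dict.fromkeys(items))


responses = [
    "Why do you think him being a genius helped the two of you get along?",
    "Why do you think him being a genius helped the two of you get along?",
    "Why do you think him being a dog owner helped the two of you get along?",
]
print(dedupe_keep_order(responses))
```

This only hides the symptom; matching token by token and appending only when a match is found avoids producing the duplicates in the first place.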
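Solution
Your code seems to be trying to do something more complicated than what you describe in your question. I've tried to do what you seem to want to do with your code. Getting the object/attribute of the verb "is" or "was" is only part of that.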
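To make the `attr`/`acomp` and `nsubj` labels concrete, here is a pure-Python sketch of the same selection logic run over a hand-written dependency parse, so it needs no model. The parse is an assumption modelled on typical spaCy English output, not actual model output, and `Tok`, `subtree` and `attribute_of_be` are illustrative stand-ins for spaCy's `Token`, `Token.subtree` and the matching loop:

```python
from typing import NamedTuple


class Tok(NamedTuple):
    text: str
    pos: str   # coarse part of speech, like spaCy's token.pos_
    dep: str   # dependency label, like spaCy's token.dep_
    head: int  # index of the head token; the root points to itself


def subtree(sent, i):
    """Join token i and all of its descendants, in sentence order."""
    def descends_from_i(j):
        while j != i:
            if sent[j].head == j:  # reached the root without meeting i
                return False
            j = sent[j].head
        return True
    return " ".join(t.text for j, t in enumerate(sent) if descends_from_i(j))


def attribute_of_be(sent):
    """Return (subject, attribute phrase) for 'is'/'was' + attr/acomp, else None."""
    for i, tok in enumerate(sent):
        if tok.pos in ("NOUN", "ADJ") and tok.dep in ("attr", "acomp"):
            verb = tok.head
            if sent[verb].text.lower() in ("is", "was"):
                # children of the verb labelled nsubj
                subjects = [t.text.lower() for j, t in enumerate(sent)
                            if t.head == verb and j != verb and t.dep == "nsubj"]
                if subjects:
                    return subjects[0], subtree(sent, i)
    return None


# Hand-written parse of "She is very kind to animals." (assumed, not model output)
sent = [
    Tok("She", "PRON", "nsubj", 1),
    Tok("is", "AUX", "ROOT", 1),
    Tok("very", "ADV", "advmod", 3),
    Tok("kind", "ADJ", "acomp", 1),
    Tok("to", "ADP", "prep", 3),
    Tok("animals", "NOUN", "pobj", 4),
    Tok(".", "PUNCT", "punct", 1),
]
print(attribute_of_be(sent))  # ('she', 'very kind to animals')
```

The attribute phrase is everything hanging below the `acomp`/`attr` token, which is why taking its subtree yields "very kind to animals" rather than just "kind".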
import spacy
from pprint import pprint
nlp = spacy.load('en_core_web_sm')  # the old 'en' shortcut no longer loads in spaCy v3
text = "He was a genius. I really liked working with him. He is a dog owner. She is very kind to animals."
def get_pro_nsubj(token):
    # get the (lowercased) subject of the verb if there is one, otherwise None
    return next((child.lower_ for child in token.children if child.dep_ == 'nsubj'), None)
list_of_responses = []
# a mapping of subject to object pronouns
subj_obj_pro_map = {'he': 'him',
                    'she': 'her'
                    }
for token in nlp(text):
    if token.pos_ in ['NOUN', 'ADJ']:
        if token.dep_ in ['attr', 'acomp'] and token.head.lower_ in ['is', 'was']:
            # to test for lemma 'be' use token.head.lemma_ == 'be'
            nsubj = get_pro_nsubj(token.head)
            if nsubj in ['he', 'she']:
                # get the text of each token in the constituent and join it all together
                whole_constituent = ' '.join([t.text for t in token.subtree])
                obj_pro = subj_obj_pro_map[nsubj]  # convert subject to object pronoun
                returning_phrase = 'Why do you think {} being {} helped the two of you get along?'.format(obj_pro, whole_constituent)
                list_of_responses.append(returning_phrase)
pprint(list_of_responses)
Which outputs:
['Why do you think him being a genius helped the two of you get along?',
'Why do you think him being a dog owner helped the two of you get along?',
'Why do you think her being very kind to animals helped the two of you get '
'along?']
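The phrase-building step can also be pulled out into a small pure function, so it can be checked without loading a model. A sketch under that assumption; `build_followup` and `SUBJ_TO_OBJ` are hypothetical names, and the single `dict.get` lookup folds the `nsubj in ['he', 'she']` check and the pronoun conversion into one step (it also returns `None` when there is no subject at all):

```python
# Map subject pronouns to the object form used in the follow-up question.
SUBJ_TO_OBJ = {"he": "him", "she": "her"}


def build_followup(nsubj, constituent):
    """Return the follow-up question, or None if nsubj is not a mapped pronoun."""
    obj_pro = SUBJ_TO_OBJ.get(nsubj)  # None for 'i', 'it', a missing subject, ...
    if obj_pro is None:
        return None
    return f"Why do you think {obj_pro} being {constituent} helped the two of you get along?"


print(build_followup("she", "very kind to animals"))
print(build_followup("i", "tired"))  # None
```

Supporting another pronoun then only means adding one entry to the mapping.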