How can I extract nouns without their articles and possessives?

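Problem description

Background

I would like to know how to separate nouns from their modifiers, such as articles and possessives.

Example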

#sentence
The man with the star regarded her with his expressionless eyes.


# what to extract 
man
star
eyes

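Problem

As the figure below (created with the displaCy tool) shows, "the man", "the star", and "his expressionless eyes" are each merged into a single NOUN.

Visualization tool for parts of speech and dependencies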

https://explosion.ai/demos/displacy

Result for the example sentence

What I tried

I ran the sample code shown on the spaCy website.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The man with the star regarded her with his calm, expressionless eyes.")
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

Using the result below, or some other approach, how can I extract the nouns themselves, without their articles and possessives?

$ python sample.py
The det man NOUN []
man nsubj regarded VERB [The, with]
with prep man NOUN [star]
the det star NOUN []
star pobj with ADP [the]
regarded ROOT regarded VERB [man, her, with, .]
her dobj regarded VERB []
with prep regarded VERB [eyes]
his poss eyes NOUN []
calm amod eyes NOUN [,]
, punct calm ADJ []
expressionless amod eyes NOUN []
eyes pobj with ADP [his, calm, expressionless]
. punct regarded VERB []

Tags: python, python-3.x, nlp, spacy

Solution

Try something like this to get the output you asked for in your initial example:

import spacy

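# Load the small English pipeline by its full name
# (the 'en' shortcut no longer works in spaCy 3).
nlp = spacy.load("en_core_web_sm")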

text = "The man with the star regarded her with his expressionless eyes."

for word in nlp(text):
  if word.pos_ == 'NOUN':
    print(word.text)

Output:

man
star
eyes
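
The dependency labels in the question's output (det for articles, poss for possessives, amod for adjectives) can also be used directly, for example to keep a noun's adjectives while still leaving out its article or possessive. The following is only a minimal sketch: it works on plain (text, dep, head) tuples copied from that output so that it runs without a model, and the names PARSE, adjectives_of and noun_phrase are hypothetical helpers, not part of spaCy.

```python
# Rows copied from the question's output: (token text, dependency label, head text).
PARSE = [
    ("The", "det", "man"),
    ("man", "nsubj", "regarded"),
    ("with", "prep", "man"),
    ("the", "det", "star"),
    ("star", "pobj", "with"),
    ("regarded", "ROOT", "regarded"),
    ("her", "dobj", "regarded"),
    ("with", "prep", "regarded"),
    ("his", "poss", "eyes"),
    ("calm", "amod", "eyes"),
    (",", "punct", "calm"),
    ("expressionless", "amod", "eyes"),
    ("eyes", "pobj", "with"),
    (".", "punct", "regarded"),
]


def adjectives_of(parse, noun):
    """Return the adjectival modifiers (amod) attached to `noun`.

    Matching on the head's text is a simplification for this sketch;
    it would be ambiguous if the same noun appeared twice.
    """
    return [text for text, dep, head in parse if head == noun and dep == "amod"]


def noun_phrase(parse, noun):
    """Join a noun with its adjectives; det and poss tokens are never included."""
    return " ".join(adjectives_of(parse, noun) + [noun])


for noun in ["man", "star", "eyes"]:
    print(noun_phrase(PARSE, noun))
```

With a real spaCy Doc, the same idea would probably be written against the tokens themselves, for instance [c.text for c in token.children if c.dep_ == "amod"], which avoids matching heads by text.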

You could also consider the nltk package, since it may be faster for this use case:

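import nltk

# Requires the NLTK tokenizer and POS-tagger data, installed once via nltk.download().
text = "The man with the star regarded her with his expressionless eyes."

# pos_tag returns (word, tag) pairs; the noun tags all start with "N".
for word, pos in nltk.pos_tag(nltk.word_tokenize(text)):
  if pos[0] == 'N':
    print(word)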

Output:

man
star
eyes
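
One difference worth keeping in mind: checking only the first letter of the tag also matches the proper-noun tags NNP and NNPS, whereas the spaCy filter on 'NOUN' leaves proper nouns out. If only common nouns are wanted, the tags can be compared against an explicit set instead. This is a small sketch under that assumption; the tagged list is hand-written for illustration (it is not actual tagger output) and common_nouns is a hypothetical helper.

```python
# Penn Treebank tags for singular and plural common nouns.
COMMON_NOUN_TAGS = {"NN", "NNS"}


def common_nouns(tagged):
    """Keep only words whose tag marks a common noun (drops NNP/NNPS)."""
    return [word for word, tag in tagged if tag in COMMON_NOUN_TAGS]


# Hand-written (word, tag) pairs in the shape returned by nltk.pos_tag.
tagged = [
    ("The", "DT"), ("man", "NN"), ("from", "IN"), ("London", "NNP"),
    ("closed", "VBD"), ("his", "PRP$"), ("eyes", "NNS"), (".", "."),
]

print(common_nouns(tagged))  # ['man', 'eyes']
```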
