nlp - CoreNLP: Can it tell whether a noun refers to a person?
Problem description
Can CoreNLP determine whether a common noun (as opposed to a proper noun or proper name) refers to a person out-of-the-box? Or if I need to train a model for this task, how do I go about that?
First, I am not looking for coreference resolution, but rather a building block for it. Coreference by definition depends on the context, whereas I am trying to evaluate whether a word in isolation is a subset of "person" or "human". For example:
is_human('effort') # False
is_human('dog') # False
is_human('engineer') # True
My naive attempt to use Gensim's and spaCy's pre-trained word vectors failed to rank "engineer" above the other two words.
import gensim.downloader as api
word_vectors = api.load("glove-wiki-gigaword-100")
for word in ('effort', 'dog', 'engineer'):
    print(word, word_vectors.similarity(word, 'person'))
# effort 0.42303842
# dog 0.46886832
# engineer 0.32456854
I found the following lists from CoreNLP promising.
dcoref.demonym // The path for a file that includes a list of demonyms
dcoref.animate // The list of animate/inanimate mentions (Ji and Lin, 2009)
dcoref.inanimate
dcoref.male // The list of male/neutral/female mentions (Bergsma and Lin, 2006)
dcoref.neutral // Neutral means a mention that is usually referred by 'it'
dcoref.female
dcoref.plural // The list of plural/singular mentions (Bergsma and Lin, 2006)
dcoref.singular
Would these work for my task? And if so, how would I access them from the Python wrapper? Thank you.
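Solution
I would suggest trying WordNet instead, and checking:
- whether WordNet covers enough of your terms, and
- whether the terms you want are hyponyms of person.n.01.
You would have to extend this a bit to cover multiple senses, but the gist is: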
from nltk.corpus import wordnet as wn
# True
wn.synset('person.n.01') in wn.synset('engineer.n.01').lowest_common_hypernyms(wn.synset('person.n.01'))
# False
wn.synset('person.n.01') in wn.synset('dog.n.01').lowest_common_hypernyms(wn.synset('person.n.01'))
See the NLTK documentation for lowest_common_hypernyms: http://www.nltk.org/howto/wordnet_lch.html