How to automatically perform part-of-speech tagging and lemmatization

Problem description

We can use WordNet in Python through NLTK.

Suppose we are given the text:

"International companies had interns"

We can then assign the part-of-speech tags by hand:

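from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

tokens = ["International", "companies", "had", "interns"]
# Hand-written mapping from each token to its WordNet part of speech
word_type = {"International": wordnet.ADJ, "companies": wordnet.NOUN, "had": wordnet.VERB, "interns": wordnet.NOUN}

lemmatizer = WordNetLemmatizer()
token_list = []
for token in tokens:
    token_list.append(lemmatizer.lemmatize(token, word_type[token]))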

I want to avoid the manual part of the code and have it done automatically:

word_type = {"International": wordnet.ADJ, "companies": wordnet.NOUN, "had": wordnet.VERB, "interns": wordnet.NOUN}
# Need to build the mapping above automatically from the given text

Tags: python, wordnet, lemmatization, part-of-speech

Solution
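One common approach, sketched here rather than given as a definitive answer, is to let NLTK's `pos_tag` tag the tokens and then translate its Penn Treebank tags into the WordNet part-of-speech constants that `WordNetLemmatizer.lemmatize` expects. The helper names `penn_to_wordnet` and `lemmatize_text` are made up for this example. Falling back to noun for any tag that is not an adjective, verb, or adverb is an assumption, chosen because noun is the lemmatizer's own default. The sketch also assumes the NLTK data packages for the tokenizer, the tagger, and WordNet have already been fetched with `nltk.download`; the exact package names depend on the NLTK version.

```python
# WordNet's POS constants are single letters; these values match
# nltk.corpus.wordnet.ADJ, NOUN, VERB and ADV.
ADJ, NOUN, VERB, ADV = "a", "n", "v", "r"


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS.

    Hypothetical helper for this example, not part of NLTK.
    """
    if tag.startswith("J"):   # JJ, JJR, JJS
        return ADJ
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return VERB
    if tag.startswith("RB"):  # RB, RBR, RBS
        return ADV
    # Assumption: nouns and every other tag fall back to NOUN,
    # which is also the lemmatizer's default part of speech.
    return NOUN


def lemmatize_text(text):
    """Tokenize, tag and lemmatize text (needs NLTK and its data packages)."""
    # Imported here so the tag mapping above works without NLTK installed.
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(text))  # list of (token, Penn tag) pairs
    return [lemmatizer.lemmatize(token, penn_to_wordnet(tag)) for token, tag in tagged]
```

Calling `lemmatize_text("International companies had interns")` replaces the hand-written `word_type` dictionary from the question: each token's part of speech comes from the tagger rather than from a manual mapping. The result depends on how the tagger labels each word, so a mis-tagged token will be lemmatized under the wrong part of speech.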

