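python - Applying the NLP WordNetLemmatizer to a whole sentence raises an error about an unknown pos
Problem description
I want to apply the NLP WordNetLemmatizer to a whole sentence. The problem is that I get an error: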
KeyError: 'NNP'
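It looks as if I am passing an unknown 'pos' value, but I don't know why. I want to get the base form of each word, but that does not work without 'pos'. Can you tell me what I am doing wrong?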
import nltk
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tokenize import word_tokenize
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')
sentence = "I want to find the best way to lemmantize this sentence so that I can see better results of it"
taged_words = nltk.pos_tag(sentence)
print(taged_words)
lemmantised_sentence = []
lemmatizer = WordNetLemmatizer()
for word in taged_words:
    filtered_text_lemmantised = lemmatizer.lemmatize(word[0], pos=word[1])
    print(filtered_text_lemmantised)
    lemmantised_sentence.append(filtered_text_lemmantised)
lemmantised_sentence = ' '.join(lemmantised_sentence)
print(lemmantised_sentence)
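Solution
The sentence should be split into tokens before it is passed to pos_tag; given a plain string, pos_tag tags each character separately. In addition, the pos argument of lemmatize takes a different kind of string: it accepts only WordNet values such as 'n', 'v', 'a' and 'r', not Penn Treebank tags such as 'NNP'. I have updated your code based on https://stackoverflow.com/a/15590384/7349991.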
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

def main():
    # Fetch the tagger model and the WordNet data used below.
    nltk.download('averaged_perceptron_tagger')
    nltk.download('wordnet')
    sentence = "I want to find the best way to lemmantize this sentence so that I can see better results of it"
    # pos_tag expects a list of tokens, not a raw string.
    taged_words = nltk.pos_tag(sentence.split())
    print(taged_words)
    lemmantised_sentence = []
    lemmatizer = WordNetLemmatizer()
    for word in taged_words:
        # Skip tokens that came back without a tag.
        if word[1]=='':
            continue
        # Convert the Penn Treebank tag into the WordNet constant that lemmatize expects.
        filtered_text_lemmantised = lemmatizer.lemmatize(word[0], pos=get_wordnet_pos(word[1]))
        print(filtered_text_lemmantised)
        lemmantised_sentence.append(filtered_text_lemmantised)
    lemmantised_sentence = ' '.join(lemmantised_sentence)
    print(lemmantised_sentence)

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag to a WordNet part of speech by its first letter.
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    else:
        # Every other tag (adverbs, pronouns, determiners, ...) is treated as an adverb.
        return wordnet.ADV

if __name__ == '__main__':
    main()
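The two points in the answer can also be checked without downloading any NLTK data. The sketch below is only an illustration: to_wordnet_pos and TREEBANK_PREFIX_TO_WORDNET are hypothetical names, not part of NLTK, and the single-letter constants are written out by hand to match wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV. Unlike the code above, it assumes that unknown tags should fall back to the noun value, which is also the default pos of WordNetLemmatizer.lemmatize.

```python
# WordNet's part-of-speech constants are single lowercase letters.
# They are spelled out here so the sketch runs without NLTK installed;
# they correspond to wordnet.ADJ, wordnet.VERB, wordnet.NOUN, wordnet.ADV.
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

# First letter of a Penn Treebank tag -> WordNet part of speech.
# (Hypothetical helper table, not an NLTK object.)
TREEBANK_PREFIX_TO_WORDNET = {"J": ADJ, "V": VERB, "N": NOUN, "R": ADV}


def to_wordnet_pos(treebank_tag, default=NOUN):
    """Map a Penn Treebank tag such as 'NNP' or 'VBD' to a WordNet POS letter.

    Assumption: tags with no WordNet counterpart (DT, PRP, ...) fall back
    to `default`, which is the noun value here.
    """
    return TREEBANK_PREFIX_TO_WORDNET.get(treebank_tag[:1], default)


sentence = "I want to find the best way"

# Iterating over a string yields single characters. This is what a tagger
# receives when it is handed the raw sentence instead of a token list.
as_characters = list(sentence)[:6]
as_tokens = sentence.split()

print(as_characters)
print(as_tokens)
print(to_wordnet_pos("NNP"), to_wordnet_pos("VBD"), to_wordnet_pos("DT"))
```

With a helper like this, the call inside the loop would read lemmatizer.lemmatize(word[0], pos=to_wordnet_pos(word[1])). Whether to fall back to the noun value or to the adverb value for unmapped tags is a design choice; the noun fallback simply mirrors what lemmatize does when no pos is given.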