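python - Named entity recognition errors in Python
Problem description
I am trying to write a text preprocessor and get nltk.ne_chunk() to work, but the following code gives me a lot of errors: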
```python
import nltk

z = "Francois Legault of the CAQ will now become the new premier of Quebec. This is possible as his party defeated the Liberals in the Provincial elections held on October 1st 2018."

def preprocess_pipe1(doc1):
    # Split the document into sentences
    sent1 = nltk.sent_tokenize(doc1)
    #print(sent1)
    print(" ")
    print("SENTENCE SPLITTER")
    for x in sent1:
        print(x)
    print(" ")
    # Tokenize each sentence into words
    sent1 = [nltk.word_tokenize(sent2) for sent2 in sent1]
    #print(sent1)
    print(" ")
    print("TOKENIZER")
    for x in sent1:
        print(x)
    print(" ")
    # POS-tag each tokenized sentence; the result is a list of
    # sentences, each a list of (word, tag) tuples
    sent1 = [nltk.pos_tag(sent2) for sent2 in sent1]
    #print(sent1)
    print(" ")
    print("POS TAGGER")
    for x in sent1:
        print(x)
    return sent1

sent2 = preprocess_pipe1(z)
sent3 = nltk.ne_chunk(sent2)  # this call raises the AttributeError
print(sent3)
```
This prints the following output and then fails:
```
SENTENCE SPLITTER
Francois Legault of the CAQ will now become the new premier of Quebec.
This is possible as his party defeated the Liberals in the Provincial elections held on October 1st 2018.

TOKENIZER
['Francois', 'Legault', 'of', 'the', 'CAQ', 'will', 'now', 'become', 'the', 'new', 'premier', 'of', 'Quebec', '.']
['This', 'is', 'possible', 'as', 'his', 'party', 'defeated', 'the', 'Liberals', 'in', 'the', 'Provincial', 'elections', 'held', 'on', 'October', '1st', '2018', '.']

POS TAGGER
[('Francois', 'NNP'), ('Legault', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('CAQ', 'NNP'), ('will', 'MD'), ('now', 'RB'), ('become', 'VB'), ('the', 'DT'), ('new', 'JJ'), ('premier', 'NN'), ('of', 'IN'), ('Quebec', 'NNP'), ('.', '.')]
[('This', 'DT'), ('is', 'VBZ'), ('possible', 'JJ'), ('as', 'IN'), ('his', 'PRP$'), ('party', 'NN'), ('defeated', 'VBD'), ('the', 'DT'), ('Liberals', 'NNS'), ('in', 'IN'), ('the', 'DT'), ('Provincial', 'NNP'), ('elections', 'NNS'), ('held', 'VBD'), ('on', 'IN'), ('October', 'NNP'), ('1st', 'CD'), ('2018', 'CD'), ('.', '.')]
```
Error:
```
Traceback (most recent call last):
  File "C:/Users/Robin Karlose/PycharmProjects/NLTK Test 1/Code 5 - NER test.py", line 71, in <module>
    sent3=nltk.ne_chunk(sent2)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\chunk\__init__.py", line 177, in ne_chunk
    return chunker.parse(tagged_tokens)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\chunk\named_entity.py", line 123, in parse
    tagged = self._tagger.tag(tokens)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\tag\sequential.py", line 63, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\tag\sequential.py", line 83, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\tag\sequential.py", line 632, in choose_tag
    featureset = self.feature_detector(tokens, index, history)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\tag\sequential.py", line 680, in feature_detector
    return self._feature_detector(tokens, index, history)
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\chunk\named_entity.py", line 56, in _feature_detector
    pos = simplify_pos(tokens[index][1])
  File "C:\Users\Robin Karlose\PycharmProjects\NLTK Test 1\venv\lib\site-packages\nltk\chunk\named_entity.py", line 186, in simplify_pos
    if s.startswith('V'): return "V"
AttributeError: 'tuple' object has no attribute 'startswith'
```
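To see how the last two frames can end in this error, here is a minimal pure-Python sketch. `simplify_pos` and `pos_of` are hypothetical stand-ins modeled on the two `named_entity.py` frames in the traceback (they are not NLTK's source), and the tagged tokens are taken from the POS TAGGER output above. The point is the access `tokens[index][1]`: it yields a tag string for one sentence, but a `(word, tag)` tuple when `tokens` is a list of sentences.

```python
# Hypothetical stand-ins modeled on the last two traceback frames;
# this is NOT NLTK's source code, only the same access pattern.
def simplify_pos(s):
    # Only works when s is a POS tag string such as 'NNP' or 'VBZ'
    if s.startswith("V"):
        return "V"
    return s


def pos_of(tokens, index):
    # The NE feature detector treats tokens[index][1] as the POS tag
    return simplify_pos(tokens[index][1])


# One tagged sentence: a list of (word, tag) tuples
sentence = [("Francois", "NNP"), ("Legault", "NNP"), ("of", "IN")]

# Shape returned by preprocess_pipe1: a list of tagged sentences
sentences = [sentence, [("This", "DT"), ("is", "VBZ")]]

print(pos_of(sentence, 0))  # tokens[0] is ('Francois', 'NNP'), so [1] is 'NNP'

try:
    # tokens[0] is now a whole sentence, so [1] is the tuple ('Legault', 'NNP')
    pos_of(sentences, 0)
except AttributeError as err:
    print(err)  # 'tuple' object has no attribute 'startswith'
```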
Interestingly, when I run this code, NER works just fine:
```python
import nltk
import nltk.corpus
sent = nltk.corpus.treebank.tagged_sents()[22]
print(sent)
print(nltk.ne_chunk(sent))
```
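As I understand it, in both cases I am passing POS-tagged text to the NLTK named entity recognition function (i.e. nltk.ne_chunk()), but for the life of me I cannot understand why the first case produces so many errors.
If anyone can offer some insight into this, I would be very grateful!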
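One way to test the "in both cases" assumption is to compare the nesting of the two inputs. `treebank.tagged_sents()[22]` is a single sentence, i.e. a flat list of `(word, tag)` tuples, while `preprocess_pipe1` returns a list of such lists. The helper `is_tagged_sentence` below is hypothetical (not part of NLTK), and the sample data only mimics the two shapes.

```python
def is_tagged_sentence(obj):
    # True for a flat sequence of (word, tag) string pairs,
    # e.g. [('Quebec', 'NNP'), ('.', '.')]
    return isinstance(obj, (list, tuple)) and all(
        isinstance(tok, tuple)
        and len(tok) == 2
        and all(isinstance(part, str) for part in tok)
        for tok in obj
    )


# Same shape as nltk.corpus.treebank.tagged_sents()[22]: one sentence
single = [("Quebec", "NNP"), (".", ".")]

# Same shape as the value returned by preprocess_pipe1: one list per sentence
nested = [
    [("Francois", "NNP"), ("Legault", "NNP")],
    [("This", "DT"), ("is", "VBZ")],
]

print(is_tagged_sentence(single))   # True
print(is_tagged_sentence(nested))   # False: items are lists, not (word, tag) tuples
print(all(is_tagged_sentence(s) for s in nested))  # True: each item is a sentence
```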
Solution
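`nltk.ne_chunk()` takes a single POS-tagged sentence (a list of `(word, tag)` tuples), whereas `preprocess_pipe1` returns a list of such sentences. A minimal sketch of one fix is to chunk each sentence separately; in NLTK terms that is `sent3 = [nltk.ne_chunk(s) for s in sent2]`. The code below wraps that loop in a hypothetical helper `chunk_sentences` that takes the chunker as a parameter, so it can run without downloading NLTK models; `fake_chunker` is a placeholder, not an NLTK function.

```python
def chunk_sentences(tagged_sents, chunker):
    """Apply a one-sentence chunker (e.g. nltk.ne_chunk) to each sentence.

    tagged_sents: a list of sentences, each a list of (word, tag) tuples,
    i.e. the shape returned by preprocess_pipe1.
    """
    return [chunker(sentence) for sentence in tagged_sents]


def fake_chunker(sentence):
    # Hypothetical placeholder for nltk.ne_chunk: flags proper nouns (NNP).
    # Like ne_chunk, it expects ONE sentence of (word, tag) tuples.
    return [(word, tag == "NNP") for word, tag in sentence]


sent2 = [
    [("Francois", "NNP"), ("Legault", "NNP"), ("of", "IN")],
    [("This", "DT"), ("is", "VBZ"), ("possible", "JJ")],
]

sent3 = chunk_sentences(sent2, fake_chunker)
for tree in sent3:
    print(tree)
```

With the NLTK models installed, `chunk_sentences(sent2, nltk.ne_chunk)` should give one `Tree` per sentence. NLTK also has `nltk.chunk.ne_chunk_sents()`, which, as far as I know, accepts a list of tagged sentences and yields one tree per sentence. If a single tree for the whole text is preferred, flattening first (`nltk.ne_chunk([tok for s in sent2 for tok in s])`) should also avoid the error.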