NLTK named entity recognition is not working

Problem description

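I created a DataFrame named result. Every cell contains a long text that includes people's names, and I need to identify those names. I used the following code: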

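import nltk

# result is the pandas DataFrame described above
for col in result:                       # iterate over the column labels
    for i in range(result.shape[0]):     # iterate over the row numbers
        print(result[col][i])
        if not result[col][i]:           # skip empty cells
            continue
        else:
            # the raw cell text is passed straight to ne_chunk
            entities = nltk.ne_chunk(result[col][i])
            for entity in entities:
                print(entity)
                if type(entity) == nltk.tree.Tree:
                    print(entity)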

The error I get is:

ERROR:root:message
Traceback (most recent call last):
  File "<ipython-input-99-8b2514840dbb>", line 10, in <module>
    entities = nltk.ne_chunk(result[col][1])
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\chunk\__init__.py", line 186, in ne_chunk
    return chunker.parse(tagged_tokens)
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\chunk\named_entity.py", line 128, in parse
    tagged = self._tagger.tag(tokens)
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\sequential.py", line 64, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\sequential.py", line 84, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\sequential.py", line 652, in choose_tag
    featureset = self.feature_detector(tokens, index, history)
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tag\sequential.py", line 699, in feature_detector
    return self._feature_detector(tokens, index, history)
  File "C:\Users\II00083764\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\chunk\named_entity.py", line 59, in _feature_detector
    pos = simplify_pos(tokens[index][1])
IndexError: string index out of range
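The last frame of the traceback is `pos = simplify_pos(tokens[index][1])`: the chunker expects every token to be a `(word, tag)` pair, so `tokens[index][1]` should be the part-of-speech tag. The pure-Python sketch below mimics that one expression (`pos_field` is a hypothetical stand-in, not an NLTK function) to show what likely happens when a raw string is passed instead: indexing the string yields a single character, and asking for the second character of a one-character string raises exactly this error.

```python
def pos_field(tokens, index):
    # Hypothetical stand-in for the failing expression tokens[index][1]
    return tokens[index][1]


tagged = [("John", "NNP"), ("Smith", "NNP")]  # the shape ne_chunk expects
raw = "John Smith"                             # what the loop passes in

print(pos_field(tagged, 0))  # the tag of the first token: NNP

try:
    pos_field(raw, 0)  # raw[0] is "J", and "J"[1] is out of range
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```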

Can anyone help me?

Tags: python, python-3.x, python-2.7

Solution
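A likely fix, sketched under the assumption that the cells hold plain English strings: `nltk.ne_chunk` works on POS-tagged tokens, so each cell would first go through `nltk.word_tokenize` and `nltk.pos_tag`. The helper names `tag_and_chunk`, `person_names` and `names_in_frame` are hypothetical. The tokenizer, tagger and chunker models also have to be downloaded once with `nltk.download(...)` (for example `punkt`, `averaged_perceptron_tagger`, `maxent_ne_chunker` and `words`; the exact resource names differ between NLTK versions).

```python
import nltk


def tag_and_chunk(text):
    """Tokenize, POS-tag, then NE-chunk a raw string.

    nltk.ne_chunk expects a list of (word, tag) tuples, not a raw string.
    Assumes the NLTK tokenizer, tagger and chunker models are downloaded.
    """
    tokens = nltk.word_tokenize(text)  # str -> list of words
    tagged = nltk.pos_tag(tokens)      # -> list of (word, POS tag) tuples
    return nltk.ne_chunk(tagged)       # -> nltk.Tree with entity subtrees


def person_names(tree):
    """Collect the words of every subtree labelled PERSON."""
    names = []
    for node in tree:
        # Entities are nltk.Tree subtrees; other tokens stay (word, tag) tuples
        if isinstance(node, nltk.Tree) and node.label() == "PERSON":
            names.append(" ".join(word for word, tag in node.leaves()))
    return names


def names_in_frame(df):
    """Run NER over every non-empty string cell of a DataFrame-like object."""
    found = []
    for col in df:              # column labels
        for text in df[col]:    # cell values of that column
            # Skip empty strings and non-string cells such as None or NaN
            if isinstance(text, str) and text.strip():
                found.extend(person_names(tag_and_chunk(text)))
    return found
```

With the models installed, `names_in_frame(result)` would return the PERSON entities found across all columns. Iterating over the column values directly also avoids the lookup `result[col][i]`, which relies on the row labels being 0 to n-1.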

