python - How to match named entities per sentence rather than across the whole file
Problem description
I have a text file and I used Polyglot NER to extract entities from it. I then need to split the text into sentences and match the extracted entities against each sentence. For every match, it should print the result.
from polyglot.text import Text

file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence):
    # Check if chunks are in any of the entities:
    # look through each entity of the sentence and
    # see if there's a match in the file-level list.
    for term in entities_list:
        for entity in sentence.entities:
            if entity == term:
                return entity
    return None

def return_list_of_entities(file):
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)

#sentence_number = 4  # Which sentence to check
for sentence in range(len(file.sentences)):
    sentencess = file.sentences[sentence]
    match = return_match(list_entity, sentencess)
    if match is not None:
        print("Entity Term " + str(match) +
              " is in the sentence. '" + str(sentencess) + "'")
    else:
        print("Sentence '" + str(sentencess) +
              "' doesn't contain any of the terms " + str(list_entity))
Input file:
Bill Gates is the founder of Microsoft.
Trump is the president of the USA.
Bill Gates was a student in Harvard.
After running NER, the extracted entities look like:
list_entity:
Bill Gates, Microsoft, Trump, USA, Bill Gates, Harvard
When the entities are matched against the first sentence, it gives:
Current output:
(Bill Gates, Bill Gates, Microsoft)
Expected output:
(Bill Gates, Microsoft) # from the first sentence; matching should continue per sentence
(Trump, USA)
(Bill Gates, Harvard)
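The expected output above amounts to intersecting the file-level entity list with each sentence's own entities. A minimal polyglot-free sketch of that idea, where the hard-coded sentence/entity pairs are stand-ins for what polyglot's NER would return, not real library calls:

```python
# Each pair mimics (sentence text, entities polyglot found in it).
sentences = [
    ("Bill Gates is the founder of Microsoft.", ["Bill Gates", "Microsoft"]),
    ("Trump is the president of the USA.", ["Trump", "USA"]),
    ("Bill Gates was a student in Harvard.", ["Bill Gates", "Harvard"]),
]

# The file-level entity list (what return_list_of_entities builds).
all_entities = [ent for _, ents in sentences for ent in ents]

# Per-sentence match: keep only the entities that occur in this sentence.
results = []
for text, sentence_entities in sentences:
    matches = tuple(e for e in sentence_entities if e in all_entities)
    results.append(matches)
    print(matches)
```

Running this prints one tuple per sentence, which is exactly the grouping the question asks for.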
Solution
from polyglot.text import Text

file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence):
    # Collect the entities of this one sentence that also
    # appear in the file-level entity list.
    result = set()
    for sentence_entity in sentence.entities:
        for entity in entities_list:
            if entity == sentence_entity:
                result.add(str(sentence_entity))
    return result

def return_list_of_entities(file):
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)

# Match the entity list against every sentence, one sentence at a time,
# so each printed result contains only that sentence's entities.
for sentence in file.sentences:
    result = return_match(list_entity, sentence)
    print("Entity Term " + str(result) +
          " is in the sentence. '" + str(sentence) + "'")
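The solution's per-sentence matching can be exercised without polyglot installed by duck-typing a sentence object. `FakeSentence` and the plain string equality below are assumptions made for illustration; polyglot actually yields Chunk objects from `sentence.entities`:

```python
class FakeSentence:
    """Stand-in exposing only the attribute return_match uses."""
    def __init__(self, text, entities):
        self.text = text
        self.entities = entities
    def __str__(self):
        return self.text

def return_match(entities_list, sentence):
    # Same logic as the solution: keep this sentence's entities
    # that also appear in the file-level list.
    result = set()
    for sentence_entity in sentence.entities:
        if sentence_entity in entities_list:
            result.add(str(sentence_entity))
    return result

list_entity = ["Bill Gates", "Microsoft", "Trump", "USA", "Harvard"]
sent = FakeSentence("Trump is the president of the USA.", ["Trump", "USA"])

matches = return_match(list_entity, sent)
print(matches)  # contains 'Trump' and 'USA'
```

Because the function only touches `sentence.entities`, the same code works unchanged on a real polyglot sentence.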