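python - Extract the words that follow a phrase
Question
I have a text file containing a list of phrases. Here is what the file looks like:
File name: KP.txt
From the input (a paragraph) below, I want to extract the two words that follow a phrase from KP.txt
(the phrase can be any of the entries in the KP.txt
file shown above). I only need the next 2 words.
Input: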
This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.
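In the example above, the phrase "I wanted to know" matches an entry in the KP.txt file. So if I extract the next 2 words after it, my output should be "exchange policy".
How can I extract this in Python?
Answer
I think natural language processing might be a better solution, but this code should help :)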
def search_in_text(kp, text):
    for line in kp:
        line = line.strip()
        # skip blank lines: an empty string is "in" every text
        if not line:
            continue
        # if a search phrase from kp is found in the text
        if line in text:
            # start index of the text that follows the phrase
            i1 = text.find(line) + len(line)
            # end index of the window to inspect (at most 50 characters after i1)
            i2 = min(i1 + 50, len(text))
            # split the window on whitespace; split() with no argument drops empty strings
            next_words = text[i1:i2].split()
            # return only the next two words
            return next_words[:2]
    # return an empty list if no phrase matches
    return []
# read the phrase file as a list of lines (the question names the file KP.txt)
with open("KP.txt") as f:
    kp = f.read().splitlines()
#input 1
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)
input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
output ->> ['exchange', 'policy']
#input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)
input ->> Boss was very angry and said: I wish to know why you are late?
output ->> ['why', 'you']
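The substring search above is case-sensitive and relies on a fixed 50-character window. As a possible alternative, here is a minimal sketch using the standard `re` module that ignores case and skips punctuation between words. The function name `next_two_words` and the `phrases` list are hypothetical: the contents of KP.txt are not shown in the question, so the two phrases below are assumed from the examples.

```python
import re

def next_two_words(phrases, text):
    """Return the two words after the first listed phrase found in text, else []."""
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # ignore blank lines from the phrase file
        # re.escape keeps any special characters in the phrase literal;
        # (\w+)\W+(\w+) captures the next two words, skipping spaces and punctuation
        pattern = r"\b" + re.escape(phrase) + r"\W+(\w+)\W+(\w+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return [match.group(1), match.group(2)]
    return []

# Assumed phrase list; in practice it would be read from KP.txt
phrases = ["I wanted to know", "I wish to know"]
text = "Boss was very angry and said: I wish to know why you are late?"
print(next_two_words(phrases, text))
```

Note that on the question's original input, which reads "I wanted to know the exchange policy", this returns the literal next two words, "the" and "exchange". Getting "exchange policy" would need an extra step such as dropping stop words.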