Extract the words that follow a phrase

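Problem description

I have a text file that contains a list of phrases. This is what the file looks like:

File name: KP.txt

[Image: contents of KP.txt]

From the input below (a paragraph), I want to extract the two words that follow a phrase from KP.txt (the phrase can be any of the entries in the KP.txt file shown above). I only need the next 2 words.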

Input:

This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.

In the example above, the phrase "I wanted to know" matches an entry in the KP.txt file. So if I extract the next 2 words after it, my output should be "exchange policy".

How can I extract this in Python?

Tags: python, extract, phrase

Solution

I think natural language processing may be a better solution, but this code should help :)

def search_in_text(kp, text):
    for line in kp:
        # strip whitespace and skip blank lines: an empty string is "in" every text
        phrase = line.strip()
        if not phrase:
            continue
        # check whether this search phrase occurs in the text
        if phrase in text:
            # index just past the end of the matched phrase
            i1 = text.find(phrase) + len(phrase)
            # split the rest of the text on whitespace (empty strings are dropped)
            next_words = text[i1:].split()
            # return only the next two words
            return next_words[:2]
    return []  # return an empty list if no phrase matches
        
# read the KP.txt file as a list of non-empty lines
with open("KP.txt") as f:
    kp = [line.strip() for line in f if line.strip()]

# input 1
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
# output reported for input 1:
# input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
# output ->> ['exchange', 'policy']

# input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
# output reported for input 2:
# input ->> Boss was very angry and said: I wish to know why you are late?
# output ->> ['why', 'you']
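
The substring search above is case-sensitive and leaves punctuation attached to the extracted words (for example 'policy.' when the sentence ends right after the second word). A regular-expression variant might handle both. This is only a sketch, not part of the original answer: the function name next_words_after, the parameter n, and the sample phrase list are made up for illustration, since the real contents of KP.txt are not shown here.

```python
import re

def next_words_after(phrases, text, n=2):
    """Return up to n words that follow the first matching phrase, or []."""
    # Drop blank entries; try longer phrases first so that a longer phrase
    # wins over a shorter one that starts at the same position.
    cleaned = sorted({p.strip() for p in phrases if p.strip()},
                     key=len, reverse=True)
    if not cleaned:
        return []
    alternation = "|".join(re.escape(p) for p in cleaned)
    # (?<!\w) and (?!\w) stop a phrase from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)",
                         re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return []
    # Collect words (allowing inner apostrophes or hyphens) without
    # surrounding punctuation, then keep the first n.
    return re.findall(r"\w+(?:['-]\w+)*", text[match.end():])[:n]

# Hypothetical phrase list standing in for the lines of KP.txt.
sample_phrases = ["I wanted to know", "I wish to know"]
text = ("This is Lee. Thanks for contacting me. I wanted to know "
        "exchange policy at Noriaqer hardware services.")
print(next_words_after(sample_phrases, text))  # ['exchange', 'policy']
```

Because the words are collected with a pattern instead of a whitespace split, trailing punctuation such as '.' or '?' is not included in the result.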
