Python text template matching

Problem description

I need to match text against regex-style templates. Here is an example:

templates = {"x":"welcome {1}, how can I help you","y":"hi {1},here is your concern {2}"}
input_text = "welcome john, how can I help you" # for this output should be "x"
input_text = "Hi john,here is your concern sick leave" # for this output should be "y"
input_text = "Welcome john, how can I help you, how are you?" # for this output should be None
input_text = "can I know your name" # for this output should be None

Could you offer some suggestions for solving this? Thanks in advance.

Tags: python, text, nlp

Solution


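First, you need to convert each template into a valid regular expression. Here I replace {1} and {2} with the regex .+, which matches any non-empty run of characters. Because the third example requires a full match rather than a partial one, I also append $ to the regex to force it to match through the end of the text.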

regex_temp = re.sub(r'\{\s*\d+\s*\}','.+',template) + '$'
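To make the conversion concrete, here is a minimal sketch that applies the same substitution to both templates from the question and prints the resulting patterns. The helper name template_to_regex is hypothetical and only used for illustration.

```python
import re

templates = {
    "x": "welcome {1}, how can I help you",
    "y": "hi {1},here is your concern {2}",
}

def template_to_regex(template):
    # Hypothetical helper: replace every {n} placeholder with .+
    # and append $ so the pattern must reach the end of the text.
    return re.sub(r'\{\s*\d+\s*\}', '.+', template) + '$'

for key, template in templates.items():
    print(key, template_to_regex(template))
# x welcome .+, how can I help you$
# y hi .+,here is your concern .+$
```

Note that the literal parts of the template are used as regex text unchanged, which is fine for these two templates because they contain no regex metacharacters.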

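Then you simply loop over the templates and test each one. The re.I flag makes the match case-insensitive, since your examples mix uppercase and lowercase text.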

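Another option is to write the templates directly as valid regular expressions, like this: templates = {"x":"welcome .+, how can I help you$","y":"hi .+,here is your concern .+$"}. You then use the templates as regexes and drop the string conversion above.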
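A rough sketch of this alternative might look as follows. It assumes both patterns end with $, and the names regex_templates, compiled and find_regex_template are made up for this example. Compiling the patterns once up front is an optional convenience, not something the approach requires.

```python
import re

# Templates written directly as regular expressions (assumed names).
regex_templates = {
    "x": r"welcome .+, how can I help you$",
    "y": r"hi .+,here is your concern .+$",
}

# Compile once, case-insensitively, so lookups do not rebuild the patterns.
compiled = {
    key: re.compile(pattern, flags=re.I)
    for key, pattern in regex_templates.items()
}

def find_regex_template(input_text):
    for key, pattern in compiled.items():
        # match() anchors at the start; the trailing $ anchors at the end.
        if pattern.match(input_text):
            return key
    return None

print(find_regex_template("welcome john, how can I help you"))
print(find_regex_template("can I know your name"))
```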

import re

templates = {"x":"welcome {1}, how can I help you","y":"hi {1},here is your concern {2}"}

def find_template(input_text):
    for template_key, template in templates.items():
        # Turn each {n} placeholder into .+ and anchor the pattern at the end of the text
        regex_temp = re.sub(r'\{\s*\d+\s*\}','.+',template) + '$'
        # re.match anchors at the start; re.I makes the comparison case-insensitive
        if re.match(regex_temp, input_text, flags=re.I):
            return template_key
    return None

input_text = "welcome john, how can I help you" # for this output should be "x"
print(find_template(input_text))
input_text = "Hi john,here is your concern sick leave" # for this output should be "y"
print(find_template(input_text))
input_text = "Welcome john, how can I help you, how are you?" # for this output should be None
print(find_template(input_text))
input_text = "can I know your name" # for this output should be None
print(find_template(input_text))

>> x
>> y
>> None
>> None
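One caveat: the conversion above treats the literal parts of a template as regex text, so a template containing characters such as ? or . would not behave as plain text. A possible hardening, sketched here under the assumption that templates should always be matched literally outside the placeholders, is to escape the literal pieces with re.escape and use fullmatch instead of appending $. The names PLACEHOLDER, compile_template and find_template_escaped are hypothetical.

```python
import re

# Same placeholder syntax as above: {1}, {2}, optionally with inner spaces.
PLACEHOLDER = re.compile(r'\{\s*\d+\s*\}')

def compile_template(template):
    # Split the template on placeholders, escape each literal piece,
    # then join the pieces back together with .+ in place of the placeholders.
    literal_parts = PLACEHOLDER.split(template)
    pattern = '.+'.join(re.escape(part) for part in literal_parts)
    return re.compile(pattern, flags=re.I)

def find_template_escaped(input_text, templates):
    for key, template in templates.items():
        # fullmatch requires the whole text to match, start to end.
        if compile_template(template).fullmatch(input_text):
            return key
    return None

# Hypothetical template with a literal question mark.
print(find_template_escaped("Is it raining?", {"q": "is it {1}?"}))
```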
