How to match Python list items with a regular expression

Problem description

import re
def popular_words(text, words):
    """(str, array) -> dictionary
    returns dictionary  search words are the keys and values
    are the number of times when those words are occurring
    in a given text
    """
    word_dictionary = {}

    for word in words:     
        list = re.findall(word, text, re.IGNORECASE)
        word_dictionary.update({word : len(list) })

    return word_dictionary

popular_words('''
When I was One
I had just begun
When I was Two
I was nearly new
''', ['i', 'was', 'three', 'near']) 

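How can I count "near" only as a whole word, so that it does not also match inside "nearly"? I tried using \bword\b to define word boundaries, but I get this error: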

"unexpected character after line continuation character"

Tags: python, regex

Solution


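You can absolutely combine string formatting with \b. The error you got is probably because you did not use a raw string, as shown here. (Whenever a pattern contains backslashes, always use raw strings with re; it makes life easier.)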

import re
def popular_words(text, words):
    """(str, array) -> dictionary
    returns dictionary  search words are the keys and values
    are the number of times when those words are occurring
    in a given text
    """
    word_dictionary = {}

    for word in words:
        # \b anchors the match at word boundaries, so 'near' is not found inside 'nearly'
        matches = re.findall(r'\b{0}\b'.format(word), text, re.IGNORECASE)
        word_dictionary[word] = len(matches)

    return word_dictionary

print(popular_words('''
When I was One
I had just begun
When I was Two
I was nearly new
''', ['i', 'was', 'three', 'near']))

Output:

{'i': 4, 'near': 0, 'was': 3, 'three': 0}
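One caveat when formatting a word straight into the pattern: any regex metacharacter in the word (a dot, for example) is treated as pattern syntax. The sketch below assumes literal matching is what is wanted and wraps each word in re.escape. The name popular_words_escaped and the sample text are illustrative only, not part of the original answer.

```python
import re

def popular_words_escaped(text, words):
    """Count whole-word, case-insensitive occurrences of each search word."""
    counts = {}
    for word in words:
        # re.escape makes metacharacters such as '.' match literally
        pattern = r'\b{0}\b'.format(re.escape(word))
        counts[word] = len(re.findall(pattern, text, re.IGNORECASE))
    return counts

sample = "version 1.0 or 1x0"  # hypothetical sample text

# Unescaped, the dot matches any character, so '1x0' is counted as well
unescaped = len(re.findall(r'\b{0}\b'.format('1.0'), sample))
escaped = popular_words_escaped(sample, ['1.0'])['1.0']
print(unescaped, escaped)
```

For plain alphabetic words such as those in the question, both versions give the same counts.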

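Edit: For completeness, this is what you would have to write without a raw string. In an ordinary string literal '\b' is the backspace character, so you have to escape each backslash by doubling it.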

matches = re.findall('\\b{0}\\b'.format(word), text, re.IGNORECASE)
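The difference between the three spellings can be checked directly. This is a small sketch; the variable names and the sample sentence are made up for illustration. It shows that an ordinary '\b' is a single backspace character, which silently fails to match, while the raw and doubled forms produce the same pattern.

```python
import re

text = "I was nearly new, near the door"  # hypothetical sample text

plain = '\bnear\b'       # ordinary literal: '\b' is a backspace character
raw = r'\bnear\b'        # raw literal: backslash + 'b', the regex word boundary
doubled = '\\bnear\\b'   # escaped backslashes: same string as the raw literal

print(len('\b'), len(r'\b'))     # one character versus two
print(re.findall(plain, text))   # no match: the text contains no backspaces
print(re.findall(raw, text))     # only the standalone word matches
print(raw == doubled)
```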
