Python regular expressions: replace each match separately

Problem description

I want to use a regular expression to replace combinations of matches.

This is what I have:

>>> re.compile("0").sub("2", "01101")
'21121'

This is what I want:

>>> replace_combinations(pattern="0", repl="2", string="01101")
['01101', '01121', '21101', '21121']

I can use re.finditer() to get all the matches individually and then itertools.combinations() to get their combinations, but I don't know how to do the replacement part.

Tags: python, python-3.x, regex, combinations, itertools

Solution


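Answering my own question after some experimentation: for simple regular expressions, the following (somewhat convoluted) approach works. It may not work for more complex patterns (counterexamples and better answers are welcome).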

import re
from itertools import combinations
from typing import Set

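def replace_match(match, repl, string):
    # Split the string around the match and substitute only the matched slice.
    pre = string[:match.start()]
    post = string[match.end():]
    # Note: the pattern is re-applied to the slice alone, so context outside
    # the slice (lookarounds, anchors) is not visible to it.
    replaced = re.sub(match.re, repl, string[match.start():match.end()])
    return pre + replaced + post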


def replace_matches(matches, repl, string):
    # Replace from the right side of the string first, so that replacements
    # whose length differs from the match do not shift the start/end offsets
    # of matches that have not been processed yet.
    for match in reversed(matches):
        string = replace_match(match, repl, string)
    return string


def replace_combinations(pattern, repl, string) -> Set[str]:
    results = set()
    matches = list(re.finditer(pattern, string))
    match_combinations = []
    for r in range(len(matches)+1):
        match_combinations.extend(combinations(matches, r))
    for match_combination in match_combinations:
        results.add(replace_matches(match_combination, repl, string))
    return results

replace_combinations(pattern="0", repl="2", string="01101")
# {'01121', '21121', '01101', '21101'}
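
One case where the approach above may fall short is a pattern that depends on context outside the match, such as a lookbehind: re-running the pattern on the matched slice alone then finds nothing to replace. The sketch below is a possible variant, not the original answer's method. It expands the replacement from the original match object with Match.expand() and assembles each result from left to right. The name replace_combinations_expand is hypothetical, and the sketch assumes repl is a template string rather than a callable. It returns a sorted list, which matches the output format asked for in the question.

```python
import re
from itertools import combinations
from typing import List


def replace_combinations_expand(pattern: str, repl: str, string: str) -> List[str]:
    """Return every variant of `string` with some subset of matches replaced.

    Hypothetical variant: uses Match.expand() on the original match object
    instead of re-running the pattern on the matched slice.
    Assumes `repl` is a template string, not a callable.
    """
    matches = list(re.finditer(pattern, string))
    results = set()
    for r in range(len(matches) + 1):
        for chosen in combinations(matches, r):
            pieces = []
            last_end = 0
            # combinations() keeps input order, so `chosen` runs left to right.
            for match in chosen:
                pieces.append(string[last_end:match.start()])
                pieces.append(match.expand(repl))
                last_end = match.end()
            pieces.append(string[last_end:])
            results.add("".join(pieces))
    return sorted(results)


print(replace_combinations_expand("0", "2", "01101"))
# ['01101', '01121', '21101', '21121']
print(replace_combinations_expand(r"(?<=1)0", "2", "01101"))
# ['01101', '01121']
```

Because the pieces are joined from the untouched original string, there is no need to process matches in reverse order to keep the offsets valid.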
