Only placing digits in capture groups when converting a string to an array of integers

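Problem description

Background

This question is inspired by the following question on Code Review: Converting a string to an array of integers. It opens as follows:

I am working with a string, draw_result, that can take one of the following formats: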

"03-23-27-34-37, Mega Ball: 13" 
"01-12 + 08-20" 
"04-15-17-25-41"

I always start from draw_result, whose value is one of the strings above. I want to get to:

[3, 23, 27, 34, 37] 
[1, 12, 8, 20]
[4, 15, 17, 25, 41]

This problem can be solved with multiple regular expressions, as shown below:

import re
from typing import Iterable

lottery_searches = [
    re.compile(pat).match
    for pat in (
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d\d), Mega Ball.*$',
        r'^(\d\d)-(\d\d) \+ (\d\d)-(\d\d)$',
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d+)$',
    )
]


def lottery_string_to_ints(lottery: str) -> Iterable[int]:
    for search in lottery_searches:
        if match := search(lottery):
            return (int(g) for g in match.groups())

    raise ValueError(f'"{lottery}" is not a valid lottery string')

Question

Attempted solution

Regex

PATTERN = re.compile(
    r"""
         (?P<digit0>\d\d)                   # Matches a double digit [00..99] and names it digit0
         (?P<sep>-)                         # Matches a hyphen and saves it as sep
         (?P<digit1>\d\d)                   # Matches a double digit [00..99] and names it digit1
         (\s+\+\s+|(?P=sep))                # Matches SPACE + SPACE OR the separator saved in sep (-)
         (?P<digit2>\d\d)                   # Matches a double digit [00..99] and names it digit2
         (?P=sep)                           # Matches the separator saved in sep (-) again
         (?P<digit3>\d\d)                   # Matches a double digit [00..99] and names it digit3
         ((?P=sep)(?P<digit4>\d\d))?        # Optionally matches a fifth number (e.g. -01) and names it digit4
        """,
    re.VERBOSE,
)

Retrieval

def extract_numbers_narrow(draw_result, digits=5):
    numbers = []
    if match := re.match(PATTERN, draw_result):
        for i in range(digits):
            ith_digit = f"digit{i}"
            try:
                number = int(match.group(ith_digit))
            except IndexError:  # Catches the case where the group does not exist
                continue
            except TypeError:  # Catches if the group is None
                continue
            numbers.append(number)
    return numbers
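
As a possible simplification of the retrieval step, match.groupdict() returns all named groups at once, with None for groups that did not take part in the match, so no exception handling is needed. The sketch below is only an illustration under assumptions: NUMBERS is a compact restatement of the verbose pattern (with the separator written as a literal hyphen), and the names NUMBERS and extract_numbers_groupdict are hypothetical, not part of the original question.

```python
import re

# Hypothetical compact form of the verbose pattern (assumption: "-" is the only separator).
NUMBERS = re.compile(
    r"(?P<digit0>\d\d)-(?P<digit1>\d\d)(?:\s+\+\s+|-)"
    r"(?P<digit2>\d\d)-(?P<digit3>\d\d)(?:-(?P<digit4>\d\d))?"
)


def extract_numbers_groupdict(draw_result):
    """Return the named digit groups as ints, skipping groups that did not match."""
    match = NUMBERS.match(draw_result)
    if not match:
        return []
    # Sort by group name so digit0..digit4 come out in positional order.
    return [
        int(value)
        for name, value in sorted(match.groupdict().items())
        if value is not None
    ]
```

For a two-plus-two draw such as "01-12 + 08-20", the optional digit4 group is None and is simply skipped.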

Tags: python, python-3.x, regex, regex-group, re

Solution

It looks like you want to get all the numbers before the comma. You can use this solution based on the PyPI regex module:

import regex

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = regex.compile(r'^(?:[^\w,]*(\d+))+')

for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,match.captures(1))) )

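See the online Python demo.

The ^(?:[^\w,]*(\d+))+ regex matches, at the start of the string, one or more repetitions of the following sequence: zero or more characters that are neither word characters nor commas, followed by one or more digits (captured into Group 1). Since regex keeps a stack for each capturing group, you can access all the captured numbers with .captures().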
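
To see why .captures() matters, it can help to run the same pattern through the built-in re module, which retains only the last value captured by a repeated group. The snippet below is a minimal sketch of that behaviour; the variable names are illustrative only.

```python
import re

pattern = re.compile(r'^(?:[^\w,]*(\d+))+')
match = pattern.search('03-23-27-34-37, Mega Ball: 13')

# With re, a repeated capturing group keeps only its final repetition.
last_capture = match.group(1)

# The whole match still spans every number before the comma.
whole_match = match.group()

print(last_capture)  # 37
print(whole_match)   # 03-23-27-34-37
```

This is the reason a re-only approach has to extract the numbers from the whole match in a second step.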

If you need to use the built-in re, you can use

import re
 
texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = re.compile(r'^(?:[^\w,]*\d+)+')
 
for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,re.findall(r'\d+', match.group()))) )

See this Python demo, where re.findall(r'\d+', ...) extracts the numbers from the match value.

Both print:

03-23-27-34-37, Mega Ball: 13 => [3, 23, 27, 34, 37]
01-12 + 08-20 => [1, 12, 8, 20]
04-15-17-25-41 => [4, 15, 17, 25, 41]
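
The re-based approach can be wrapped in a small helper if the result is needed as a return value rather than printed. This is only a sketch: the names PREFIX and numbers_before_comma are hypothetical, and returning an empty list for a string that does not start with a number is an assumption, not something the original answer specifies.

```python
import re

# Same prefix pattern as in the re-based solution above.
PREFIX = re.compile(r'^(?:[^\w,]*\d+)+')


def numbers_before_comma(text):
    """Return the integers found in the leading run of numbers, or [] if there is none."""
    match = PREFIX.search(text)
    if not match:
        return []  # assumption: no leading number means an empty result
    return [int(n) for n in re.findall(r'\d+', match.group())]
```

Note that this helper does not validate the draw format: any leading sequence of numbers separated by non-word, non-comma characters is accepted.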
