python - How do I fix my re.compile statement in Python

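Problem description

I have a text file, and I am using re to locate a specific part of the text (a list of water usage figures for different towns) and put the information into a pandas DataFrame. The list in the text is indexed with letters, e.g. (a), (b), (c), and so on. The code works fine and returns all the information I need into the DataFrame until the indexing switches to double letters, e.g. (aa), (ab), (ac), and so on.

How can I fix my re statement so that it also works for the double-letter indices in the list?

Here is the code: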

    pattern = regex.compile(r'\d+ (?=ML\/year)|(?<= in the |the )[\w \/\(\)]+')
    columns = ('Water Usage', 'Town')

    res = [dict(zip(columns, pattern.findall(line))) for line in finalText.splitlines() if pattern.match(line)]
    df = pd.DataFrame(res)

    return df

Here is a sample of the text:

(w) 218 ML/year in the Murrumbidgee I Water Source,
(x) 133 ML/year in the Murrumbidgee II Water Source,
(y) 116 ML/year in the Murrumbidgee III Water Source,
(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,
(ac) 150 ML/year in the Numeralla East Water Source,

As I said, it works for all lines with a single-letter index, but not for those with a double-letter index.

Tags: python re

You can troubleshoot regular expressions with https://regex101.com/ or https://regexr.com/. Here is one that matches the key components:

^\([^)]+\)\s+(\S+)\s+(.*\/year)\s+in the\s+(.*),
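To show how that pattern could be applied line by line, here is a minimal sketch using only the standard-library re module. The function name parse_water_usage, the constant names, and the conversion of the amount to int are my own assumptions for illustration and are not part of the original code. The index part \([^)]+\) accepts any run of characters between the parentheses, so (z) and (aa) are treated the same way.

```python
import re

# The pattern quoted above, unchanged: group 1 is the amount,
# group 2 is the unit (e.g. "ML/year"), group 3 is the town / water source.
LINE_PATTERN = re.compile(r"^\([^)]+\)\s+(\S+)\s+(.*\/year)\s+in the\s+(.*),")

# A few lines taken from the sample text in the question.
SAMPLE_TEXT = """\
(z) 73 ML/year in the Murrumbidgee North Water Source,
(aa) 476 ML/year in the Murrumbidgee Western Water Source,
(ab) 92 ML/year in the Muttama Water Source,
"""


def parse_water_usage(text):
    """Return one dict per line that matches LINE_PATTERN (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not list entries
        rows.append({
            # Assumption: the amount is a plain integer, as in the sample.
            "Water Usage": int(match.group(1)),
            "Town": match.group(3),
        })
    return rows


rows = parse_water_usage(SAMPLE_TEXT)
for row in rows:
    print(row)
```

Passing the resulting list of dicts to pd.DataFrame(rows) would give a DataFrame with the same two columns, 'Water Usage' and 'Town', as in the question.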
