How to find all currency-related numbers with a regex?

Problem description

For a string of free text:

"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 
from last monday. If they do not level up again to 100€ by the end of this week there might 
be serious consequences to the company"

What regular expression pattern would extract the currency-related numbers?

In this case: 89.99, 95 and 100.

So far, I have tried these patterns:

[0-9]*[€.]([0-9]*)
[0-9]{1,3}(?:\.[0-9]{3})*,[0-9][0-9]
[0-9]+\€\.[0-9]+

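But none of these seems to produce exactly what is needed.

Tags: regex

Solution

One option is to match all 3 variants (€ before, inside, or after the digits) and then remove the euro sign from the matches.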

(?:\d+€\d*|€\d+(?:\.\d+)?)

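Explanation

  • (?: Non-capturing group
    • \d+€\d* Match 1+ digits and €, followed by optional digits
    • | Or
    • €\d+(?:\.\d+)? Match € followed by 1+ digits and an optional decimal part
  • ) Close the non-capturing group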
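The two alternatives can also be checked separately. The following is a minimal sketch; the sample strings and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# The two alternatives of the pattern, compiled separately for inspection.
digits_then_symbol = re.compile(r"\d+€\d*")        # e.g. 9€5 or 100€
symbol_then_digits = re.compile(r"€\d+(?:\.\d+)?")  # e.g. €89.99 or €89

# Hypothetical sample inputs; the last one has no € and should match neither.
samples = ["9€5", "100€", "€89.99", "€89", "89.99"]

for s in samples:
    first = bool(digits_then_symbol.fullmatch(s))
    second = bool(symbol_then_digits.fullmatch(s))
    print(f"{s!r}: digits-then-€ = {first}, €-then-digits = {second}")
```

Note that the first alternative does not allow a decimal point before the € sign, so a value written as 99.50€ would only be matched from the 50 onwards.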


For example

import re

regex = r"(?:\d+€\d*|€\d+(?:\.\d+)?)"

test_str = ("\"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 \n"
            "from last monday. If they do not level up again to 100€ by the end of this week there might \n"
            "be serious consequences to the company\"")

print([x.replace("€", "") for x in re.findall(regex, test_str)])

Output

['89.99', '95', '100']
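If numeric values are needed rather than strings, the cleaned matches could be converted afterwards. This is only a sketch: the helper name extract_amounts, the constant CURRENCY_RE and the shortened sample text are assumptions for illustration, and Decimal is one possible choice for money values.

```python
import re
from decimal import Decimal

# Same pattern as in the answer, precompiled (the name is hypothetical).
CURRENCY_RE = re.compile(r"(?:\d+€\d*|€\d+(?:\.\d+)?)")

def extract_amounts(text):
    """Return the matched amounts as Decimal values, with the € sign removed."""
    return [Decimal(m.replace("€", "")) for m in CURRENCY_RE.findall(text)]

# Shortened, made-up variant of the text from the question.
text = "fell by €89.99 today, a drop of a 9€5, back to 100€ by Friday"
print(extract_amounts(text))
```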

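For numbers after the € sign that use optional thousands separators (a comma followed by 3 digits) and always have a 2-digit decimal part, a more precise pattern might be: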

(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})
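A quick way to try the stricter pattern is sketched below. The sample sentence is invented for illustration, and stripping the commas together with the € sign is an extra step that the original answer does not show.

```python
import re

# Stricter pattern: after €, require 1-3 digits, optional ",ddd" groups,
# and a mandatory 2-digit decimal part.
strict = re.compile(r"(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})")

# Made-up sample: "€89," lacks the decimal part and should not match.
sample = "€1,234,567.89 and €89.99 but not €89, plus 9€5 and 100€"

matches = strict.findall(sample)
# Remove both the euro sign and the thousands separators.
cleaned = [re.sub(r"[€,]", "", m) for m in matches]
print(cleaned)
```

Because the decimal part is mandatory in the second alternative, a bare €89 is no longer matched by this version.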
