A regex to get any price string

Problem description

I need to get the price from a string, but no other numbers. There are no restrictions on what the string can say, but it will always have a dollar amount in it. It's the dollar amount I need to get from the string.

The closest solution I've been able to find is \d{1,3}[,\.]?(\d{1,2})?

On an example string like "2 BED / 2 BATH for $120,000.00, what a deal!!!", the regex should return only $120,000.00 and no other numbers. The solution above will return 2, 2, and 120,000.00. An ideal solution should NOT match any digits outside the dollar amount. It also needs to include the symbol immediately before the number, to account for any currency (USD, GBP, EUR, etc.).

So the price matched by the regex should look like $120,000.00, but the regex should also match something like €40,000.

Tags: regex, price

Solution


To match a number together with the currency symbol that precedes it, you can combine two expressions:

  • Currency symbol regex: \b(?:[BS]/\.|R(?:D?\$|p))|\b(?:[TN]T|[CJZ])\$|Дин\.|\b(?:Bs|Ft|Gs|K[Mč]|Lek|B[Zr]|k[nr]|[PQLSR]|лв|ден|RM|MT|lei|zł|USD|GBP|EUR|JPY|CHF|SEK|DKK|NOK|SGD|HKD|AUD|TWD|NZD|CNY|KRW|INR|CAD|VEF|EGP|THB|IDR|PKR|MYR|PHP|MXN|VND|CZK|HUF|PLN|TRY|ZAR|ILS|ARS|CLP|BRL|RUB|QAR|AED|COP|PEN|CNH|KWD|SAR)\b|\$[Ub]|[\p{Sc}ƒ]
  • Number regex: (?<!\d)(?<!\d\.)(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?(?!\.?\d)

The currencies are taken from World Currency Symbols. The pattern includes only the most commonly used 3-letter currency codes, but a comprehensive list can be compiled from the same data.
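As a quick check of the number regex on its own, here is a minimal Python sketch. Python and the helper name find_numbers are assumptions made for illustration; the answer does not name a language. It suggests that the lookarounds reject digits that belong to a longer run, and that the number regex alone still picks up the stray 2s from the question's example, which is why the currency prefix is needed.

```python
import re

# Number regex from the answer, unchanged. Python's re accepts these
# fixed-width lookbehinds as written.
NUM_REGEX = r"(?<!\d)(?<!\d\.)(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?(?!\.?\d)"


def find_numbers(text):
    """Return every substring matched by the number regex (hypothetical helper)."""
    return re.findall(NUM_REGEX, text)


for sample in ["120,000.00", "40,000", "1500", "9.99", "1.2345"]:
    print(sample, "->", find_numbers(sample))

# Without a currency prefix, unrelated numbers are matched as well.
print(find_numbers("2 BED / 2 BATH for $120,000.00, what a deal!!!"))
# ['2', '2', '120,000.00']
```

Note that "1.2345" yields no match at all: more than two decimal places fail the trailing lookahead, and the lookbehinds stop the regex from restarting in the middle of the digits.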

The combined pattern is:

(?:\b(?:[BS]/\.|R(?:D?\$|p))|\b(?:[TN]T|[CJZ])\$|Дин\.|\b(?:Bs|Ft|Gs|K[Mč]|Lek|B[Zr]|k[nr]|[PQLSR]|лв|ден|RM|MT|lei|zł|USD|GBP|EUR|JPY|CHF|SEK|DKK|NOK|SGD|HKD|AUD|TWD|NZD|CNY|KRW|INR|CAD|VEF|EGP|THB|IDR|PKR|MYR|PHP|MXN|VND|CZK|HUF|PLN|TRY|ZAR|ILS|ARS|CLP|BRL|RUB|QAR|AED|COP|PEN|CNH|KWD|SAR)|\$[Ub]|[\p{Sc}ƒ])\s?(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?(?!\.?\d)


It is built as (?:CUR_SYM_REGEX)\s?NUM_REGEX. The lookbehinds are stripped from the number regex because the currency symbol already defines the left-hand context.
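The construction can be sketched in Python as follows. This is an illustration under assumptions, not the answer's own code: Python's built-in re module does not support \p{Sc}, so the sketch builds an equivalent character class with unicodedata, and it uses a shortened, hypothetical subset of the 3-letter codes to stay readable. The name find_prices is likewise made up here.

```python
import re
import sys
import unicodedata

# Python's built-in re has no \p{Sc}, so collect every Unicode
# "Symbol, currency" code point into an explicit character class.
# This is a workaround, not part of the original answer.
SC_CLASS = "[" + "".join(
    re.escape(chr(cp))
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Sc"
) + "ƒ]"

# Shortened subset of the currency alternation (assumption: the full list
# from the answer can be substituted here).
CUR_SYM_REGEX = r"\b(?:USD|GBP|EUR|JPY|CHF)|" + SC_CLASS

# Number regex with the lookbehinds stripped, as the answer describes.
NUM_REGEX = r"(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d{1,2})?(?!\.?\d)"

# (?:CUR_SYM_REGEX)\s?NUM_REGEX
PRICE_REGEX = re.compile("(?:" + CUR_SYM_REGEX + r")\s?" + NUM_REGEX)


def find_prices(text):
    """Return every currency-prefixed amount found in text (hypothetical helper)."""
    return PRICE_REGEX.findall(text)


print(find_prices("2 BED / 2 BATH for $120,000.00, what a deal!!!"))  # ['$120,000.00']
print(find_prices("now only €40,000"))                                # ['€40,000']
```

Because the currency prefix is mandatory, the two standalone 2s in the first string are no longer matched.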

