How to selectively capture regular expression values?

Problem description

I know this question has been asked before, but I can't seem to find a solution:

This is the test string:

value: value1, Do not include this
value: value2

This is my regular expression: value: (.*)(?:, Do not include this)?

The result should capture:

value1
value2

But instead, it captures this:

value1, Do not include this
value2

[Edit] Based on the comments and answers, let me clarify.

If this is the test string:

value: value1, Do not include this
value: value1, test,
value: man, this is bad!!, Do not include this

then the captured values should be:

value1
value1, test, test,
man, this is bad!!

Tags: regex

Solution


value: (.*)(?:, Do not include this)?
       ---- ~~~~~~~~~~~~~~~~~~~~~~~~
        A              B

The problem with your expression is that part A is allowed to match the whole line and part B is optional. Upon encountering A, the regex engine consumes every character up to the end of the line it is currently matching against. Having matched A, it advances to part B, finds that B cannot match (because the whole line was already consumed), and, since B is optional and the expression ends there, declares the match successful without backtracking.
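The behavior described above can be reproduced with a minimal sketch. The question does not name a regex flavor, so the use of Python's `re` module here is an assumption; the pattern and test lines are taken from the question.

```python
import re

# Test lines taken from the question.
text = (
    "value: value1, Do not include this\n"
    "value: value2"
)

# The original pattern: greedy part A followed by optional part B.
greedy = re.compile(r"value: (.*)(?:, Do not include this)?")

# findall returns the content of the single capturing group for each match.
greedy_captures = greedy.findall(text)
print(greedy_captures)
# ['value1, Do not include this', 'value2']
```

Part A swallows the suffix on the first line, so part B matches nothing.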

One way to prevent this is to make part A lazy while forcing the expression to match the whole line with an end-of-line anchor. For example:

value: (.*?)(?:, Do not include this)?$
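As a hedged illustration, the lazy pattern can be checked against the clarified test string from the question. Python's `re` module is an assumption here, and in that flavor `$` only matches at each line end when `re.MULTILINE` is set; other engines may need a different flag.

```python
import re

# Clarified test string from the question's edit.
text = (
    "value: value1, Do not include this\n"
    "value: value1, test,\n"
    "value: man, this is bad!!, Do not include this"
)

# Lazy part A, optional part B, and an end-of-line anchor.
# re.MULTILINE makes $ match at the end of every line, not only at the
# end of the whole string.
lazy = re.compile(r"value: (.*?)(?:, Do not include this)?$", re.MULTILINE)

lazy_captures = lazy.findall(text)
print(lazy_captures)
# ['value1', 'value1, test,', 'man, this is bad!!']
```

The anchor is what forces the lazy quantifier to keep expanding: without it, `.*?` would be satisfied by an empty match.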


You could also make parts A and B so distinct from each other that one cannot match in place of the other. If applicable, this lets you keep the greedy quantifier for part A. For example:

value: ([^,]*)(?:, Do not include this)?

Which approach suits your needs better depends on the composition of the strings you match against.
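To make that trade-off concrete, here is a small sketch (again assuming Python's `re` module) that runs the negated-character-class pattern against both test strings from the question. It works when the wanted value contains no comma, and truncates the value when it does.

```python
import re

# Part A may contain anything except a comma, so it cannot run into part B.
no_comma = re.compile(r"value: ([^,]*)(?:, Do not include this)?")

# Original test string: the wanted values contain no commas.
simple_text = (
    "value: value1, Do not include this\n"
    "value: value2"
)
simple_captures = no_comma.findall(simple_text)
print(simple_captures)
# ['value1', 'value2']

# Clarified test string: some wanted values contain commas themselves.
comma_text = (
    "value: value1, Do not include this\n"
    "value: value1, test,\n"
    "value: man, this is bad!!, Do not include this"
)
comma_captures = no_comma.findall(comma_text)
print(comma_captures)
# ['value1', 'value1', 'man']
```

Note that `[^,]` also matches a newline, so on multi-line input a value without a comma can run into the following line; `[^,\n]*` is one possible way to guard against that.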

