Regex (PHP) to match everything that starts at the beginning of the string or is preceded by space + number + / + space

Problem description

I have a string that looks like this:

1/ This is a string and it                                                                                   
has some text on a new line
2/ And then there's another string that has text only on one line
532/ Another string that has some year on a new line 
2020/xyz followed by some letters
720/ This is a match on the same line with another match but the other match won't be captured 721/ And this is the last line 

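I want to capture every item that starts with a number (\d) of at most 3 digits ({1,3}) followed by a forward slash (/), where the number is either at the start of the string or preceded by whitespace or a newline (\s+).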

This is what I would like the result to look like:

[Match 1] 1/ This is a string and it has some text on a new line
[Match 2] 2/ And then there's another string that has text only on one line
[Match 3] 532/ Another string that has some year on a new line 2020/xyz followed by some letters
[Match 4] 720/ This is a match on the same line with another match but the other match won't be captured
[Match 5] 721/ And this is the last line 

This is my code so far:

$re = '/(\s|^)(?s)\d{1,3}+\/+\s+.*?(?=\d+\/+\s+|$)/m';
$str = '1/ This is a string and it 
has some text on a new line
2/ And then there\'s another string that has text only on one line
532/ Another string that has some year on a new line
2020/xyz followed by some letters
721/ And this is the last line ';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

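But it has some problems:

  1. If two matches are on the same line, it does not capture the second one (of matches 4 and 5, only match 4 is captured)
  2. It does not capture the part of an item that continues on a new line
  3. It does not capture the part of an item that contains a number followed by /, for example 2020/xyz followed by some letters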

Tags: php, regex

Solution


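Change the $ anchor, which matches at the end of a line when the m modifier is set, to the \z anchor, which matches only at the end of the string regardless of modifiers.

That way the reluctant quantifier .*? can match across several lines instead of stopping at the first line end.

To find several matches on the same line, add \s+ before the digits in the lookahead. Otherwise the whitespace before the digits would have to be consumed twice (once by .*? and once by (\s|^)), which is not possible.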

~(\s|^)\d{1,3}/+\s.*?(?=\s+\d{1,3}/+\s|\z)~ms
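As a rough check, the pattern above can be run against the sample text from the question. This is only a sketch: the $str literal is retyped from the question (including the line that holds both 720/ and 721/), and the trim() call is an addition here, needed because (\s|^) makes most matches start with the whitespace that precedes the number.

```php
<?php
// Sample text retyped from the question.
$str = "1/ This is a string and it \n"
     . "has some text on a new line\n"
     . "2/ And then there's another string that has text only on one line\n"
     . "532/ Another string that has some year on a new line \n"
     . "2020/xyz followed by some letters\n"
     . "720/ This is a match on the same line with another match but the other match won't be captured "
     . "721/ And this is the last line ";

// \z instead of $ lets .*? run across line breaks; \s+ inside the lookahead
// leaves the whitespace before the next number for (\s|^) to consume.
$re = '~(\s|^)\d{1,3}/+\s.*?(?=\s+\d{1,3}/+\s|\z)~ms';

preg_match_all($re, $str, $matches);

// $matches[0] holds the full matches. Each may carry leading whitespace
// (from (\s|^)) and the last one trailing whitespace, hence trim().
foreach ($matches[0] as $i => $match) {
    printf("[Match %d] %s\n", $i + 1, trim($match));
}
```

Line breaks inside an item are kept as they are in the input, so an item that wraps onto a second line is printed on two lines.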

Note that you can get trimmed results with:

~(?<!\S)\d{1,3}/+\s.*?(?=\s+\d{1,3}/+\s|\s*\z)~s
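The desired output in the question shows wrapped lines joined with a single space. A regex match is a contiguous slice of the input, so the pattern alone cannot do that. One possible follow-up step, assuming that joining is really wanted, is to collapse every run of whitespace inside each match. The sketch below uses the trimmed pattern and the sample text retyped from the question; the $items variable is just an illustrative name.

```php
<?php
// Sample text retyped from the question.
$str = "1/ This is a string and it \n"
     . "has some text on a new line\n"
     . "2/ And then there's another string that has text only on one line\n"
     . "532/ Another string that has some year on a new line \n"
     . "2020/xyz followed by some letters\n"
     . "720/ This is a match on the same line with another match but the other match won't be captured "
     . "721/ And this is the last line ";

// (?<!\S) asserts "start of string or preceded by whitespace" without
// consuming anything, so matches no longer begin with whitespace.
// \s*\z in the lookahead keeps trailing whitespace out of the last match.
$re = '~(?<!\S)\d{1,3}/+\s.*?(?=\s+\d{1,3}/+\s|\s*\z)~s';

preg_match_all($re, $str, $matches);

// Assumed post-processing step: collapse whitespace runs (including line
// breaks) to one space. preg_replace accepts an array subject and
// returns an array.
$items = preg_replace('~\s+~', ' ', $matches[0]);

foreach ($items as $i => $item) {
    printf("[Match %d] %s\n", $i + 1, $item);
}
```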

To reduce the number of steps, you can change \s.*? to (?>\s+\S+)*? and remove the s modifier, which is no longer needed.
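Written out in full, that substitution would give the pattern below. The sketch compares it with the trimmed pattern on the sample text from the question. The variable names are illustrative, and the two patterns are only shown to agree on this input, not on every possible one.

```php
<?php
// Sample text retyped from the question.
$str = "1/ This is a string and it \n"
     . "has some text on a new line\n"
     . "2/ And then there's another string that has text only on one line\n"
     . "532/ Another string that has some year on a new line \n"
     . "2020/xyz followed by some letters\n"
     . "720/ This is a match on the same line with another match but the other match won't be captured "
     . "721/ And this is the last line ";

// Trimmed pattern: .*? advances one character at a time and needs the
// s modifier so that the dot also matches line breaks.
$charByChar = '~(?<!\S)\d{1,3}/+\s.*?(?=\s+\d{1,3}/+\s|\s*\z)~s';

// Variant with (?>\s+\S+)*?: the atomic group advances one whole
// whitespace-plus-word chunk at a time, so the lookahead is tested once
// per word. No dot is left in the pattern, so the s modifier is dropped.
$wordByWord = '~(?<!\S)\d{1,3}/+(?>\s+\S+)*?(?=\s+\d{1,3}/+\s|\s*\z)~';

preg_match_all($charByChar, $str, $slow);
preg_match_all($wordByWord, $str, $fast);

// On this sample both patterns return the same five items.
var_dump($slow[0] === $fast[0]);
print_r($fast[0]);
```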

