How do I match all occurrences in a string and collect multiple groups?

Problem description

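I'm trying to figure out how to use preg_match_all to select every "-on_" inside a specific match.

I've tried a lot of regex patterns and I'm completely stumped. The best regex person at our company has worked on this for an hour or two without making any progress either.

The most promising pattern seems to be .*(-on_).* but it only captures the last "-on_" of each match. The first match also comes out fine, but the second match is everything else on the page. I don't understand why.

A sample of the HTML I'm trying to parse...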

<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

... more unimportant no-need-to-match code between ...


<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

What I'm using to parse it...

preg_match_all('~<span class="RatingStar__bew-avgstars__2enAh">.*(-on_).*</div></span>~', $html, $matches)

The response I get is too large to be worth pasting in full, so I'll summarize:

array:2 [▼
  0 => array:2 [▼
    0 => "Perfectly correct match"
    1 => "Match of the rest of the page (not correct)"
  ]
  1 => array:2 [▼
    0 => "-on_" // Last on in the match
    1 => "-on_" // Last on in the second match
  ]
]

For the 2 matches I should be getting, I expect one set of 4 "-on_" entries per match, given the HTML listed above.

So what I'm really expecting is:

array:2 [▼
  0 => array:2 [▼
    0 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
    1 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
  ]
  1 => array:2 [▼
    0 => ["-on_","-on_","-on_","-on_"] 
    1 => ["-on_","-on_","-on_","-on_"]
  ]
]

Maybe I'm completely missing something here... any advice?

Tags: php, regex

Solution


I believe this is closer to what you want:

~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~
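
As a rough illustration of how this pattern behaves, here is a minimal sketch that runs it over a shortened, made-up sample (the $html string below is an assumption, not the asker's real page). It suggests that the pattern now finds each rating block separately, while the capture group still holds only one "-on_" per block.

```php
<?php
// Shortened, hypothetical sample: two rating blocks with unrelated markup in between.
$html = <<<'HTML'
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
    </div>
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
    </div>
</span>
<p>unrelated markup</p>
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
    </div>
</span>
HTML;

// The pattern from the answer: [\s\S] spans newlines, *? is lazy,
// and \s* allows whitespace between </div> and </span>.
$pattern = '~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~';

// preg_match_all returns the number of full matches.
$count = preg_match_all($pattern, $html, $matches);

echo $count, "\n";              // number of rating blocks found
echo count($matches[1]), "\n";  // one captured "-on_" per block, not one per star
```

Note that the first block in the sample contains two "-on_" strings, yet group 1 contributes a single entry for it, because a capture group yields one value per match.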

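You have three problems:

  1. .* does not match the newline character \n (unless the s modifier is set). You can use [\s\S]* instead, which matches every whitespace character and every non-whitespace character (hence, every character).
  2. The text </div></span> does not occur in your snippet. There is whitespace between </div> and </span>. Hence, </div>\s*</span>
  3. You are using the greedy operator * instead of the lazy operator *?. This is a problem because your whole string ends with </div></span>, which means the first match consumes all the other matches and runs on to the end of the string.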
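
A capture group returns only one value per match, so a single preg_match_all call cannot produce a nested list of every "-on_" per block. One possible workaround, sketched below under assumptions, is a two-pass approach: isolate each block first, then search inside it. The helper name collectOnMarkers and the sample $html are hypothetical and not part of the original question or answer.

```php
<?php
// Hypothetical helper: for each rating block in $html, return the list of
// "-on_" occurrences found inside that block.
function collectOnMarkers(string $html): array
{
    // Pass 1: isolate each block (lazy quantifier, whitespace allowed before </span>).
    $blockPattern = '~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?</div>\s*</span>~';
    preg_match_all($blockPattern, $html, $blocks);

    $result = [];
    foreach ($blocks[0] as $block) {
        // Pass 2: collect every "-on_" inside this one block only.
        preg_match_all('~-on_~', $block, $hits);
        $result[] = $hits[0];
    }
    return $result;
}

// Made-up sample: first block has two "on" stars and one "off", second has one "on".
$html = <<<'HTML'
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
    </div>
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
    </div>
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
    </div>
</span>
<p>unrelated markup</p>
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
    </div>
</span>
HTML;

$perBlock = collectOnMarkers($html);
print_r($perBlock);
```

If only the number of lit stars matters, count($perBlock[0]) gives it for the first block; for anything beyond a quick scrape, a real HTML parser such as DOMDocument is likely to be more robust than regex.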
