Modifying the DOM once causes subsequent modifications to fail

Problem description

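I'm trying to wrap all instances of certain phrases in a <span> using PHP's DOMDocument and XPath. My logic is based on this answer to another question, but it only lets me wrap the first match in a node, whereas I need to wrap every match.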

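Once I modify the DOM for the first match, the next iteration of my loop throws Fatal error: Uncaught Error: Call to a member function splitText() on bool on the line that declares $after. I'm fairly sure this is caused by modifying the markup, but I haven't been able to figure out why.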

What am I doing wrong here?

/**
 * Automatically wrap various forms of CCJM in a class for branding purposes
 *
 * @link https://stackoverflow.com/a/6009594/654480
 *
 * @param string $content
 * @return string
 */
function ccjm_branding_filter(string $content): string {
    if (! (is_admin() && ! wp_doing_ajax()) && $content) {
        $DOM = new DOMDocument();

        /**
         * Use internal errors to get around HTML5 warnings
         */
        libxml_use_internal_errors(true);

        /**
         * Load in the content, with proper encoding and an `<html>` wrapper required for parsing
         */
        $DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        /**
         * Clear errors to get around HTML5 warnings
         */
        libxml_clear_errors();

        /**
         * Initialize XPath
         */
        $XPath = new DOMXPath($DOM);

        /**
         * Retrieve all text nodes, except those within scripts
         */
        $text = $XPath->query("//text()[not(parent::script)]");

        foreach ($text as $node) {
            /**
             * Find all matches, including offset
             */
            preg_match_all("/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i", $node->textContent, $matches, PREG_OFFSET_CAPTURE);

            /**
             * Wrap each match in appropriate span
             */
            foreach ($matches as $group) {
                foreach ($group as $key => $match) {
                    /**
                     * Determine the offset and the length of the match
                     */
                    $offset = $match[1];
                    $length = strlen($match[0]);

                    /**
                     * Isolate the match and what comes after it
                     */
                    $word  = $node->splitText($offset);
                    $after = $word->splitText($length);

                    /**
                     * Create the wrapping span
                     */
                    $span = $DOM->createElement("span");
                    $span->setAttribute("class", "__brand");

                    /**
                     * Replace the word with the span, and then re-insert the word within it
                     */
                    $word->parentNode->replaceChild($span, $word);
                    $span->appendChild($word);

                    break; // it always errors after the first loop
                }
            }
        }

        /**
         * Save changes, remove unneeded tags
         */
        $content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
    }

    return $content;
}
add_filter("ccjm_final_output", "ccjm_branding_filter");

Sample content (every instance of "C.C. Johnson & Malhotra, P.C." and "CCJM" is matched, but only the first one can be modified successfully):

C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member of a large Design Team for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project. The east-west light rail system extends from New Carrollton in PG County, MD to Bethesda in MO County, MD with 21 stations and one short tunnel. CCJM was Engineer of Record (EOR) for the design of eight (8) Bridges and design reviews for 35 transit/highway bridges and over 100 retaining walls of different lengths/types adjacent to bridges and in areas of cut/fill. CCJM designed utility structures for 42,000 LF of relocated water mains and 19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local Standards.

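Edit 1: After some testing, when I output $node->textContent I can see that it changes after the first iteration. So I think what's happening is that $node->splitText($offset) modifies the node itself (it is left holding only the text before the offset), so the offsets of the subsequent matches no longer line up.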
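A minimal reproduction of what Edit 1 describes, using a made-up sentence rather than the real content: all offsets are computed up front, and after the first splitText() call the original node is too short for the second offset. In the asker's PHP version that out-of-range call returned false, which is where "splitText() on bool" comes from.

```php
<?php
// Made-up sample text; offsets are computed once, as in the question's loop.
$doc  = new DOMDocument();
$p    = $doc->appendChild($doc->createElement('p'));
$node = $p->appendChild($doc->createTextNode('See CCJM and CCJM.'));

preg_match_all('/CCJM/', $node->textContent, $matches, PREG_OFFSET_CAPTURE);
[$first, $second] = $matches[0]; // ['CCJM', 4] and ['CCJM', 13]

// splitText() truncates $node in place and returns the remainder as a new node.
$word  = $node->splitText($first[1]);           // $node: "See ", $word: "CCJM and CCJM."
$after = $word->splitText(strlen($first[0]));   // $word: "CCJM", $after: " and CCJM."

echo '"', $node->textContent, '"', "\n";                    // "See "
echo strlen($node->textContent), ' < ', $second[1], "\n";   // 4 < 13
// The second offset now points past the end of $node, so a second
// $node->splitText($second[1]) call cannot split it.
```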

Tags: php, regex, wordpress, xpath, domdocument

Solution


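First, I don't think foreach ($matches as $group) is correct here. Because your whole pattern is wrapped in a capturing group, $matches contains the same matches twice (once as the full matches in $matches[0] and once as group 1 in $matches[1]), and you presumably don't want to wrap them in spans twice. So that outer foreach loop should be removed, and the inner loop should run over $matches[0] only.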
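To see the duplication, here is the question's pattern run against the opening of the sample text. The sample string is just the first few words of the question's content; everything else is taken from the question's code.

```php
<?php
// The question's pattern: the outer (...) is a capturing group around everything.
$pattern = '/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i';
$text    = 'C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member';

preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

echo count($matches), "\n";             // 2: full matches + group 1
var_dump($matches[0] === $matches[1]);  // bool(true): identical lists

// Looping over $matches[0] alone visits each match exactly once.
foreach ($matches[0] as [$match, $offset]) {
    echo "$offset: $match\n";
}
// 0: C.C. Johnson & Malhotra, P.C.
// 31: CCJM
```

Dropping the outer parentheses from the pattern would also remove the duplicate group; either way, $matches[0] is the list to iterate.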

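Second, I think your offset problem can be solved simply by working backwards: instead of replacing the matches from first to last, replace them in reverse order. That way you only ever modify the structure after the current position, so whatever changes there cannot affect the offsets of the earlier matches that are still waiting to be processed.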
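The same idea on a plain string, as an illustration of the offset arithmetic only (the actual fix below still works on DOM text nodes, not on HTML strings). wrap_from_end() is a hypothetical helper name: it collects offsets once, then edits last-to-first, so every insertion shifts only text to the right of the offsets that are still pending.

```php
<?php
// Hypothetical helper: wraps every regex match in a span, working backwards.
function wrap_from_end(string $text, string $pattern): string
{
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
    // Last match first: replacing it cannot move any earlier offset.
    foreach (array_reverse($m[0]) as [$match, $offset]) {
        $wrapped = '<span class="__brand">' . $match . '</span>';
        $text = substr_replace($text, $wrapped, $offset, strlen($match));
    }
    return $text;
}

echo wrap_from_end('CCJM and CCJM.', '/CCJM/'), "\n";
// <span class="__brand">CCJM</span> and <span class="__brand">CCJM</span>.
```

Processing front-to-back instead would lengthen the string before the second offset is used, so that offset would land inside the first span's markup.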

        /**
         * Wrap each match in appropriate span.
         * Only the full matches in $matches[0] are used (this replaces the outer
         * foreach ($matches as $group)), and they are processed last-to-first,
         * so the break; from the question is no longer needed.
         */
        foreach (array_reverse($matches[0]) as $match) {
            /**
             * Determine the offset and the length of the match
             */
            $offset = $match[1];
            $length = strlen($match[0]);

            /**
             * Isolate the match and what comes after it
             */
            $word  = $node->splitText($offset);
            $after = $word->splitText($length);

            /**
             * Create the wrapping span
             */
            $span = $DOM->createElement("span");
            $span->setAttribute("class", "__brand");

            /**
             * Replace the word with the span, and then re-insert the word within it
             */
            $word->parentNode->replaceChild($span, $word);
            $span->appendChild($word);
        }

This is the result I get from your sample input data (live example here: https://3v4l.org/kbSQ8):

<p><span class="__brand">C.C. Johnson &amp; Malhotra, P.C.</span> (<span
class="__brand">CCJM</span>) was an integral member of a large Design Team
for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project.
The east-west light rail system extends from New Carrollton in PG County,
MD to Bethesda in MO County, MD with 21 stations and one short tunnel.
<span class="__brand">CCJM</span> was Engineer of Record (EOR) for the
design of eight (8) Bridges and design reviews for 35 transit/highway
bridges and over 100 retaining walls of different lengths/types adjacent to
bridges and in areas of cut/fill. <span class="__brand">CCJM</span>
designed utility structures for 42,000 LF of relocated water mains and
19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary
Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local
Standards.</p>
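For testing outside WordPress, here is a stand-alone sketch that combines both fixes. brand_wrap_html() and utf8_char_count() are hypothetical names, and the multibyte handling is an addition not covered above: as far as I can tell, preg_match_all() reports offsets in bytes while DOMText::splitText() counts UTF-8 characters, so the two only agree while everything before a match is ASCII (true for the sample content, but not for text containing characters such as "é").

```php
<?php
// Count UTF-8 characters using only PCRE (so mbstring is not required).
function utf8_char_count(string $s): int
{
    return (int) preg_match_all('/./su', $s);
}

// Hypothetical stand-alone version of the filter, without WordPress hooks.
function brand_wrap_html(string $html, string $pattern): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML("<?xml encoding='utf-8' ?><html>{$html}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // The XPath result is a fixed list, so nodes created by splitText() are not revisited.
    foreach ($xpath->query('//text()[not(parent::script)]') as $node) {
        $text = $node->textContent;
        if (!preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
            continue; // no match in this text node
        }
        foreach (array_reverse($m[0]) as [$match, $byteOffset]) {
            // Convert PCRE byte offsets/lengths to character counts for splitText().
            $offset = utf8_char_count(substr($text, 0, $byteOffset));
            $word   = $node->splitText($offset);        // $node keeps the text before the match
            $word->splitText(utf8_char_count($match));  // text after the match becomes a sibling
            $span = $dom->createElement('span');
            $span->setAttribute('class', '__brand');
            $word->parentNode->replaceChild($span, $word);
            $span->appendChild($word);
        }
    }

    // Serialize only the children of the temporary <html> wrapper.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$pattern = '/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i';
$html = brand_wrap_html('<p>Café CCJM and CCJM.</p>', $pattern);
echo $html, "\n";
```

With the raw byte offset, the first "CCJM" above would be split one character too late because "é" is two bytes; converting the offsets keeps both spans aligned.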
