Regex substitution: replace text, not code

Problem description

I have been trying to solve a regex quiz for several days and still can't get it right. I'm very close, but my solution still doesn't pass.

Task:

In an HTML page, replace the text micro with &micro;. Oh, and don't mess up the code: don't replace inside <the tags> or &entities;

Replace

Don't touch


I tried the following, but it fails on the final &micro;. What am I missing? Can anyone point it out? Thanks in advance!

What I tried:

Regex

((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro

Substitution

$1&micro;

Tags: regex, pcre, substitution

Solution


Use the SKIP-FAIL technique, but match micro as a whole word:

(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b

Proof

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              fail the match here and restart the search
                           just past the tag or entity matched above
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
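
The pattern above relies on PCRE backtracking verbs, which Python's standard `re` module does not support. As a rough sketch of the same idea for that environment, the tags and entities can be matched first and returned unchanged by a replacement callback, so that only a standalone `micro` is rewritten. This is an emulation of SKIP-FAIL, not the verb itself; the function name `replace_micro` and the sample input are made up for illustration.

```python
import re

# Group 1 captures what must be left alone (a tag or an entity);
# the second alternative is the whole word we want to replace.
PATTERN = re.compile(r"(<[^<>]*>|&\w+;)|\bmicro\b")


def replace_micro(html: str) -> str:
    """Replace standalone 'micro' with '&micro;' outside tags and entities."""

    def _sub(match: re.Match) -> str:
        # If group 1 matched, we consumed a tag or entity: return it as-is.
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the match is the bare word 'micro'.
        return "&micro;"

    return PATTERN.sub(_sub, html)


if __name__ == "__main__":
    # Hypothetical sample input covering text, a tag, and an entity.
    sample = 'micro <a href="micro.html">micro</a> &micro; microscope'
    print(replace_micro(sample))
```

Because the tag and entity alternatives come first, any `micro` inside them is consumed before the word alternative gets a chance to match there, which is the effect `(*SKIP)(*F)` achieves in PCRE.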
