regex - 为什么我的正则表达式会随着更多的运营商而呈指数级增长?
问题描述
所以我正在用 perl 正则表达式构建一个解析器。我在模式中间使用代码来实现优先级。我还有一个构建系统,可以自动生成称为构面的“影子”正则表达式,并且不包含任何代码,因此当我乱序匹配时,我可以填充到完全匹配。
无论如何,现在我已经创建了一个手写示例,并显示了我的二进制操作匹配代码。无论表达式中是否存在运算符,它都会以指数方式变慢,我们走的链越高。
例如,如果我从 shift 运算符开始,它将需要大约 1 秒,但如果我从 and 逻辑开始,则需要一整分钟或其他时间(我没有测量时间,但它明显慢得多),因为我'我之前提到过,实际的运算符不包含任何更高优先级的运算符——只有链中的最低优先级——比如加法和乘法。
这是 perl 示例(我来自 pcre2,当像这样启动时它完全失败,匹配计数耗尽):
#!/usr/bin/perl
$text = "7*7*4*5*6*7+(4*4*4+3*7*4);7*7*4*5*6*7+(4*4*4+3*7*4)";
$result = $text =~/((*F) (?# << clever way of declaring subroutines in a regex without actually executing them)
(?<orlogicplusrest>(?>(?&andlogicplusrest))(?>(?<orlogicop>(?<orlogicopraw>\|\|)(?>(?&orlogicplusrest))))
|(?>(?&andlogicplusrestfacet))(?{print("$+{orlogicopraw}\n")if($+{orlogicopraw});}))
(?<andlogicplusrest>(?>(?&orplusrest))(?>(?<andlogicop>(?<andlogicopraw>&&)(?>(?&andlogicplusrest))))
|(?>(?&orplusrestfacet))(?{print("$+{andlogicopraw}\n")if($+{andlogicopraw});}))
(?<orplusrest>(?>(?&xorplusrest))(?>(?<orop>(?!\|\|)(?<oropraw>\|)(?>(?&orplusrest))))
|(?>(?&xorplusrestfacet))(?{print("$+{oropraw}\n")if($+{oropraw});}))
(?<xorplusrest>(?>(?&andplusrest))(?>(?<xorop>(?<xoropraw>\^)(?>(?&xorplusrest))))
|(?>(?&andplusrestfacet))(?{print("$+{xoropraw}\n")if($+{xoropraw});}))
(?<andplusrest>(?>(?&eqplusrest))(?>(?<andop>(?!&&)(?<andopraw>&)(?>(?&andplusrest))))
|(?>(?&eqplusrestfacet))(?{print("$+{andopraw}\n")if($+{andopraw});}))
(?<eqplusrest>(?>(?&relplusrest))(?>(?<eqop>(?<eqopraw>==|!=)(?>(?&eqplusrest))))
|(?>(?&relplusrestfacet))(?{print("$+{eqopraw}\n")if($+{eqopraw});}))
(?<relplusrest>(?>(?&shiftplusrest))(?>(?<relop>(?<relopraw>(?!<<)<|(?!>>)>|<=|>=)(?>(?&relplusrest))))
|(?>(?&shiftplusrestfacet))(?{print("$+{relopraw}\n")if($+{relopraw});}))
(?<shiftplusrest>(?>(?&addmulplusrest))(?>(?<shiftop>(?<shiftopraw><<|>>)(?>(?&shiftplusrest))))
|(?>(?&addmulplusrestfacet))(?{print("$+{shiftopraw}\n")if($+{shiftopraw});}))
(?<addmulplusrest>(?>(?&muloprest))(?>(?<addop>(?!\+\+|--)(?<addopraw>[\+\-])(?>(?&addmulplusrest))))
|(?>(?&muloprestfacet))(?{print("$+{addopraw}\n") if($+{addopraw});}))
(?<muloprest>(?>(?&testxpr))(?>(?<mulop>(?<mulopraw>[\*\/\%])(?>(?&muloprest))))
|(?>(?&testxprfacet))(?{print("$+{mulopraw}\n") if($+{mulopraw});}))
(?# entry point - replace here)
(?<testentry>(?<innertest>(?>(?&orlogicplusrest)));(?&innertest);)
(?# either in parnetheses or without)
(?<testxpr>(?&inner)|(?&inparenths))
(?# testexpr facet - replace here)
(?<testxprfacet>(?&innerraw)|\s*[(](?&orlogicplusrestfacet)[)]\s*)
(?<inparenths>\s*[(](?&innertest)[)]\s*)
(?# a digit)
(?<inner>(?<innerraw>\d++)(?{print("$+{innerraw}\n") if($+{innerraw});}))
(?#following are facets - you can safely ignore them - they are the same as the above except they don't print anything)
(?<orlogicplusrestfacet>(?>(?&andlogicplusrestfacet))(?>((\|\|)(?>(?&orlogicplusrestfacet))))
|(?>(?&andlogicplusrestfacet)))
(?<andlogicplusrestfacet>(?>(?&orplusrestfacet))(?>((&&)(?>(?&andlogicplusrestfacet))))
|(?>(?&orplusrestfacet)))
(?<orplusrestfacet>(?>(?&xorplusrestfacet))(?>((?!\|\|)(\|)(?>(?&orplusrestfacet))))
|(?>(?&xorplusrestfacet)))
(?<xorplusrestfacet>(?>(?&andplusrestfacet))(?>((\^)(?>(?&xorplusrestfacet))))
|(?>(?&andplusrestfacet)))
(?<andplusrestfacet>(?>(?&eqplusrestfacet))(?>((?!&&)(&)(?>(?&andplusrestfacet))))
|(?>(?&eqplusrestfacet)))
(?<eqplusrestfacet>(?>(?&relplusrestfacet))(?>((==|!=)(?>(?&eqplusrestfacet))))
|(?>(?&relplusrestfacet)))
(?<relplusrestfacet>(?>(?&shiftplusrestfacet))(?>(((?!<<)<|(?!>>)>|<=|>=)(?>(?&relplusrestfacet))))
|(?>(?&shiftplusrestfacet)))
(?<shiftplusrestfacet>(?>(?&addmulplusrestfacet))(?>((<<|>>)(?>(?&shiftplusrestfacet))))
|(?>(?&addmulplusrestfacet)))
(?<addmulplusrestfacet>(?>(?&muloprestfacet))(?>((?!\+\+|--)([\+\-])(?>(?&addmulplusrestfacet))))
|(?>(?&muloprestfacet)))
(?<muloprestfacet>(?>(?&testxprfacet))(?>(([\*\/\%])(?>(?&muloprestfacet))))
|(?>(?&testxprfacet)))
(?#end facets - follows the entry point)
)|(^(?&testentry))/xx;
print $&;
在此处显示替换的地方,您可以orlogicplusrest
使用链中的任何其他运算符更改 以检查性能 - 例如,如果您放置shiftplusrest
(and shiftplusrestfacet
),它将几乎立即执行。
所以我的问题是 - 有什么方法可以加快速度,同时以正确的顺序考虑每个运营商。我已经尝试在任何地方添加原子组,但它仍然是一样的。
解决方案
所以实际上,老实说,我在以前的版本中有一个实际工作的解决方案——我只需要稍微调整一下。
我不知道它是如何做到的,但它更快并且它正在工作。唯一需要注意的是,您需要指定表达式终止的位置,否则它将缩小到第一个操作。
嗯,您基本上可以在一种情况下对它们进行比较-例如,我们拥有的原始版本:
~/(?<orlogicplusrest>(?>(?&andlogicplusrest))(?>(?<orlogicop>(?<orlogicopraw>\|\|)(?>(?&orlogicplusrest))))
|(?>(?&andlogicplusrestfacet))(?{print("$+{orlogicopraw}\n")if($+{orlogicopraw});}))/
正如您在此处看到的,我们匹配整个事物 - 右侧和左侧,但另一方面在我更快的解决方案中:
~/(?<orlogicplusrest>(?<orlogicop>(?<orlogicopraw>\|\|)(?&primexprnoternary)(?!(?&andlogicplusrestfacet))
(?{print("$+{orlogicopraw}\n")if($+{orlogicopraw});}))|(?&andlogicplusrest)(?&orlogicop)?)/
仅作为后缀匹配某物,并用于primexprnoternary
制作循环:
~/(?<primexprnoternary>(?&testxpr)((?&orlogicplusrest)?+)*?)/
Asprimexprnoternary
也用于运算符内部的右侧。
在另一个解决方案中,我们只写:
~/(?<innertest>(?>(?&orlogicplusrest)))/
这是完整的代码:
#!/usr/bin/perl
$text = "7*7*4*5*6*7+(4*4*4+3*7*4);";
$result = $text =~/((*F) (?# << clever way of declaring subroutines in a regex without actually executing them)
(?<orlogicplusrest>(?<orlogicop>(?<orlogicopraw>\|\|)(?&primexprnoternary)(?!(?&andlogicplusrestfacet))
(?{print("$+{orlogicopraw}\n")if($+{orlogicopraw});}))|(?&andlogicplusrest)(?&orlogicop)?)
(?<andlogicplusrest>(?<andlogicop>(?<andlogicopraw>&&)(?&primexprnoternary)(?!(?&orplusrestfacet))
(?{print("$+{andlogicopraw}\n")if($+{andlogicopraw});}))|(?&orplusrest)(?&andlogicop)?)
(?<orplusrest>(?<orop>(?!\|\|)(?<oropraw>\|)(?&primexprnoternary)(?!(?&xorplusrestfacet))
(?{print("$+{oropraw}\n")if($+{oropraw});}))|(?&xorplusrest)(?&orop)?)
(?<xorplusrest>(?<xorop>(?<xoropraw>\^)(?&primexprnoternary)(?!(?&andplusrestfacet))
(?{print("$+{xoropraw}\n")if($+{xoropraw});}))|(?&andplusrest)(?&xorop)?)
(?<andplusrest>(?<andop>(?!&&)(?<andopraw>&)(?&primexprnoternary)(?!(?&eqplusrestfacet))
(?{print("$+{andopraw}\n")if($+{andopraw});}))|(?&eqplusrest)(?&andop)?)
(?<eqplusrest>(?<eqop>(?<eqopraw>==|!=)(?&primexprnoternary)(?!(?&relplusrestfacet))
(?{print("$+{eqopraw}\n")if($+{eqopraw});}))|(?&relplusrest)(?&eqop)?)
(?<relplusrest>(?<relop>(?<relopraw>(?!<<)<|(?!>>)>|<=|>=)(?&primexprnoternary)(?!(?&shiftplusrestfacet))
(?{print("$+{relopraw}\n")if($+{relopraw});}))|(?&shiftplusrest)(?&relop)?)
(?<shiftplusrest>(?<shiftop>(?<shiftopraw><<|>>)(?&primexprnoternary)(?!(?&addmulplusrestfacet))
(?{print("$+{shiftopraw}\n")if($+{shiftopraw});}))|(?&addmulplusrest)(?&shiftop)?)
(?<addmulplusrest>(?<addop>(?<addopraw>[\+\-])((?&primexprnoternary)(?!(?&mulopfacet))
(?{print("$+{addopraw}\n")if($+{addopraw});})))|(?&mulop)(?&addop)?)
(?<mulop>(?<mulopraw>[\*\/\%])(?&primexprnoternary)
(?{print("$+{mulopraw}\n")if($+{mulopraw});}))
(?# entry point)
(?<testentry>(?<innertest>(?&primexprnoternary));)
(?# replace here)
(?<primexprnoternary>(?&testxpr)((?&orlogicplusrest)?+)*?)
(?# either in parnetheses or without)
(?<testxpr>(?&inner)|(?&inparenths))
(?# primexprnoternary facet - replace here)
(?<primexprnoternaryfacet>(?&testxprfacet)((?&orlogicplusrestfacet)?+)*?)
(?<testxprfacet>(?&innerraw)|\s*[(](?&primexprnoternaryfacet)[)]\s*)
(?<inparenths>\s*[(](?&primexprnoternary)[)]\s*)
(?# a digit)
(?<inner>(?<innerraw>\d++)(?{print("$+{innerraw}\n") if($+{innerraw});}))
(?#following are facets - you can safely ignore them - they are the same as the above except they don't print anything)
(?<orlogicplusrestfacet>(?&andlogicplusrest)((\|\|)(?&primexprnoternaryfacet)(?!(?&andlogicplusrestfacet)))?)
(?<andlogicplusrestfacet>(?&orplusrest)((&&)(?&primexprnoternaryfacet)(?!(?&orplusrestfacet)))?)
(?<orplusrestfacet>(?&xorplusrest)((?!\|\|)(\|)(?&primexprnoternaryfacet)(?!(?&xorplusrestfacet)))?)
(?<xorplusrestfacet>(?&andplusrest)((\^)(?&primexprnoternaryfacet)(?!(?&andplusrestfacet)))?)
(?<andplusrestfacet>(?&eqplusrest)((?!&&)(&)(?&primexprnoternaryfacet)(?!(?&eqplusrestfacet)))?)
(?<eqplusrestfacet>(?&relplusrest)((==|!=)(?&primexprnoternaryfacet)(?!(?&relplusrestfacet)))?)
(?<relplusrestfacet>(?&shiftplusrest)(((?!<<)<|(?!>>)>|<=|>=)(?&primexprnoternaryfacet)(?!(?&shiftplusrestfacet)))?)
(?<shiftplusrestfacet>(?&addmulplusrest)((<<|>>)(?&primexprnoternaryfacet)(?!(?&addmulplusrestfacet)))?)
(?<addmulplusrestfacet>(?&mulop)(([\+\-])((?&primexprnoternaryfacet)(?!(?&mulopfacet))))?)
(?<mulopfacet>([\*\/\%])(?&primexprnoternaryfacet))
(?#end facets - follows the entry point)
)|(^(?&testentry))/xx;
print $&;
推荐阅读
- java - Spring 配置属性递归类型
- javascript - Javascript 调用处理程序时不调用 WKUserContentController
- django - 致命:“credential.usehttppath”的错误数字配置值“true{/code}”:无效单位
- jquery - 带有滚动和动画的jquery更改部分
- java - 使用 Maven 的 JavaFX 11 抛出异常:“WindowsNativeRunloopThread”
- python - 从字典中的列表中查找元素并返回该键
- python - 成功安装包(Bcrypt)但仍然报错
- c# - 无法使用自定义属性更新 AspNetUsers Identity
- postgresql - 连接本地主机 Postgres 数据库
- python - 加载网页时如何防止闪烁?