Regex to find the last varying string before each unique string

Problem description

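I want to find a regex for this application. I have searched but could not find an answer, and I am no regex expert, so I will try to explain what I want to do: I want a regex that finds the last URL before each occurrence of a unique string.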

I tried (?!href).*(?<=Uniquestringcontainingspecialcharacters), but with the actual HTML it hangs the program, probably because the real input is much longer than my example here.

In this example, I want to find the last partial URL before each occurrence of a unique string, which may contain many special characters.

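The input looks like the dummy text below, but without the newlines (I added them to make it easier to see what I mean). Also, randomjunkincludingspacesandspecialcharacterswithoutausefulpattern _-.,<>:;"azAZ09 stands for the random content that actually sits between the hrefs. There is a varying number of URLs, plus random junk, between the URLs I am interested in: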

href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/theinfoIwant/moreinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/randomtextandornumberthatIdontwant/morerandomtextandornumberthatIdontwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 
href="/differentinfoIwant/moredifferentinfoIwant/" randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09
Uniquestringcontainingspecialcharacters
randomjunkincludingspacesandspecialcharacterswithoutausefulpatern _-.,<>:;"azAZ09 

So here I would like to get:

/theinfoIwant/moreinfoIwant/
/differentinfoIwant/moredifferentinfoIwant/

Tags: regex

Solution


Basically, the regex you are looking for could be something like

 href="[^"]*"(?=(?:(?!href=).)*Uniquestringcontainingspecialcharacters)

where . also matches newlines (depending on the language, this needs the /s flag).

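  • href="[^"]*" matches
    • href=" followed by
    • as many characters as possible that are not "
    • "
  • (?=...) is a lookahead assertion at the position after the closing "
    • (?:(?!href=).)* is a tempered greedy token (it matches as many characters as possible, using a negative lookahead to ensure the matched text does not contain href=)
    • Uniquestringcontainingspecialcharacters is the unique string that must follow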
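
As a rough illustration of the breakdown above, here is a minimal Python sketch. The sample text is a shortened, made-up stand-in for the question's input, and the capture group around [^"]* is an addition (not part of the pattern above) so that only the path is returned rather than the whole href="..." attribute. re.DOTALL plays the role of the /s flag, and re.escape is used on the assumption that the real unique string contains regex metacharacters.

```python
import re

MARKER = "Uniquestringcontainingspecialcharacters"

# Shortened stand-in for the question's input, joined without newlines.
html = (
    'href="/unwanted/a/" junk _-.,<>:;"azAZ09 '
    'href="/theinfoIwant/moreinfoIwant/" junk _-.,<>:;"azAZ09 '
    + MARKER
    + ' junk href="/unwanted/b/" junk '
    'href="/differentinfoIwant/moredifferentinfoIwant/" junk _-.,<>:;"azAZ09 '
    + MARKER
    + " junk"
)

pattern = re.compile(
    # href="..." with the path captured (the capture group is an addition),
    # followed by a lookahead that reaches MARKER without crossing another href=
    r'href="([^"]*)"(?=(?:(?!href=).)*' + re.escape(MARKER) + r")",
    re.DOTALL,  # let . match newlines too
)

urls = pattern.findall(html)
print(urls)
```

With a single capture group, findall returns just the captured paths, which here are the two URLs the question asks for.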

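Slightly better, Uniquestringcontainingspecialcharacters can also be added to the tempered greedy pattern, so that the lookahead stops scanning at the first occurrence of the unique string: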

href="[^"]*"(?=(?:(?!href=|Uniquestringcontainingspecialcharacters).)*Uniquestringcontainingspecialcharacters)
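
A possible way to package this variant in Python is sketched below. The function name last_urls_before, the sample text, and the marker U*ni(que)? are all hypothetical and chosen only to show a marker with special characters. The capture group is again an addition to return just the path.

```python
import re


def last_urls_before(text, marker):
    """Return the path of the last href before each occurrence of marker."""
    m = re.escape(marker)  # the marker may contain regex metacharacters
    pattern = re.compile(
        # The tempered token refuses to step over either href= or the marker,
        # so the lookahead stops at the first marker it reaches.
        r'href="([^"]*)"(?=(?:(?!href=|' + m + r").)*" + m + r")",
        re.DOTALL,
    )
    return pattern.findall(text)


# Hypothetical sample; newlines are included to exercise re.DOTALL.
sample = (
    'href="/skip/1/" junk\n'
    'href="/keep/1/" junk <b> U*ni(que)? junk\n'
    'href="/skip/2/" x href="/keep/2/" y U*ni(que)? z'
)

result = last_urls_before(sample, "U*ni(que)?")
print(result)
```

Escaping the marker once and reusing it in both places keeps the negative lookahead and the final literal match in sync.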
