A regex that is 99% of the way to matching URLs (but not those inside anchor tags)

Problem description

Can anyone help with this regex? It is 99% of the way there. There are just a couple of very odd failing tests.

For "http://www.server.com" and "http://www.server.com", the regex matches "http://www.server.co". I expected the match to fail because of the lookahead for the closing tag.

(?<!(?:>\s*)|(?:href="")) (?# Anti-Anchor Tag)
(?:(?:https?|ftp)://) (?# Protocol)
(?# Username/Password intentionally left out)
(?:
    (?!10(?:\.\d{1,3}){3}) (?# Exclude 10.X.X.X)
    (?!127(?:\.\d{1,3}){3}) (?# Exclude 127.X.X.X)
    (?!169\.254(?:\.\d{1,3}){2}) (?# Exclude 169.254.X.X)
    (?!192\.168(?:\.\d{1,3}){2}) (?# Exclude 192.168.X.X)
    (?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2}) (?# Exclude 172.16.X.X to 172.31.X.X)
    (?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])) (?# IP address)
|
    (?:(?:[\w\d\u00a1-\uffff]+-?)*[\w\d\u00a1-\uffff]+) (?# First part of host)
    (?:\.(?:[\w\d\u00a1-\uffff]+-?)*[\w\d\u00a1-\uffff]+)* (?# Middle parts of host)
    \.[\w\d\u00a1-\uffff]+ (?# Last part of host)
)
(?::\d{2,5})? (?# Port number)
(?:/[^\s\?#]*)? (?# Folders & Files)
(?:\?[^\s#]*)? (?# Query String)
(?:\#\S*)? (?# Fragment)
(?!\s*</a>) (?# Anti-Closing Anchor Tag)

By the way, this is a modified version of @diegoperini's regex, found at https://mathiasbynens.be/demo/url-regex

Tags: regex, url, tags, anchor

Solution
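
A likely explanation for the "http://www.server.co" match is backtracking. When the trailing negative lookahead rejects the full URL because "</a>" follows it, the engine gives back one character and tries again. At that point the next character is "m", not "</a>", so the lookahead passes and the shortened match is accepted. The sketch below reproduces this with Python's re module and a deliberately simplified pattern. URL_CHARS, naive, guarded and first_match are hypothetical names used only for illustration, and the simplified URL part is an assumption, not the pattern from the question. One possible guard, shown here, is to make the lookahead also reject any position that is still inside the URL.

```python
import re

# Hypothetical, heavily simplified stand-in for the URL part of the pattern.
URL_CHARS = r"[\w.-]"
URL = r"https?://" + URL_CHARS + r"+"

# Same shape as the question's pattern: a URL, then "not followed by </a>".
naive = re.compile(URL + r"(?!\s*</a>)")

# Possible guard: the lookahead also fails while the next character is
# still a URL character, so giving characters back cannot satisfy it.
guarded = re.compile(URL + r"(?!" + URL_CHARS + r"|\s*</a>)")


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


linked = "http://www.server.com</a>"
plain = "see http://www.server.com for details"

print(first_match(naive, linked))    # http://www.server.co
print(first_match(guarded, linked))  # None
print(first_match(guarded, plain))   # http://www.server.com
```

In the full pattern the character class inside the lookahead would have to mirror whatever the last optional part (host, port, path, query or fragment) can consume. In flavors that support atomic groups or possessive quantifiers, wrapping the URL part so that it cannot give characters back is another option.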

