Extract only URLs containing ".php.com" with a regular expression

Problem description

I am trying to extract only URLs that contain ".php.com" (e.g. www.sample.php.com) with a regular expression, but the code below cannot identify and extract those specific URLs. Please advise if you have an idea. Thanks in advance.

The following regular expression extracts any http or https URL, but it cannot single out the URLs that contain php.com (e.g. www.sample.php.com). How should I modify it so that it extracts only URLs containing php.com?

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Only extract URLs containing php.com (e.g. www.sample.php.com), not www.sample.com or any other URL.

Tags: python, regex, python-3.x, python-2.7

Solution


You could add a positive lookahead right after the scheme which asserts that php.com appears somewhere in the text that follows:

http[s]?://(?=.*\bphp\.com\b)(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
            ^^^ change is here
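
As a rough sketch of how this behaves in Python, the snippet below runs the pattern with re.findall. One thing to be aware of: the lookahead uses .*, so it scans to the end of the line rather than to the end of the current URL, and a URL without php.com can still match if a later URL on the same line contains it. The second pattern (STRICT_PATTERN, with \S* in the lookahead) is my own assumed variation, not part of the answer above, and the sample text is made up for illustration.

```python
import re

# Character set for the URL body, copied unchanged from the question.
URL_CHARS = r"(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

# Pattern from the answer: the lookahead scans the rest of the line.
ANSWER_PATTERN = re.compile(r"http[s]?://(?=.*\bphp\.com\b)" + URL_CHARS)

# Assumed variation (not from the answer): \S* keeps the lookahead
# inside the current run of non-whitespace characters.
STRICT_PATTERN = re.compile(r"http[s]?://(?=\S*\bphp\.com\b)" + URL_CHARS)

# Made-up sample line with the non-matching URL first.
text = "https://www.sample.com/page and http://www.sample.php.com/index"

print(ANSWER_PATTERN.findall(text))
# ['https://www.sample.com/page', 'http://www.sample.php.com/index']

print(STRICT_PATTERN.findall(text))
# ['http://www.sample.php.com/index']
```

If every URL sits on its own line, both patterns should give the same result. Note also that either pattern accepts php.com anywhere in the URL, including the path, not only in the host name.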


