Regex with negation for extracting web link

Problem description

I have a text fragment:

.....https://www.one.com/privacy/\............http://two.com/terms/'.............https://three.com/pricing/\..........https://four.com/widget/wg74ythx;.........http://five.com/pricing .........

My code for extracting web links: link = re.compile(r'https?://(\w.*?)(\\|;|\'|\s)')

But I need to exclude from my results all links containing the words "privacy" or "widget". I'm stuck here and would appreciate help from the community.

Tags: python, python-3.x, regex-negation

Solution


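If you don't need a compiled pattern object, you could filter the matches after extracting them, something like:

    import re

    s = mystring  # the text fragment to search
    # findall returns (link, terminator) tuples; keep the link unless it contains an excluded word
    urls = [url[0] for url in re.findall(r'https?://(\w.*?)(\\|;|\'|\s)', s)
            if not re.search('privacy|widget', url[0])]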
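If you do want a single compiled pattern, the exclusion can probably be folded into the regex itself with a negative lookahead. The following is only a sketch under some assumptions: a link is taken to end at a backslash, semicolon, single quote, or whitespace (the same terminators as in the question's pattern), and the names `LINK_RE`, `extract_links`, and `sample` are made up for illustration. Unlike the original pattern, it uses a negated character class, so it also accepts a link that runs to the end of the string.

```python
import re

# Assumption: a link ends at a backslash, semicolon, single quote, or whitespace.
# The negative lookahead rejects a link if "privacy" or "widget" occurs
# anywhere before the first terminator character.
LINK_RE = re.compile(r"https?://(?![^\\;'\s]*(?:privacy|widget))(\w[^\\;'\s]*)")


def extract_links(text):
    """Return links (without the scheme) that contain neither excluded word."""
    return LINK_RE.findall(text)


# The text fragment from the question, as a raw string so backslashes stay literal.
sample = (
    r".....https://www.one.com/privacy/\............http://two.com/terms/'"
    r".............https://three.com/pricing/\..........https://four.com/widget/wg74ythx;"
    r".........http://five.com/pricing ........."
)

urls = extract_links(sample)
print(urls)
```

On the fragment from the question this should keep the two.com, three.com, and five.com links and drop the other two. Note that "pricing" is not excluded, since the lookahead only tests for the literal words "privacy" and "widget".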

