re.findall -> Regular expressions in Python

Question

import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase) 
print(x)

Result:

[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

I want something like this:

[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

How can I achieve this?

Tags: python, regex, python-regex

Solution


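You need to

  1. Remove the leading / from the pattern. It breaks the match of https:// or http://, because the only / directly before www.gamivo.com in the text is the second slash of the scheme, so the scheme is never part of the match.
  2. Remove the unnecessary capturing groups and the {1} quantifier.
  3. Convert the optional capturing groups into non-capturing groups.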

See this Python demo:

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
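The reason step 3 matters is that re.findall returns the captured groups, not the whole match, whenever the pattern contains capturing groups. The following is a minimal sketch of that difference. The sample string and the variable names are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical sample text: one URL with "www.", one without.
text = "see https://www.gamivo.com/product/a and http://gamivo.com/product/b"

# With capturing groups, findall returns one tuple of group values per match;
# an optional group that did not participate shows up as an empty string.
with_groups = re.findall(r"(https?://)?(www\.)?(gamivo\.com\S*)", text)
print(with_groups)

# With non-capturing groups (?:...), findall returns each whole match as a string.
without_groups = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)
print(without_groups)
```

The first call yields tuples such as ('https://', 'www.', 'gamivo.com/product/a'), while the second yields plain URL strings, which is the shape the question asks for.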

See also the regex demo. Also, see the related post "re.findall behaves weird".
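If the capturing groups are still needed for other purposes, one possible alternative (not part of the answer above) is to keep them and use re.finditer, reading the whole match from group(0). This is only a sketch: it reuses the string from the question and assumes the question's pattern with the leading / and the {1} quantifier removed.

```python
import re

frase = (
    "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one "
    "other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
)

# Assumed pattern: the question's regex without the leading "/" and "{1}",
# with the capturing groups deliberately kept.
pattern = re.compile(r"(https?://)?(www\.)?(gamivo\.com\S*)")

# group(0) is always the entire match, regardless of how many groups exist.
urls = [m.group(0) for m in pattern.finditer(frase)]
print(urls)

# The individual groups stay available, for example the scheme in group 1.
schemes = [m.group(1) for m in pattern.finditer(frase)]
print(schemes)
```

Here urls holds the two complete URLs, and schemes holds the scheme captured for each of them.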

