Python regex ignoring pattern

Problem description

I have a list of two keywords like below:

keywords = ["Azure", "Azure cloud"]

but Python is unable to find the second keyword, "Azure cloud":

>>> import re
>>> keywords = ["Azure", "Azure cloud"]
>>> r = re.compile('|'.join([re.escape(w) for w in keywords]), flags=re.I)
>>> word = "Azure and Azure cloud"
>>> r.findall(word)
['Azure', 'Azure']

I am expecting output like this: ['Azure', 'Azure', 'Azure cloud']

Any guidance or help would be highly appreciated!

Tags: python, regex, pattern-matching

Solution


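A single alternation pattern tries its alternatives left to right at each position, so "Azure" always matches before "Azure cloud" is attempted, and findall only returns non-overlapping matches. You can run multiple searches instead, one per keyword: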

import itertools
import re

keywords = ["Azure", "Azure cloud"]
patterns = [re.compile(re.escape(w), flags=re.I) for w in keywords]
word = "Azure and Azure cloud"
results = list(itertools.chain.from_iterable(
    r.findall(word) for r in patterns
))
print(results)

output:

['Azure', 'Azure', 'Azure cloud']
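
To see why the single combined pattern falls short, here is a small sketch comparing it with a longest-first ordering of the same keywords. The variable names (combined, reordered and so on) are only illustrative. Reordering lets "Azure cloud" match, but the "Azure" inside it is consumed by that match, so it still cannot produce the three results the question asks for.

```python
import re

keywords = ["Azure", "Azure cloud"]
word = "Azure and Azure cloud"

# One combined pattern: alternatives are tried left to right at each position,
# so "Azure" wins before "Azure cloud" is ever attempted.
combined = re.compile("|".join(re.escape(w) for w in keywords), flags=re.I)
combined_matches = combined.findall(word)

# Putting longer keywords first lets "Azure cloud" match, but the "Azure"
# inside it is then consumed and cannot be reported separately.
longest_first = sorted(keywords, key=len, reverse=True)
reordered = re.compile("|".join(re.escape(w) for w in longest_first), flags=re.I)
reordered_matches = reordered.findall(word)

print(combined_matches)   # ['Azure', 'Azure']
print(reordered_matches)  # ['Azure', 'Azure cloud']
```

This is why running one search per keyword, as above, is the simpler route when overlapping keywords must all be reported.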

Append

If I had word = "Azure and azure cloud", the output would be ['Azure', 'azure', 'azure cloud']. The second match, "azure", is lowercase. If I need the matches to have exactly the case used in the "keywords" list, which is "Azure", what modification has to be made to the code?

The re.I flag means ignore case, so simply remove it:

patterns = [re.compile(re.escape(w)) for w in keywords]
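
As a quick check of what dropping re.I does, the sketch below runs the case-sensitive patterns against the sentence from the comment. The name case_sensitive is just illustrative. Note that the lowercase "azure cloud" is no longer matched at all, which may or may not be what is wanted.

```python
import re

keywords = ["Azure", "Azure cloud"]
word = "Azure and azure cloud"

# Without re.I, each keyword only matches text with exactly the same case.
patterns = [re.compile(re.escape(w)) for w in keywords]
case_sensitive = [m for r in patterns for m in r.findall(word)]

print(case_sensitive)  # ['Azure']
```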

Append 2

Sorry, my last comment was vague. I want the pattern matching to ignore case, but in the output I want the keywords to have exactly the case they have in the "keywords" list, not the case found in "word".

Sorry for the misunderstanding. Try this:

import re

keywords = ["Azure", "azure cloud"]
patterns = [re.compile(w, flags=re.I) for w in keywords]
word = "Azure and azure cloud"
results = [
    match_obj.re.pattern
    for r in patterns
    for match_obj in r.finditer(word)
]
print(results)

output:

['Azure', 'Azure', 'azure cloud']

I'm not sure this is an efficient approach, but it works.
Note that I removed re.escape because it escapes the space, so the result was:

['Azure', 'Azure', 'azure\\ cloud']
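
Dropping re.escape can misfire if a keyword contains regex metacharacters. One possible alternative, sketched here, keeps re.escape and pairs each compiled pattern with its original keyword, so the output comes from the list and never from the escaped pattern. The "C++" keyword and the sample sentence are hypothetical additions used only to show a case where escaping matters.

```python
import re

# "C++" is a made-up extra keyword; unescaped, it would not compile as a regex.
keywords = ["Azure", "azure cloud", "C++"]
word = "Azure and azure cloud, plus c++"

# Pair each original keyword with its escaped, case-insensitive pattern.
compiled = [(w, re.compile(re.escape(w), flags=re.I)) for w in keywords]

# Emit the keyword as written in the list, once per match found in the text.
results = [kw for kw, r in compiled for _ in r.finditer(word)]

print(results)  # ['Azure', 'Azure', 'azure cloud', 'C++']
```

The matching still ignores case, while the reported strings keep the spelling from the keywords list.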
