Python regex to match all variants of a keyword unless preceded by a capitalized word

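Problem description

I am looking for a Python regular expression that matches all variants of a keyword unless it is preceded by a capitalized word. The exception is when that capitalized word starts a sentence, in which case the keyword should still match. Keywords inside square brackets should also be excluded.

For example: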

keyword = 'public record'
string1 = 'Hello. His public records are available at city hall.'  # match "public records"; "His" starts a sentence, so its capitalization is ignored
string2 = 'his records are at Newsom Public Record DataBase'       # no match
string3 = 'Public records may be available online'                 # match "Public records"
string4 = '[public records](http:/....)'                           # no match

So far, I have tried:

pattern = f'(?<!\[)(?i)\\w*{keyword}\\w*'   # Doesn't take preceding capitalized words into account
pattern = f'(?<![A-Z][\w-]\s)(?<!\[)(?i)\\w*{keyword}\\w*'   # Doesn't work for capitalized words longer than 2 characters

Tags: python, regex

Solution


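You can list the allowed beginnings explicitly: a sentence start followed by a capitalized word, a non-capitalized word, or the start of the string. Then use a lookahead to assert that the keyword follows: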

pattern = r'(\. [A-Z]\w* |\W[^A-Z]\w* |^)(?=[pP]ublic [rR]ecord)'
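As a quick check, the pattern can be run against the four strings from the question. The sketch below is only illustrative: `has_keyword` and `build_pattern` are hypothetical helper names that do not appear in the original answer, and `build_pattern` assumes the keyword is made of plain alphabetic words separated by single spaces.

```python
import re

# Pattern from the answer, hard-coded for the keyword 'public record'.
pattern = r'(\. [A-Z]\w* |\W[^A-Z]\w* |^)(?=[pP]ublic [rR]ecord)'


def build_pattern(keyword):
    """Hypothetical helper: rebuild the same pattern for any keyword.

    Assumes the keyword consists of plain alphabetic words separated by
    single spaces; the first letter of each word may be upper or lower case.
    """
    words = []
    for word in keyword.split():
        first, rest = word[0], word[1:]
        words.append(f'[{first.lower()}{first.upper()}]{re.escape(rest)}')
    return r'(\. [A-Z]\w* |\W[^A-Z]\w* |^)(?=' + ' '.join(words) + ')'


def has_keyword(text, pat=pattern):
    """Return True if the pattern finds an allowed occurrence in text."""
    return re.search(pat, text) is not None


# The four strings from the question, with the outcome the asker wants.
samples = {
    'Hello. His public records are available at city hall.': True,
    'his records are at Newsom Public Record DataBase': False,
    'Public records may be available online': True,
    '[public records](http:/....)': False,
}

for text, expected in samples.items():
    print(has_keyword(text), expected, text)
```

Because the keyword sits inside a lookahead, the matched text is the preceding context (or an empty string at the start of the input), not the keyword itself. To extract the keyword variant, one option is to move it out of the lookahead into its own capturing group.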
