python - Regular expression exclude matches surrounded by quotation marks and lines starting with %
Problem description
I want to make a regular expression that is able to do the following:
- Match various words exactly, e.g. {addpaths, addpath, test}
- Exclude lines that start with a % sign
- Exclude matches that are surrounded by quotation marks (' and ")
So I came up with the following regex (with flags g, m):
^[^%]*?(?<=[^\'\"])\b(addpaths|addpath|test)\b(?=[^\'\"]).*?$?
And this gives me the following result (see regex101):
function addpaths() --> match, correct
% function addpaths to add paths to path --> no match, correct
fprintf('running addpaths') --> no match, correct
fprintf('addpaths running') --> no match, correct
fprintf('running addpaths.') --> match, wrong
fprintf('running addpaths function') --> match, wrong
% fprintf('running addpaths') --> no match, correct
% fprintf('addpaths running') --> no match, correct
% fprintf('running addpaths function') --> no match, correct
% test what happens to 'test' --> no match, correct
run('test') --> no match, correct
'this is a test.' --> match, wrong
test --> match, correct
So the regex works when one of the exact matching words is directly next to a ', but not when there is another word, whitespace or a . next to it. Why?
import re
text = '''function addpaths()
% function addpaths to add paths to path
fprintf('running addpaths')
fprintf('addpaths running')
fprintf('running addpaths function')
% fprintf('running addpaths')
% fprintf('addpaths running')
% fprintf('running addpaths function')
% test what happens to 'test'
run('test')
'this is a test.'
test
'''
pattern = '^[^%]*?(?<=[^\'\"])\\b(addpaths|addpath|test)\\b(?=[^\'\"]).*?$'
regex = re.compile(pattern, re.M)
matches = regex.findall(text)
for m in matches:
    print(m)
Solution
Try this:
import re

text = '''function addpaths()
% function addpaths to add paths to path
fprintf('running addpaths')
fprintf('addpaths running')
fprintf('running addpaths function')

% fprintf('running addpaths')
% fprintf('addpaths running')
% fprintf('running addpaths function')

% test what happens to 'test'
run('test')
'this is a test.'
test'''

# (?!\s*%)      skip lines whose first non-blank character is %
# [^'\"]*?      no quote may appear before the keyword (*? also allows a keyword at column 0)
# (?!.*?['\"])  no quote may appear anywhere after the keyword
pattern = r"""^(?!\s*%)[^'\"]*?\b(addpaths|addpath|test)\b(?!.*?['\"]).*?$"""
regex = re.compile(pattern, re.M)

for line in text.split('\n'):
    print(line.ljust(50, ' '), 'OK' if regex.match(line) else 'NO MATCH')
Output:
function addpaths()                                OK
% function addpaths to add paths to path           NO MATCH
fprintf('running addpaths')                        NO MATCH
fprintf('addpaths running')                        NO MATCH
fprintf('running addpaths function')               NO MATCH
                                                   NO MATCH
% fprintf('running addpaths')                      NO MATCH
% fprintf('addpaths running')                      NO MATCH
% fprintf('running addpaths function')             NO MATCH
                                                   NO MATCH
% test what happens to 'test'                      NO MATCH
run('test')                                        NO MATCH
'this is a test.'                                  NO MATCH
test                                               OK
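To see why 'this is a test.' is rejected here but slipped through the question's regex, the two lookaheads can be compared in isolation. This is only a small sketch: the names KEYWORDS, next_char_only and rest_of_line are made up for illustration, and the rest of each pattern is left out on purpose.

```python
import re

KEYWORDS = r"\b(addpaths|addpath|test)\b"

# Lookahead from the question: inspects only the single character after the keyword.
next_char_only = re.compile(KEYWORDS + r"(?=[^'\"])")
# Lookahead from the pattern above: fails if a quote appears anywhere later on the line.
rest_of_line = re.compile(KEYWORDS + r"(?!.*?['\"])")

samples = ["run('test')", "'this is a test.'", "test()"]
for line in samples:
    print(line.ljust(20),
          bool(next_char_only.search(line)),
          bool(rest_of_line.search(line)))
```

For 'this is a test.' the character after test is a ., so the one-character check is satisfied even though the closing quote follows right behind it.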
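I used the negative lookahead (?!.*?['\"]) because in 'this is a test.' the word test is followed by a ., not by a quote. In your regex, (addpaths|addpath|test)\b(?=[^\'\"]) only excludes a keyword that is followed directly by a quotation mark. That is why run('test') didn't match while 'this is a test.' did. The negative lookahead instead rejects the line if a quote appears anywhere after the keyword.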
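One caveat: because (?!.*?['\"]) rejects any quote after the keyword, a line such as addpaths('lib') is reported as NO MATCH even though addpaths itself is not quoted. If that matters, a possible alternative is to blank out quoted strings first and search what is left. The sketch below is not part of the pattern above. The helper find_keywords and the QUOTED pattern are hypothetical, and it assumes string literals never contain escaped or doubled quotes. The sample lines look like MATLAB, where ' can also be the transpose operator, and that case is not handled either.

```python
import re

KEYWORD = re.compile(r"\b(addpaths|addpath|test)\b")
# Assumption: a string literal is '...' or "..." on one line, with no escaped quotes inside.
QUOTED = re.compile(r"'[^'\n]*'|\"[^\"\n]*\"")

def find_keywords(line):
    """Return the keywords found outside quotes; [] for lines starting with %."""
    if line.lstrip().startswith("%"):
        return []
    # Replace every quoted string with a space so neighbouring words stay separated.
    unquoted = QUOTED.sub(" ", line)
    return KEYWORD.findall(unquoted)

for line in ["function addpaths()", "addpaths('lib')",
             "fprintf('running addpaths.')", "% test what happens to 'test'"]:
    print(line.ljust(35), find_keywords(line))
```

Splitting the job into "remove what must be ignored" and "search the rest" keeps each regex short, at the cost of a second pass over every line.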