Matching only unquoted words with a regular expression in Python

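Question

While processing some code, I need to find the places where any variable from a given list is used. The problem is that the code is obfuscated, and these variable names can also appear inside strings, which I don't want to match.

However, I haven't been able to find a regex that matches only unquoted words and works in Python...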

Tags: python, regex, python-3.x

Answer

"[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"

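This should match any unquoted word into the last group (group 6, which is index 5 in the 0-based tuples returned by re.findall). A small modification is needed to avoid matching the words of a string that begins at the very start of the input, because the pattern requires one character before the opening quote.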
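The start-of-input limitation can be illustrated with a minimal sketch. The variant below is only one possible modification, not necessarily the one the answer had in mind: it replaces the leading `[^\\\\]` with the non-capturing group `(?:^|[^\\\\])`, so the group numbers stay the same. The names `original`, `modified`, and `unquoted_words` are hypothetical.

```python
import re

# The pattern from the answer, with \w written as \\w.
original = "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
# Assumed modification: also accept an opening quote at the very start of the input.
modified = "(?:^|[^\\\\])((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"


def unquoted_words(pattern, text):
    """Return the non-empty group-6 captures, i.e. the words outside quotes."""
    return [m.group(6) for m in re.finditer(pattern, text) if m.group(6)]


text = '"foo" bar'
print(unquoted_words(original, text))  # 'foo' is reported although it is quoted
print(unquoted_words(modified, text))  # only 'bar' is reported
```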

Explanation:

[^\\\\] matches any character except a backslash (the escape character), so that an escaped quote does not start a string.
((\")|(')) Immediately after that non-escape character, match either " or ', which starts a string. This is group 1, which contains group 2 (\") and group 3 (').
(?(2) is a conditional: if group 2 (a double quote) matched, use the first branch, otherwise the second.
    ([^\"]|\\\")*| is meant to match anything but a double quote, or an escaped double quote. Note that the regex escape \" produced by \\\" matches a plain double quote, so this group can also run across unescaped quotes; the closing quote is then settled by backtracking from the end of the input to the last quote of that kind not preceded by a backslash. Otherwise:
    ([^']|\\')*) is meant to match anything but a single quote, or an escaped single quote, with the same caveat.
        If you wish to retrieve the string inside the quotes, you will have to add another group: (([^\"]|\\\")*) will allow you to retrieve the whole consumed string, rather than just the last matched character.
        Note that the last character of a quoted string is actually consumed by the final [^\\\\]. To retrieve it, turn it into a group: ([^\\\\]). Likewise, the character before the opening quote is consumed by the first [^\\\\], which can matter in cases such as r"Raw\text".
[^\\\\]\\1 matches any non-escape character followed by whatever group 1 matched. That is, if ((\")|(')) matched a double quote, we require a double quote to end the string; otherwise it matched a single quote, which is what we require to end the string.
|(\w+) matches any word. This alternative only matches words outside quoted strings, since quoted strings are consumed by the preceding alternative.
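To make the group numbering above concrete, here is a small sketch that compiles the pattern and inspects the groups on a short input. The sample string and the variable names are made up for illustration.

```python
import re

pattern = re.compile(
    "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
)

# Number of capturing groups in the pattern.
print(pattern.groups)

sample = ' "ab" cd'
first = pattern.search(sample)
# The quoted string is consumed together with the character before it.
print(repr(first.group(0)))
# Groups 1 and 2 hold the double quote; groups that did not take part are None.
print(first.groups())

# Continue searching after the quoted string: the unquoted word lands in group 6.
second = pattern.search(sample, first.end())
print(second.group(6))
```

Unlike `re.findall`, which fills unused groups with empty strings, a match object reports them as `None`, which makes it easier to see which alternative actually matched.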

For example:

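import re

# Unquoted words end up in the last group (group 6); quoted strings are consumed by the first alternative.
non_quoted_words = "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
quote = "This \"is an example ' \\\" of \" some 'text \\\" like wtf' \\\" is what I said."
print(quote)
# findall returns one 6-tuple per match, one slot per capturing group.
print(re.findall(non_quoted_words, quote))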

This prints:

This "is an example ' \" of " some 'text \" like wtf' \" is what I said.
[('', '', '', '', '', 'This'), ('"', '"', '', 'f', '', ''), ('', '', '', '', '', 'some'), ("'", '', "'", '', 't', ''), ('', '', '', '', '', 'is'), ('', '', '', '', '', 'what'), ('', '', '', '', '', 'I'), ('', '', '', '', '', 'said')]
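Since `re.findall` returns one tuple per match, the unquoted words still have to be picked out of the last position. A minimal sketch, reusing the pattern and sample string from the example (the list name `words` is just illustrative):

```python
import re

non_quoted_words = "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
quote = "This \"is an example ' \\\" of \" some 'text \\\" like wtf' \\\" is what I said."

# Keep only the matches where group 6 (index 5 in the tuple) is non-empty.
words = [groups[5] for groups in re.findall(non_quoted_words, quote) if groups[5]]
print(words)  # ['This', 'some', 'is', 'what', 'I', 'said']
```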
