首页 > 解决方案 > 用于捕获科学引文的 RegEx

问题描述

我正在尝试捕获其中至少包含一位数字的文本括号(想想引文)。这是我现在的正则表达式,它工作正常:https ://regex101.com/r/oOHPvO/5

\((?=.*\d).+?\)

所以我希望它能够捕获(Author 2000)(2000)但不是(Author)

我正在尝试使用 python 来捕获所有这些括号,但在 python 中,即使它们没有数字,它也会捕获括号中的文本。

import re

with open('text.txt') as f:
    f = f.read()

s = "\((?=.*\d).*?\)"

citations = re.findall(s, f)

citations = list(set(citations))

for c in citations:
    print (c)

任何想法我做错了什么?

标签: pythonregexre

解决方案


可能处理此表达式的最可靠方法可能是在您的表达式可能增长时添加边界。例如,我们可以尝试创建 char 列表,我们希望在其中收集所需的数据:

(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\)).

演示

测试

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\))."

test_str = "some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author, 2000) some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author; 2000)"

matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

演示

const regex = /(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\))./mgi;
const str = `some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author, 2000) some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author; 2000)`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

正则表达式电路

jex.im可视化正则表达式:

在此处输入图像描述


推荐阅读