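python - Including newlines and any other character in a regular expression

Problem description

I am currently using Jupyter Notebook and regular expressions in Python to build a dictionary of words and definitions from a dictionary file in .txt format.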

Sample data from the text file: ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n

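The pattern I am trying to create captures the word, which is written entirely in capital letters, and then skips the text up to the definition.

Desired output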

{'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes.'}
{'word': 'ABACINATION', 'definition': 'The act of abacinating.'}

The pattern I have tried so far is

pattern="""
(?P<word>[A-Z*]{3,}) #retrieve capital letter word
(\n.*\n\n\Defn:) #ignore all text up until Defn:
(?P<definition>\w*) #retrieve any worded character after Defn:
(.\ ) #end at the full stop and space
"""
for item in re.finditer(pattern,all_words,re.VERBOSE):
    print(item.groupdict())

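I am struggling with the newlines here. I am trying to isolate the capitalised word, then start right at the following newline and ignore every character up to the two newlines before 'Defn:', and then retrieve the definition, which ends at a full stop.

Is there a way to handle newlines like this?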

Tags: python, python-3.x, regex, jupyter-notebook

You mostly have it; you were only missing a non-greedy match and a wider set of characters for the definition.
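As a small aside on the non-greedy point, here is a minimal sketch of how `.`, a greedy `[\s\S]*` and a lazy `[\s\S]*?` behave around newlines. The string `sample` is a hypothetical, shortened stand-in for the dictionary text, not data from the question.

```python
import re

# Hypothetical two-entry sample, shortened from the dictionary text in the question.
sample = "ABC\nfiller line\n\nDefn: First. [R.]\n\nXYZ\nmore filler\n\nDefn: Second. [R.]\n\n"

# "." does not match a newline by default, so ".*" cannot get from the headword to "Defn:".
print(re.search(r"ABC.*Defn:", sample))  # None

# [\s\S] matches any character, newlines included. A greedy "*" runs to the LAST "Defn:".
greedy = re.search(r"ABC[\s\S]*Defn:", sample)
# A lazy "*?" stops at the FIRST "Defn:", which keeps each entry separate.
lazy = re.search(r"ABC[\s\S]*?Defn:", sample)

print(repr(sample[greedy.end():greedy.end() + 8]))  # ' Second.'
print(repr(sample[lazy.end():lazy.end() + 7]))      # ' First.'
```

Passing `re.DOTALL` so that `.` also matches newlines would be an alternative to `[\s\S]`; the lazy quantifier is needed either way.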

import re
all_words = """ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n"""

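pattern = r"""
(?P<word>[A-Z*]{3,})          # headword: three or more capitals (or *)
([\s\S]*?Defn:)               # lazily skip any characters, newlines included, up to Defn:
(?P<definition>[a-zA-Z -]*)   # definition text: letters, spaces and hyphens
"""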
for item in re.finditer(pattern,all_words,re.VERBOSE):
    print(item.groupdict())

{'word': 'ABACINATE', 'definition': ' To blind by a red-hot metal plate held before the eyes'}
{'word': 'ABACINATION', 'definition': ' The act of abacinating'}
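The pattern above keeps the space after `Defn:` and stops before the full stop, so it does not quite reproduce the desired output from the question. One possible refinement is sketched below. It assumes that every headword sits alone on its own line in capitals and that a definition ends at its first full stop (so a definition containing an abbreviation such as "e.g." would be cut short). The anchors, `re.MULTILINE` and the `entries` list are additions for illustration, not part of the original answer.

```python
import re

all_words = (
    'ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\n'
    "bacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\n"
    'ABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n'
)

pattern = re.compile(
    r"""
    ^(?P<word>[A-Z]{3,})$         # headword alone on its line (needs re.MULTILINE)
    [\s\S]*?                      # lazily skip the pronunciation and etymology lines
    ^Defn:\s*                     # Defn: at the start of a line, plus the space after it
    (?P<definition>[^.]*\.)       # everything up to and including the first full stop
    """,
    re.VERBOSE | re.MULTILINE,
)

entries = [m.groupdict() for m in pattern.finditer(all_words)]
for entry in entries:
    print(entry)
# {'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes.'}
# {'word': 'ABACINATION', 'definition': 'The act of abacinating.'}
```

Anchoring the headword with `^` and `$` also prevents short runs of capitals inside the etymology text from being mistaken for a new entry.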
