regex - Skipping a regex pattern that is contained in the pattern I'm looking for
Question
I am parsing Pandoc-markdown files containing footnotes that start with ^[ and end with ], some of which contain embedded []. For example:
...
to explain how the feature came to be as it is, so you can use generics more
effectively.^[Angelika Langer's [Java Generics FAQ](
www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other
writings (together with Klaus Kreft) were invaluable during the preparation of
this chapter.]
...
The naive approach (in Python):
re.compile(r"\^\[.+?\]", flags=re.DOTALL)
stops at the first ], so it does not capture the whole footnote. Is there a way to skip over the nested [] clauses?
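To make the failure concrete, here is a minimal sketch using only the standard re module. The sample string is an abbreviated version of the footnote above (the shortened wording is an assumption made for illustration); it shows the lazy quantifier ending the match at the inner ] instead of the one that closes the footnote.

```python
import re

# Abbreviated version of the footnote from the question (wording shortened for illustration).
sample = (
    "effectively.^[Angelika Langer's [Java Generics FAQ]("
    "www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) was invaluable.]"
)

# The lazy .+? stops at the first ] it can reach, which is the one closing the link text.
naive = re.compile(r"\^\[.+?\]", flags=re.DOTALL)
match = naive.search(sample)

print(match.group(0))  # ^[Angelika Langer's [Java Generics FAQ]
```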
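Answer
You can do this with the PyPI regex module, which supports subroutine calls. You just need to be careful about where you place the group boundaries: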
import regex  # third-party module from PyPI (pip install regex), not the stdlib re
text = r"""...
to explain how the feature came to be as it is, so you can use generics more
effectively.^[Angelika Langer's [Java Generics FAQ](
www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other
writings (together with Klaus Kreft) were invaluable during the preparation of
this chapter.]
..."""
# (?1) recurses into group 1, so each nested [...] pair is consumed as a unit
print([x.group(1) for x in regex.finditer(r'\^(\[(?:[^][]++|(?1))*])', text)])
Output:
["[Angelika Langer's [Java Generics FAQ](\nwww.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other\nwritings (together with Klaus Kreft) were invaluable during the preparation of\nthis chapter.]"]
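Pattern details:
- \^ - a ^ character
- (\[(?:[^][]++|(?1))*]) - Group 1:
  - \[ - a [ character
  - (?:[^][]++|(?1))* - zero or more occurrences of:
    - [^][]++ - one or more characters other than ] and [ (possessive, so the engine does not backtrack into the run)
    - | - or
    - (?1) - the Group 1 pattern, applied recursively
  - ] - a ] character.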
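If installing the third-party regex module is not an option, the same balanced-bracket idea can be approximated with a plain depth counter. This is a swapped-in technique, not the subroutine approach above: a rough sketch in which find_footnotes is a hypothetical helper name, and which assumes brackets are never escaped (a \] inside a footnote would be miscounted).

```python
def find_footnotes(text):
    """Return each ^[...] footnote, outer brackets included, allowing nested [] pairs."""
    footnotes = []
    pos = 0
    while True:
        start = text.find("^[", pos)  # index of the caret that opens a footnote
        if start == -1:
            break
        depth = 0
        end = None
        # Scan from the opening [ and track how many brackets are still open.
        for i in range(start + 1, len(text)):
            if text[i] == "[":
                depth += 1
            elif text[i] == "]":
                depth -= 1
                if depth == 0:  # this ] closes the footnote itself
                    end = i
                    break
        if end is None:
            pos = start + 2  # unbalanced footnote: skip it and keep searching
            continue
        footnotes.append(text[start + 1 : end + 1])  # drop the caret, keep the brackets
        pos = end + 1
    return footnotes


print(find_footnotes("a.^[See [FAQ](x) now.] b ^[plain] c"))
# ['[See [FAQ](x) now.]', '[plain]']
```

Like group(1) in the regex version, each result keeps the outer brackets and omits the leading ^.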