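ruby - Ruby: Determine whether a line is part of a regular expression match
Problem description
I have a fairly complex regular expression that matches the parts of a document located between ASCII separators (e.g. ==================). I need to determine whether a given line of the document is one of the lines matched by this regex. So far, my approach has been to store the MatchData returned by matching my document against the regex, convert it to an array, and iterate over it looking for a match with the given line.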
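between_separators = lambda { |ln, context|
  body = context[:body]
  # Group 1: opening separator (one or more lines of '=', '*' or '_');
  # group 3: the enclosed text; \1 requires an identical closing separator.
  rx = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/
  matchdata = body.match(rx) # only the first match in the document
  # Split the full match and every capture group into individual lines.
  matched_lines = matchdata.to_a.map { |m| m.split("\n") }.flatten
  # String#match converts ln to a Regexp, so any line containing it is a hit.
  matched_lines.each { |ml| return 1 if ml.match(ln) }
  return 0
}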
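Although it is uncommon, in some cases I end up with a false positive, because the line I am checking is identical to some other line that actually is part of the first match.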
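A minimal reproduction of that false positive might look like the following. The sample text and the names `sep`, `body` and `result` are made up for illustration; the lambda is the same lookup as above in condensed form.

```ruby
# Same lookup as in the question, condensed; the sample text is hypothetical.
between_separators = lambda { |ln, context|
  rx = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/
  matched_lines = context[:body].match(rx).to_a.map { |m| m.split("\n") }.flatten
  matched_lines.each { |ml| return 1 if ml.match(ln) }
  return 0
}

sep = "=" * 30
body = [
  sep,
  "Regards, Alice", # enclosed by separators
  sep,
  "Regards, Alice", # same text, but outside the block
  ""
].join("\n")

# The lambda only receives the text of the line, so it cannot tell which of
# the two identical lines is meant and reports a hit either way.
result = between_separators["Regards, Alice", { body: body }]
```

Here `result` is 1 even when the line being checked is the second, non-enclosed occurrence.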
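Is there a smarter way to solve this?
Edit:
Let me provide some more background.
I am given a plain-text document containing blocks of text surrounded by "separator" lines. I want to check whether a line extracted from the document lies between such separators.
Here is an example of what I am dealing with: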
This is some text that should not be matched. As you can see, it is not enclosed
by separator lines.
===========================================================
This part should be matched as it is between two separator lines. Note that the
opening and closing separators are composed of the exact same number of the same
character.
===========================================================
This block should not be matched as it is not enclosed by its own separators,
but rather the closing separator of the previous block and the opening
separator of the next block.
===========================================================
It is tricky to distinguish between enclosed and non-enclosed blocks, because
sometimes a matching pair of separators appears to be legal, while it is really
the closing separator of the previous block and the opening separator of the
next one (e.g. the block above this one).
===========================================================
==================================
=====
This block is enclosed by multiline separators.
==================================
=====
Some more text that should not be matched by the regex.
***************************************
A separator can use one of the following characters: '=' or '*' or '_'.
***************************************
***************************************
*******************
Another example of a multiline separated block.
***************************************
*******************
>Even more text not to be matched by the regex. This time, preceded by a
>variable number of '>'.
>>__________________________________________
>>And another type of separator. The block is now also a part of a reply section
>>of the email.
>>__________________________________________
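My goal is to be able to call between_separators["This block is enclosed by multiline separators.", context] and get 1 as the result. While the approach I have shown succeeds in most cases, it is not reliable, and I would like to improve it so that it no longer gives false positives.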
Solution
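One possible way to avoid the false positives, offered as a sketch rather than a definitive answer, is to identify the line by its position in the document instead of by its text. The idea is to collect the character ranges covered by capture group 3 of every match (not just the first one) and then test whether the start offset of the line falls inside one of them. This assumes the caller knows the zero-based index of the line within the body, which the question does not state. The names `SEPARATOR_BLOCK_RX`, `enclosed_ranges`, `line_start_offsets` and `between_separators_at?` are hypothetical; the regex is the one from the question, unchanged.

```ruby
# Regex taken from the question; group 3 is the text between the separators.
SEPARATOR_BLOCK_RX = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/

# Character ranges of every enclosed block in body.
def enclosed_ranges(body)
  ranges = []
  pos = 0
  # Resume each search after the previous closing separator, so a closing
  # separator is never reused as the opening separator of the next block.
  while (m = SEPARATOR_BLOCK_RX.match(body, pos))
    ranges << (m.begin(3)...m.end(3))
    pos = m.end(0)
  end
  ranges
end

# Character offset at which each line of body starts.
def line_start_offsets(body)
  offset = 0
  body.each_line.map do |line|
    start = offset
    offset += line.length
    start
  end
end

# True if the line at the given zero-based index lies inside an enclosed block.
def between_separators_at?(body, line_index)
  return false if line_index.negative?
  start = line_start_offsets(body)[line_index]
  return false if start.nil?
  enclosed_ranges(body).any? { |range| range.cover?(start) }
end

# Usage with made-up sample text: the same line appears inside and outside.
separator = "=" * 30
sample = ["intro line", separator, "inside block", separator, "inside block", ""].join("\n")

inside  = between_separators_at?(sample, 2) # line enclosed by separators
outside = between_separators_at?(sample, 4) # identical text after the block
```

Because the check is based on offsets, two lines with identical text are no longer confused. If only the text of the line is available, the ambiguity cannot be resolved in general, so the position has to come from wherever the line was extracted.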