Ruby: determining whether a line is part of a regex match

Problem description

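I have a fairly complex regular expression that matches the parts of a document that lie between ASCII separators (e.g. ==================). I need to determine whether a given line of the document is one of the lines matched by this regex. My approach so far has been to store the MatchData returned by matching the document against the regex, convert it to an array, and iterate over it looking for a match with the given line.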

    between_separators = lambda { |ln, context| 
        body = context[:body]
        rx = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/
        matchdata = body.match(rx)
        matched_lines = matchdata.to_a.map { |m| m.split("\n") }.flatten
        matched_lines.each { |ml| return 1 if ml.match(ln) }
        return 0
    }

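Although it is uncommon, in some cases I end up with a false positive, because the line I am checking is identical to a different line that really is part of the first match's result.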
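The false positive can be reproduced with a small sketch. The sample text, the constant name `RX_FROM_QUESTION`, and the helper `text_only_check` are made up for illustration; the helper is only meant to mirror the lambda above. Because the check sees nothing but the text of the line, it cannot tell two identical lines apart. In addition, `String#match` compiles a String argument into a Regexp, so a mere substring of an enclosed line also counts as a hit.

```ruby
# Regex copied from the question.
RX_FROM_QUESTION = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/

# Hypothetical helper mirroring the lambda above: it compares text only.
def text_only_check(line, body)
  matchdata = body.match(RX_FROM_QUESTION)
  return 0 if matchdata.nil?
  # Whole match plus all capture groups, split into individual lines.
  matched_lines = matchdata.to_a.flat_map { |m| m.split("\n") }
  matched_lines.any? { |ml| ml.match(line) } ? 1 : 0
end

separator = "=" * 30
sample = [
  "Regards",      # line 0: outside any block
  separator,
  "Regards",      # line 2: inside the block
  separator,
  "a plain line"  # line 4: outside any block
].join("\n") + "\n"

# Both occurrences of "Regards" look the same to a text-only check, so the
# one on line 0 is reported as enclosed too.
puts text_only_check("Regards", sample)      # prints 1
# A substring of an enclosed line is enough to produce a hit.
puts text_only_check("gard", sample)         # prints 1
puts text_only_check("a plain line", sample) # prints 0
```

Any fix therefore seems to need some notion of where the line sits in the document, not only what it says.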

Is there a smarter way to approach this?

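Edit:

Let me provide some more context.

I am given a plain-text document containing blocks of text surrounded by "separator" lines. I want to check whether a line taken from the document lies between such separators.

Here is an example of what I am dealing with: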

    This is some text that should not be matched. As you can see, it is not enclosed
    by separator lines.

    ===========================================================
    This part should be matched as it is between two separator lines. Note that the
    opening and closing separators are composed of the exact same number of the same
    character.
    ===========================================================
    This block should not be matched as it is not enclosed by its own separators,
    but rather the closing separator of the previous block and the opening
    separator of the next block.
    ===========================================================
    It is tricky to distinguish between an enclosed and non-enclosed blocks, because
    sometimes a matching pair of separators appears to be legal, while it is really
    the closing separator of the previous block and the opening separator of the
    next one (e.g. the block obove this one).
    ===========================================================
    ==================================
    =====
    This block is enclosed by multiline separators.
    ==================================
    =====
    Some more text that should not be matched by the regex.
    ***************************************



    A separator can use one of the following characters: '=' or '*' or '_'.


    ***************************************
    ***************************************
    *******************
    Another example of a multiline separated block.
    ***************************************
    *******************

    >Even more text not to be matchedby the regex. This time, preceeded by a
    >variable number of '>'.
    >>__________________________________________
    >>And another type of separator. The block is now also a part of a reply section
    >>of the email.
    >>__________________________________________

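My goal is to be able to call between_separators["This block is enclosed by multiline separators.", context] and get 1 as the result. The approach shown above succeeds in most cases, but it is not reliable, and I would like to improve it so that it no longer produces false positives.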

Tags: ruby, regex

Solution
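One possible direction, offered as a sketch and not as a definitive answer, is to identify the line by its position in the document instead of by its text. The code below reuses the regex from the question, collects the line-index range covered by capture group 3 of every non-overlapping match, and then tests whether a given 0-based line index falls inside one of those ranges. The names `SEPARATED_BLOCK`, `enclosed_line_ranges`, and `between_separators_at` are hypothetical, and passing a line index instead of the line text is an assumption about what the caller can supply.

```ruby
# Separator pattern copied from the question.
SEPARATED_BLOCK = /^(([=\*_]){23,}\2{3}(?:\2|[\r\n])+)([\s\S]+?)\1/

# Returns the 0-based line-index ranges covered by the enclosed text
# (capture group 3) of every non-overlapping match in body.
def enclosed_line_ranges(body, rx = SEPARATED_BLOCK)
  ranges = []
  pos = 0
  while (m = rx.match(body, pos))
    # The number of newlines before an offset is the index of its line.
    first = body[0...m.begin(3)].count("\n")
    last  = body[0...(m.end(3) - 1)].count("\n")
    ranges << (first..last)
    # Resume after the closing separator so it is not reused as an opener.
    pos = m.end(0)
  end
  ranges
end

# Hypothetical variant of the lambda: takes a line index, not the line text.
def between_separators_at(line_index, context)
  enclosed_line_ranges(context[:body]).any? { |r| r.cover?(line_index) } ? 1 : 0
end

separator = "=" * 30
sample = ["Regards", separator, "Regards", separator, "tail"].join("\n") + "\n"
context = { body: sample }

puts between_separators_at(0, context) # prints 0 (same text, outside the block)
puts between_separators_at(2, context) # prints 1 (inside the block)
```

If the caller has only the text of the line and not its position, identical lines remain indistinguishable, so this sketch helps only when the line index (or a character offset) is carried along with the line.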

