Find a word in the third section of a file

Problem description

A few years ago I asked a question here about finding a specific word within one section of a file: RegEx - find a word inside a specific section of a file

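Now I want to take that question a step further: I am trying to get RegEx to produce a match for me. I want to look at the forecast for the Washington County zone and see whether the word severe appears in the third forecast period. I use this in Weather Message, where all I need to do is supply a RegEx expression. The software then evaluates incoming weather bulletins against that expression; if there is a match it processes the bulletin, and if there is no match it discards the bulletin and moves on.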

So, I have the following bulletin (trimmed a bit from the original):

FPUS55 KBOU 031108
ZFPBOU

Zone Forecast Product for Northeast Colorado
National Weather Service Denver/Boulder CO
508 AM MDT Mon Aug 3 2020

COZ048-040300-
Logan County-
including Crook, Merino, Sterling, and Peetz
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Isolated thunderstorms late in the
afternoon. Highs in the lower to mid 80s. Southeast winds 10 to
20 mph. Chance of thunderstorms 10 percent.
.TONIGHT...Partly cloudy in the evening then becoming mostly
cloudy. Isolated thunderstorms. Lows in the upper 50s. South
winds 10 to 20 mph. Chance of thunderstorms 20 percent.
.TUESDAY...Partly cloudy in the morning, then mostly cloudy with
a 40 percent chance of thunderstorms in the afternoon. Some
thunderstorms may be severe. Highs in the mid 80s to lower 90s.
Southeast winds 10 to 15 mph.
.TUESDAY NIGHT...Partly cloudy with a 40 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the mid
50s.
.WEDNESDAY...Mostly sunny with a 30 percent chance of
thunderstorms. Highs in the mid 80s to lower 90s.
.WEDNESDAY NIGHT...Partly cloudy with a 30 percent chance of
thunderstorms. Lows in the upper 50s.
.THURSDAY...Mostly sunny with a 30 percent chance of
thunderstorms. Highs near 90.
.THURSDAY NIGHT...Partly cloudy with a 20 percent chance of
thunderstorms. Lows in the upper 50s.
.FRIDAY...Mostly sunny with a 10 percent chance of thunderstorms.
Highs in the lower to mid 90s.
.FRIDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows around 60.
.SATURDAY...Mostly sunny with a 10 percent chance of
thunderstorms. Highs in the lower 90s.
.SATURDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows around 60.
.SUNDAY...Mostly sunny with a 10 percent chance of thunderstorms.
Highs in the lower to mid 90s.

$$

COZ049-040300-
Washington County-
including Akron, Cope, Last Chance, and Otis
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Slight chance of thunderstorms early in
the morning. Isolated thunderstorms late in the afternoon. Highs
82 to 88. Southeast winds 10 to 20 mph. Chance of thunderstorms
20 percent.
.TONIGHT...Partly cloudy. Isolated thunderstorms in the evening.
Lows in the upper 50s. Southeast winds 10 to 20 mph. Chance of
thunderstorms 20 percent.
.TUESDAY...Partly cloudy. A 40 percent chance of thunderstorms in
the afternoon. Some thunderstorms may be severe. Highs near 90.
Southeast winds 10 to 15 mph with gusts to around 25 mph.
.TUESDAY NIGHT...Partly cloudy with a 50 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the mid
50s.
.WEDNESDAY...Mostly sunny with a 30 percent chance of
thunderstorms. Highs near 90.
.WEDNESDAY NIGHT AND THURSDAY...Partly cloudy with a 30 percent
chance of thunderstorms. Lows in the upper 50s. Highs near 90.
.THURSDAY NIGHT...Partly cloudy with a 20 percent chance of
thunderstorms. Lows around 60.
.FRIDAY...Mostly sunny with a 10 percent chance of thunderstorms.
Highs in the lower to mid 90s.
.FRIDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows around 60.
.SATURDAY...Mostly sunny with a 10 percent chance of
thunderstorms. Highs in the mid 90s.
.SATURDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows around 60.
.SUNDAY...Mostly sunny. Highs in the lower to mid 90s.

$$

COZ046-040300-
North and Northeast Elbert County Below 6000 Feet/North Lincoln
County-
including Agate, Hugo, Limon, and Matheson
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Scattered thunderstorms late in the
afternoon. Highs in the lower to mid 80s. South winds 10 to
15 mph. Chance of thunderstorms 30 percent.
.TONIGHT...Mostly cloudy with scattered thunderstorms in the
evening, then partly cloudy after midnight. Lows in the 50s.
Southeast winds 10 to 15 mph. Chance of thunderstorms 30 percent.
.TUESDAY...Partly cloudy. A 40 percent chance of thunderstorms in
the afternoon. Some thunderstorms may be severe. Highs near 90.
Southeast winds 10 to 15 mph with gusts to around 25 mph.
.TUESDAY NIGHT...Partly cloudy with a 40 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the 50s.
.WEDNESDAY...Mostly sunny with a 30 percent chance of
thunderstorms. Highs in the upper 80s. Southeast winds 10 to
15 mph.
.WEDNESDAY NIGHT AND THURSDAY...Partly cloudy with a 30 percent
chance of thunderstorms. Lows in the mid to upper 50s. Highs in
the upper 80s.
.THURSDAY NIGHT...Partly cloudy with a 20 percent chance of
thunderstorms. Lows in the mid 50s to lower 60s.
.FRIDAY...Mostly sunny with a 10 percent chance of thunderstorms.
Highs in the lower 90s.
.FRIDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows in the upper 50s.
.SATURDAY...Mostly sunny with a 10 percent chance of
thunderstorms. Highs in the lower 90s.
.SATURDAY NIGHT...Partly cloudy with a 10 percent chance of
thunderstorms. Lows in the upper 50s.
.SUNDAY...Mostly sunny. Highs in the lower 90s.

$$

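In this particular example, Washington County is in a zone of its own, and it is the second zone in this bulletin. I have an existing rule in Weather Message that looks for the word severe in any forecast period labeled "REST OF TODAY":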

Washington County((?!\n\$\$)[\s\S])+\n\.REST OF TODAY((?!\n\.)[\s\S])+severe

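Note that this expression does not return a match against the bulletin above. I have similar rules, in the same format, that look within the forecast periods labeled "TODAY" and "TONIGHT". This works well for me, but I would like to start looking ahead to the next day's forecast (probably the third or fourth forecast period). Unfortunately, those periods are labeled by day of the week or, if tomorrow happens to be a holiday, by the name of the holiday. To avoid having to create fifteen or twenty rules to catch every day and every holiday, I would like to design a RegEx expression that looks only in (for this example) the third forecast period.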
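As a rough illustration of how the existing rule behaves, the sketch below runs it through Python's re module. That is an assumption made for testing only; Weather Message may use a different regex engine. EXCERPT is a shortened copy of the Washington County section above, and HYPOTHETICAL is an invented variant (not a real bulletin) whose first period is relabeled "REST OF TODAY" and mentions severe.

```python
import re

# The existing rule, copied verbatim from the question.
REST_OF_TODAY_RULE = re.compile(
    r"Washington County((?!\n\$\$)[\s\S])+\n\.REST OF TODAY((?!\n\.)[\s\S])+severe"
)

# Shortened excerpt of the bulletin above (first two periods only).
EXCERPT = """COZ049-040300-
Washington County-
including Akron, Cope, Last Chance, and Otis
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Slight chance of thunderstorms early in
the morning. Isolated thunderstorms late in the afternoon. Highs
82 to 88. Southeast winds 10 to 20 mph. Chance of thunderstorms
20 percent.
.TONIGHT...Partly cloudy. Isolated thunderstorms in the evening.
Lows in the upper 50s. Southeast winds 10 to 20 mph. Chance of
thunderstorms 20 percent.

$$
"""

# Invented variant, for illustration only: relabel the first period and
# make it mention "severe" so the rule has something to find.
HYPOTHETICAL = EXCERPT.replace(
    ".TODAY...Mostly sunny.",
    ".REST OF TODAY...Some thunderstorms may be severe.",
)

no_match = REST_OF_TODAY_RULE.search(EXCERPT)    # no "REST OF TODAY" period
match = REST_OF_TODAY_RULE.search(HYPOTHETICAL)  # period label and word present

print(no_match is None, match is not None)
```

The two tempered runs, ((?!\n\$\$)[\s\S])+ and ((?!\n\.)[\s\S])+, are what keep the search inside one zone and one forecast period respectively.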

The middle part of this RegEx has proven difficult for me. I have tried things like

Washington County((?!\n\$\$)[\s\S])+(\n\.){3}((?!\n\.)[\s\S])+severe

Washington County((?!\n\$\$)[\s\S])+(\.\.\.){3}((?!\n\.)[\s\S])+severe

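keying off either the newline followed by the period that marks the start of each forecast period, or the ellipsis that separates each forecast period label from its forecast, but neither of these produces a match.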
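One likely reason both attempts fail is that a quantifier repeats the group itself back to back: (\n\.){3} needs the literal text "\n.\n.\n." and (\.\.\.){3} needs nine periods in a row. The sketch below checks this with Python's re module (an assumption; Weather Message may use a different engine). The spaced_starts pattern is a hypothetical rewrite, not part of the question, that moves the text between period starts inside the repeated group.

```python
import re

# Three period starts separated by forecast text, as in the bulletin.
periods = (
    "-\n.TODAY...Mostly sunny."
    "\n.TONIGHT...Partly cloudy."
    "\n.TUESDAY...Partly cloudy."
)

# The two middle pieces from the attempts above, tested on their own.
consecutive_starts = re.compile(r"(\n\.){3}")
consecutive_ellipses = re.compile(r"(\.\.\.){3}")

print(bool(consecutive_starts.search(periods)))      # False
print(bool(consecutive_starts.search("\n.\n.\n.")))  # True
print(bool(consecutive_ellipses.search(periods)))    # False
print(bool(consecutive_ellipses.search("." * 9)))    # True

# Hypothetical rewrite: each repetition is one period start plus the rest
# of that line plus any following lines that do not start with a period.
spaced_starts = re.compile(r"(?:\n\.[^\n]*(?:\n(?!\.)[^\n]*)*){3}")
print(bool(spaced_starts.search(periods)))           # True
```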

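RegEx is so flexible that I figure there must be a way to do this, but so far I have not been able to work it out. Any help the community can offer would be greatly appreciated.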

Tags: regex

Solution


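If you want to get the third forecast period, which I assume is TUESDAY, you can start the match at Washington County. The start of a forecast period can be matched with the pattern \.[A-Z]+(?: [A-Z]+)?\.{3}: a period, uppercase letters, optionally a space and a second uppercase word, then three periods.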
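To see which labels the forecast-period pattern \.[A-Z]+(?: [A-Z]+)?\.{3} accepts, here is a small sketch in Python's re module (an assumption; the target engine may differ). LINES is a handful of lines lifted from the bulletin above. HEADING_ANY_WORDS is a hypothetical variant, not from the answer, that allows any number of extra uppercase words, since the original pattern stops at two.

```python
import re

# The answer's forecast-period pattern, anchored to the start of a line.
HEADING = re.compile(r"^\.[A-Z]+(?: [A-Z]+)?\.{3}", re.MULTILINE)

# Hypothetical variant (not from the answer): any number of extra words.
HEADING_ANY_WORDS = re.compile(r"^\.[A-Z]+(?: [A-Z]+)*\.{3}", re.MULTILINE)

# A few lines taken from the bulletin above.
LINES = """.TODAY...Mostly sunny.
.TUESDAY NIGHT...Partly cloudy with a 50 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the mid
50s.
.WEDNESDAY NIGHT AND THURSDAY...Partly cloudy with a 30 percent
chance of thunderstorms. Lows in the upper 50s. Highs near 90.
"""

two_word_labels = HEADING.findall(LINES)
all_labels = HEADING_ANY_WORDS.findall(LINES)

print(two_word_labels)  # ['.TODAY...', '.TUESDAY NIGHT...']
print(all_labels)
```

A label such as ".WEDNESDAY NIGHT AND THURSDAY..." or ".REST OF TODAY..." has more than two words, so the two-word pattern treats that line as ordinary forecast text. That does not affect the third period in this bulletin, but it may matter if such a label appears among the first three periods.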

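Then you can match all lines that do not start with the forecast pattern or $$ by using a negative lookahead (?!, and repeat a whole forecast period 2 times with {2} to arrive at the third section.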
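The skipping step can be sketched by assembling the pattern from named pieces. This uses Python's re module with re.MULTILINE so that ^ matches at line starts (both are assumptions about the target engine), and the names HEAD, CONT, PERIOD and THIRD_HEADING are made up for this illustration. The sketch stops at the third label and captures it instead of going on to look for severe.

```python
import re

HEAD = r"\.[A-Z]+(?: [A-Z]+)?\.{3}"        # start of a forecast period
CONT = rf"(?:\r?\n(?!{HEAD}|\$\$).*)*"     # lines starting neither a period nor $$
PERIOD = rf"\r?\n{HEAD}.*{CONT}"           # one whole forecast period

# Zone name, its header lines, two whole periods, then the third label.
THIRD_HEADING = re.compile(
    rf"^Washington County.*{CONT}(?:{PERIOD}){{2}}\r?\n({HEAD})",
    re.MULTILINE,
)

# Shortened copy of the Washington County section above.
SECTION = """COZ049-040300-
Washington County-
including Akron, Cope, Last Chance, and Otis
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Slight chance of thunderstorms early in
the morning. Isolated thunderstorms late in the afternoon. Highs
82 to 88. Southeast winds 10 to 20 mph. Chance of thunderstorms
20 percent.
.TONIGHT...Partly cloudy. Isolated thunderstorms in the evening.
Lows in the upper 50s. Southeast winds 10 to 20 mph. Chance of
thunderstorms 20 percent.
.TUESDAY...Partly cloudy. A 40 percent chance of thunderstorms in
the afternoon. Some thunderstorms may be severe. Highs near 90.
Southeast winds 10 to 15 mph with gusts to around 25 mph.
.TUESDAY NIGHT...Partly cloudy with a 50 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the mid
50s.

$$
"""

third = THIRD_HEADING.search(SECTION)
print(third.group(1))  # .TUESDAY...
```

Changing {{2}} to {{3}} in the f-string would, under the same assumptions, land on the fourth period instead.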

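Then, in the third section, you want to find severe, which can be on the same first line that starts the forecast pattern or on any of the following lines. You can optionally match the lines before it, and then match severe on the line that contains it.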

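This is a bit lengthy because, to prevent matching the word in the next forecast period or the next section, you have to check that these lines do not start with the values you do not want.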

^Washington County.*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*(?:\r?\n\.[A-Z]+(?: [A-Z]+)?\.\.\..*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*){2}\r?\n\.[A-Z]+(?: [A-Z]+)?\.{3}(?:(?!.*\bsevere\b).*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$))?.*\b(severe)\b
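The full expression can be tried out with a sketch like the one below. It assumes Python's re module and the re.MULTILINE flag, since the leading ^ has to match at the start of a line; the engine behind Weather Message may need an equivalent option. BULLETIN is a shortened copy of the Washington County section, and QUIET_TUESDAY is an invented variant in which severe appears only in the fourth period.

```python
import re

# The expression above, copied verbatim and split only for readability.
RULE = re.compile(
    r"^Washington County.*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*"
    r"(?:\r?\n\.[A-Z]+(?: [A-Z]+)?\.\.\..*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*){2}"
    r"\r?\n\.[A-Z]+(?: [A-Z]+)?\.{3}"
    r"(?:(?!.*\bsevere\b).*(?:\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$).*)*\r?\n(?!\.[A-Z]+(?: [A-Z]+)?\.{3}|\$\$))?"
    r".*\b(severe)\b",
    re.MULTILINE,
)

BULLETIN = """COZ049-040300-
Washington County-
including Akron, Cope, Last Chance, and Otis
508 AM MDT Mon Aug 3 2020

.TODAY...Mostly sunny. Slight chance of thunderstorms early in
the morning. Isolated thunderstorms late in the afternoon. Highs
82 to 88. Southeast winds 10 to 20 mph. Chance of thunderstorms
20 percent.
.TONIGHT...Partly cloudy. Isolated thunderstorms in the evening.
Lows in the upper 50s. Southeast winds 10 to 20 mph. Chance of
thunderstorms 20 percent.
.TUESDAY...Partly cloudy. A 40 percent chance of thunderstorms in
the afternoon. Some thunderstorms may be severe. Highs near 90.
Southeast winds 10 to 15 mph with gusts to around 25 mph.
.TUESDAY NIGHT...Partly cloudy with a 50 percent chance of
thunderstorms. Some thunderstorms may be severe. Lows in the mid
50s.

$$
"""

# Invented variant: drop "severe" from TUESDAY, keep it in TUESDAY NIGHT.
QUIET_TUESDAY = BULLETIN.replace(
    "the afternoon. Some thunderstorms may be severe. Highs near 90.",
    "the afternoon. Highs near 90.",
)

hit = RULE.search(BULLETIN)        # severe is in the third period
miss = RULE.search(QUIET_TUESDAY)  # severe is only in the fourth period

print(hit is not None, miss is None)
```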

Regex demo

