regex - How to use a regular expression to capture multiple groups that each contain one or more arbitrary words

Problem description

I have the following string:

Notable foos in bar: Baz Buzz Plaza (A), Quox Shopping Center (B), Fizzbuzz Industrial Park (C), Fee Fi Town Hall (D), Fo Fum Fire Department Station 1 Headquarters (E). Display their locations in a map.

I need to capture the following groups with a regular expression in Ruby:

Baz Buzz Plaza
Quox Shopping Center
Fizzbuzz Industrial Park
Fee Fi Town Hall
Fo Fum Fire Department Station 1 Headquarters

I can't seem to come up with the right pattern. This is the most successful of all the patterns I've tried:

/([\w|\s]+\(A\)|\(B\)|\(C\)|\(D\)|\(E\)|\(F\)|\(G\)[,|\.])+/

The result is:

Match 1
1.  Baz Buzz Plaza (A)
Match 2
1.  (B)
Match 3
1.  (C)
Match 4
1.  (D)
Match 5
1.  (E)

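I'm confused about why the pattern matches the text I want only for the first group and returns just the parenthesized letter for the rest.

At this point I would settle for matches that include the parenthesized letter at the end of each group, since I believe I can strip those in a later step. The ideal result, though, is the list I gave above.

Edit - As requested, my capture rule is this: I need to capture every phrase after "Notable foos in bar:", excluding the single letter in parentheses, the space before it, and the punctuation after it. Each phrase can be a single word or several words, and each word in a phrase can be an arbitrary word, pronoun, or number. The phrase at the end (Display their locations in a map.) does not need to be captured.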

Source: Rubular.com

Tags: regex, ruby

Solution

I assume that each substring to be extracted is:

preceded by ': ' or '), '; and
followed by ' (' or by '.', where the period is at the end of the string.

str = "Notable foos in bar: Baz Buzz Plaza (A), Quox Shopping Center (B), " +
      "Fizzbuzz Industrial Park (C), Fee Fi Town Hall (D), " +
      "Fo Fum Fire Department Station 1 Headquarters (E). " +
      "Display their locations in a map (F), " +
      "I've added this string."

Note that I've modified the string given in the question by adding a clause at the end that I think should be extracted.

str.scan(/(?<=: |\), ).+?(?= \(|\.\z)/)
  #=> ["Baz Buzz Plaza",
  #    "Quox Shopping Center",
  #    "Fizzbuzz Industrial Park",
  #    "Fee Fi Town Hall",
  #    "Fo Fum Fire Department Station 1 Headquarters",
  #    "I've added this string"]
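The question also asks why the attempted pattern returns only the letter for every group after the first. The following is a sketch of one way to see it, and it is not part of the original answer. Alternation binds more loosely than concatenation, so only the "(A)" branch of that pattern includes the preceding words. The names text, attempt, grouped, and names are illustrative, and the grouped pattern is a hypothetical capture-group variant that assumes each label is a single letter from A to G followed by a comma or a period.

```ruby
text = "Notable foos in bar: Baz Buzz Plaza (A), Quox Shopping Center (B), " \
       "Fizzbuzz Industrial Park (C), Fee Fi Town Hall (D), " \
       "Fo Fum Fire Department Station 1 Headquarters (E). " \
       "Display their locations in a map."

# The pattern from the question. The alternatives inside the group are
#   [\w|\s]+\(A\)   or   \(B\)   or   ...   or   \(G\)[,|\.]
# so the word characters are required only in front of "(A)".
# Inside [...] the "|" is a literal pipe, not an alternation.
attempt = /([\w|\s]+\(A\)|\(B\)|\(C\)|\(D\)|\(E\)|\(F\)|\(G\)[,|\.])+/
attempt_result = text.scan(attempt).flatten
p attempt_result
#=> [" Baz Buzz Plaza (A)", "(B)", "(C)", "(D)", "(E)"]

# Hypothetical variant: capture the words in group 1 and match the
# label outside the group, so every phrase is treated the same way.
grouped = /(\w[\w\s]*?)\s\([A-G]\)[,.]/
names = text.scan(grouped).flatten
p names
#=> ["Baz Buzz Plaza", "Quox Shopping Center", "Fizzbuzz Industrial Park",
#    "Fee Fi Town Hall", "Fo Fum Fire Department Station 1 Headquarters"]
```

When the pattern contains a capture group, scan returns one array per match, which is why flatten is used here. The lookaround version needs no group because the whole match is already the phrase.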

The regular expression passed to scan can be written in free-spacing mode to make it self-documenting:

r = /
    (?<=:\ |\),\ ) # match ': ' or '), ' in a positive lookbehind
    .+?            # match one or more characters lazily 
    (?=\ \(|\.\z)  # match ' (' or '.' at the end of the string 
                   # in a positive lookahead
    /x             # free-spacing regex definition mode

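In free-spacing mode, spaces must be protected; otherwise the parser removes them before the expression is evaluated. I did that by escaping them. Alternatively, each space can be placed on its own in a character class ([ ]), or a POSIX bracket expression such as [[:space:]] can be used, though that matches any whitespace character, not only a space.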
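As a minimal sketch of the point about protecting spaces, the block below compares the escaped form, the character-class form, and an unprotected form against the string from the question. The variable names (sample, escaped, bracketed, unprotected) are illustrative and do not come from the answer.

```ruby
sample = "Notable foos in bar: Baz Buzz Plaza (A), Quox Shopping Center (B), " \
         "Fizzbuzz Industrial Park (C), Fee Fi Town Hall (D), " \
         "Fo Fum Fire Department Station 1 Headquarters (E). " \
         "Display their locations in a map."

# Spaces protected by escaping them.
escaped = /
  (?<=:\ |\),\ )   # preceded by ': ' or '), '
  .+?              # one or more characters, lazily
  (?=\ \(|\.\z)    # followed by ' (' or by '.' at the end of the string
/x

# Spaces protected by placing each one in a character class.
bracketed = /
  (?<=:[ ]|\),[ ])
  .+?
  (?=[ ]\(|\.\z)
/x

# Spaces left unprotected: in free-spacing mode they are dropped from the
# pattern, so the lookarounds no longer require them.
unprotected = /(?<=: |\), ).+?(?= \(|\.\z)/x

escaped_result     = sample.scan(escaped)
bracketed_result   = sample.scan(bracketed)
unprotected_result = sample.scan(unprotected)

p escaped_result == bracketed_result  #=> true
p escaped_result.first                #=> "Baz Buzz Plaza"
p unprotected_result.first            #=> " Baz Buzz Plaza "
```

The unprotected pattern still matches, but each result keeps the spaces around the phrase, which is easy to overlook when converting a working pattern to free-spacing mode.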
