DataWeave 2.0: match all occurrences of a regular expression

Problem description

I want to capture all matches of a particular regular expression in a string. I am using DataWeave 2.0 (that is, Mule Runtime 4.3; in my case, Anypoint Studio 7.5).

I tried the scan() and match() functions from the DataWeave core library, but I could not get exactly the result I want.

Here are some of the things I tried:

%dw 2.0
output application/json

// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    withscan: microList scan /(#[^\s]*).*/,
    sanitized: microList replace /\n/ 
        with ' ',
    sani_match: microList replace /\n/ 
        with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
    sani_scan: microList replace /\n/ 
        with ' ' scan /.*(#[^\s]*).*/   // gives array of arrays, string and last match
}

Here are the respective results:

{
  "withscan": [
    [
      "#downtownmalls now!",
      "#downtownmalls"
    ],
    [
      "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#shoplocal"
    ]
  ],
  "sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
  "sani_match": [
    "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "sani_scan": [
    [
      "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#downtowndancehalls"
    ]
  ]
}

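In the first example (withscan), the parser seems to be processing the input line by line, so the result array contains one element per line. Each element consists of the full matched text and the captured group, which holds the first occurrence of the pattern on that line.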
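As far as I can tell, the per-line effect comes from the regex rather than from the parser: by default . does not match a newline, so the trailing .* stops at the end of each line, and scan resumes searching after that point. A minimal sketch comparing the pattern with and without the trailing .* (the field names are illustrative only):

```dataweave
%dw 2.0
output application/json

// Same two-line sample input as in the question
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // Trailing .* consumes the rest of the line (but not the newline),
    // so at most one match is found per line: 2
    withTrailingDotStar: sizeOf(microList scan /#[^\s]*.*/),
    // Without it, scan resumes right after each hashtag: 4
    withoutTrailingDotStar: sizeOf(microList scan /#[^\s]*/)
}
```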

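After removing the newline, the third example (sani_match) gives me an array containing the full match and the captured group, but this time the group holds the last occurrence of the pattern on the line.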
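The switch from first to last occurrence is caused by the leading greedy .*: it first consumes the whole string and then backtracks only as far as needed for the group to match, which lands on the last hashtag. A reluctant .*? expands from the left instead and stops at the first one. A small sketch on a made-up single-line input:

```dataweave
%dw 2.0
output application/json

// Hypothetical single-line input with three hashtags
var s = 'a #one b #two c #three'
---
{
    // Greedy .* backtracks from the end, so the group captures "#three"
    greedy: (s match /.*(#\S*).*/)[1],
    // Reluctant .*? expands from the start, so the group captures "#one"
    reluctant: (s match /.*?(#\S*).*/)[1]
}
```

Either way, match returns a single match, so neither variant collects every hashtag.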

The final pattern (sani_scan) gives a similar result; the only difference is that the result is wrapped as an element in an array of arrays.

All I want is a single array containing every occurrence of the specified pattern.

Tags: regex, dataweave, mulesoft

Solution


If you want to capture every match of a particular regex in a string, the magic words I found were "overlapping matches".
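The idea behind overlapping matches is that a lookahead (?=(...)) consumes no characters, so the capture inside it can start at every position, while the trailing . advances the search by exactly one character. A toy sketch on a hypothetical string:

```dataweave
%dw 2.0
output application/json

var s = 'aaaa'
---
{
    // Ordinary matches consume their text: ["aa", "aa"]
    nonOverlapping: s scan /aa/ map $[0],
    // Capture inside a lookahead, one character consumed per match: ["aa", "aa", "aa"]
    overlapping: s scan /(?=(aa))./ map $[1]
}
```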

If what you actually want is just the hashtags from the string, simply use Valdi_Bo's solution.

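To enable the single-line (DOTALL) flag in Java regular expressions, so that . also matches newlines, add (?s) at the start of the pattern.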
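A quick way to see the flag's effect is matches, which tests the pattern against the whole string. The two-line input below is made up for illustration:

```dataweave
%dw 2.0
output application/json

// "\n" puts a newline between the two lines
var twoLines = "ab\ncd"
---
{
    // Without the flag, . cannot cross the newline: false
    withoutFlag: twoLines matches /a.*d/,
    // With (?s), . also matches the newline: true
    withFlag: twoLines matches /(?s)a.*d/
}
```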

Script:

%dw 2.0
output application/json

var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // (?s) is the single-line (DOTALL) modifier
    // (?=(X)). enables overlapping matches
    matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
    justTags: str scan(/(?s)#([^\s]*)/) map $[1],
    Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}

Output:

{
  "matchUntilEnd": [
    "#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "justTags": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ],
  "Valdi_BoSolutionWithGroups": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ]
}
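Note that justTags and Valdi_BoSolutionWithGroups map group 1, which is why the # is missing from their results; mapping $[0] keeps it. The quantifier matters too: [^\s]* accepts a lone # followed by a space, while \S+ requires at least one tag character. A sketch on a hypothetical string:

```dataweave
%dw 2.0
output application/json

var str = 'Tags: #a # #bc'
---
{
    // * allows an empty tag, so the lone "#" is returned: ["#a", "#", "#bc"]
    star: str scan /#[^\s]*/ map $[0],
    // + requires at least one non-space character: ["#a", "#bc"]
    plus: str scan /#\S+/ map $[0]
}
```

The (?s) prefix in justTags has no effect, since that pattern contains no dot.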
