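regex - DataWeave 2.0: matching all occurrences of a regular expression
Question
I want to capture every occurrence in a string that matches a particular regular expression. I am using DataWeave 2.0 (that is, Mule Runtime 4.3 and, in my case, Anypoint Studio 7.5).
I tried scan() and match() from the DataWeave core library, but could not get quite the result I wanted.
Here are some of the things I tried: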
%dw 2.0
output application/json
// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    withscan: microList scan /(#[^\s]*).*/,
    sanitized: microList replace /\n/ with ' ',
    sani_match: microList replace /\n/ with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
    sani_scan: microList replace /\n/ with ' ' scan /.*(#[^\s]*).*/ // gives array of arrays, string and last match
}
Here are the respective results:
{
  "withscan": [
    [
      "#downtownmalls now!",
      "#downtownmalls"
    ],
    [
      "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#shoplocal"
    ]
  ],
  "sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
  "sani_match": [
    "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "sani_scan": [
    [
      "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#downtowndancehalls"
    ]
  ]
}
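In the first example, the parser appears to process the input line by line, so the result array has one element per line. Each element consists of the full match and the captured group for the first occurrence of the pattern on that line.
After removing the newline, the third example (sani_match) gives me an array containing the full match and the captured group, this time for the last occurrence of the pattern on the line.
The final pattern (sani_scan) gives a similar result; the only difference is that the result is nested as an element inside an array of arrays.
All I want is a single array containing every occurrence of the specified pattern.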
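The behaviour described above can be reproduced outside Mule. The sketch below assumes that DataWeave regex literals follow java.util.regex semantics; the `ScanDemo` class and its `scan` helper are hypothetical names that only imitate the shape of scan()'s result (one inner list per match, holding the full match followed by each capture group). They are not part of any DataWeave API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a DataWeave API): imitates the shape of scan()'s result,
// one inner list per match holding the full match followed by each capture group.
final class ScanDemo {
    static final String TEXT =
        "Someone is giving away millions. See @realmcsrooge at #downtownmalls now!\n"
        + "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls";

    static List<List<String>> scan(String input, String regex) {
        List<List<String>> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int g = 0; g <= m.groupCount(); g++) {
                groups.add(m.group(g));
            }
            results.add(groups);
        }
        return results;
    }

    public static void main(String[] args) {
        // Without (?s), '.' does not match '\n', so the trailing .* ends each
        // match at the end of its line: one match per line, first tag captured.
        System.out.println(scan(TEXT, "(#[^\\s]*).*"));

        // With the newline removed, the leading greedy .* consumes everything up
        // to the last '#', so there is a single match capturing the last tag.
        System.out.println(scan(TEXT.replace("\n", " "), ".*(#[^\\s]*).*"));
    }
}
```

Under that assumption, the "line processing" is a side effect of `.` stopping at the line terminator, and the "last occurrence" result comes from the greedy leading `.*` rather than from scan() or match() themselves.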
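Solution
If you want to capture every occurrence in a string that matches a particular regular expression, I found that the magic words are "overlapping matches".
If what you really want is just to get the hashtags out of the string, simply use Valdi_Bo's solution.
To enable the single-line flag in Java, you need to add (?s) at the beginning of the pattern. In single-line (DOTALL) mode, . also matches line terminators.
Script: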
%dw 2.0
output application/json
var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // (?s) is the single-line modifier
    // (?=(X)). enables overlapping matches
    matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
    justTags: str scan(/(?s)#([^\s]*)/) map $[1],
    Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}
Output:
{
  "matchUntilEnd": [
    "#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "justTags": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ],
  "Valdi_BoSolutionWithGroups": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ]
}
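To see why the lookahead form yields overlapping matches while a plain pattern does not, here is a minimal Java sketch, again assuming DataWeave regexes behave like java.util.regex. The `OverlapDemo` class and `findAll` helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration: collects one chosen group from every match.
final class OverlapDemo {
    static final String TEXT =
        "Someone is giving away millions. See @realmcsrooge at #downtownmalls now!\n"
        + "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls";

    static List<String> findAll(String input, String regex, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        // A plain pattern consumes what it matches, so matches cannot overlap.
        System.out.println(findAll("aaaa", "aa", 0));        // [aa, aa]

        // The lookahead captures without consuming; the trailing '.' advances
        // by one character, so every start position is tried.
        System.out.println(findAll("aaaa", "(?=(aa)).", 1)); // [aa, aa, aa]

        // With (?s), '.*' inside the lookahead runs across the newline, so each
        // captured group extends from a '#' to the end of the input.
        System.out.println(findAll(TEXT, "(?s)(?=(#\\S*.*)).", 1).size()); // 4

        // For plain hashtags no overlap is needed: take the full match of #\S+.
        System.out.println(findAll(TEXT, "#\\S+", 0));
    }
}
```

The last call keeps the leading `#` by reading the whole match (group 0); reading group 1 of a pattern such as `#(\S+)` drops it, which matches the justTags output above.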