regex - Using Regex in Scala to parse strings and create objects
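Problem description
I have a list of input strings that I want to convert into a list of objects using regular expressions. In the code below, for simplicity, I print the parsed values to stdout instead of creating objects.
Some of the input strings are handled correctly, but not the whole list. Can someone point out what I am doing wrong?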
import scala.util.matching.Regex
lazy val TIMESTAMP_PATTERN: Regex = """(year|month|day|hour)\(([a-zA-Z_]+)[,]?([a-zA-Z_]*)\)""".r
lazy val BUCKET_PATTERN: Regex = """(bucket)\((.+)(,)(.+)[,]?(.*)\)""".r
Seq(
  "year(timestamp)",
  "year(timestamp, _MY_YEAR)",
  "month(timestamp)",
  "month(timestamp, _MY_MONTH)",
  "day(timestamp)",
  "day(timestamp, _MY_DAY)",
  "hour(timestamp)",
  "hour(timestamp, _MY_HOUR)",
  "bucket(id, 32)",
  "bucket(id, 32, _MY_BUCKET)",
).foreach { input => input match {
    case TIMESTAMP_PATTERN(transform, sourceColumn, targetColumn) => println(s"$transform ::: $sourceColumn :::- $targetColumn")
    case BUCKET_PATTERN(sourceColumn, numBuckets) => println(s"bucket ::: $sourceColumn ::: $numBuckets")
    case BUCKET_PATTERN(sourceColumn, numBuckets, targetColumn) => println(s"bucket ::: $sourceColumn ::: $numBuckets ::: $targetColumn")
    case z => println(s"Unexpected match: $z")
  }
}
Output
year ::: timestamp :::-
Unexpected match: year(timestamp, _MY_YEAR)
month ::: timestamp :::-
Unexpected match: month(timestamp, _MY_MONTH)
day ::: timestamp :::-
Unexpected match: day(timestamp, _MY_DAY)
hour ::: timestamp :::-
Unexpected match: hour(timestamp, _MY_HOUR)
Unexpected match: bucket(id, 32)
Unexpected match: bucket(id, 32, _MY_BUCKET)
Solution
I made a few fixes to your regular expressions and to the match:
import scala.util.matching.Regex
lazy val TIMESTAMP_PATTERN: Regex = """(year|month|day|hour)\((\w+)(?:,\s+)?(\w*)\)""".r
lazy val BUCKET_PATTERN: Regex = """bucket\((\w+),(?:\s+)?(\w+)(?:,\s+)?(\w*)\)""".r
Seq(
  "year(timestamp)",
  "year(timestamp, _MY_YEAR)",
  "month(timestamp)",
  "month(timestamp, _MY_MONTH)",
  "day(timestamp)",
  "day(timestamp, _MY_DAY)",
  "hour(timestamp)",
  "hour(timestamp, _MY_HOUR)",
  "bucket(id, 32)",
  "bucket(id, 32, _MY_BUCKET)",
).foreach {
  // The cases with "" must come first: they handle a missing target column.
  case TIMESTAMP_PATTERN(transform, sourceColumn, "") => println(s"$transform ::: $sourceColumn")
  case TIMESTAMP_PATTERN(transform, sourceColumn, targetColumn) => println(s"$transform ::: $sourceColumn :::- $targetColumn")
  case BUCKET_PATTERN(sourceColumn, numBuckets, "") => println(s"bucket ::: $sourceColumn ::: $numBuckets")
  case BUCKET_PATTERN(sourceColumn, numBuckets, targetColumn) => println(s"bucket ::: $sourceColumn ::: $numBuckets ::: $targetColumn")
  case z => println(s"Unexpected match: $z")
}
The output is now:
year ::: timestamp
year ::: timestamp :::- _MY_YEAR
month ::: timestamp
month ::: timestamp :::- _MY_MONTH
day ::: timestamp
day ::: timestamp :::- _MY_DAY
hour ::: timestamp
hour ::: timestamp :::- _MY_HOUR
bucket ::: id ::: 32
bucket ::: id ::: 32 ::: _MY_BUCKET
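For context on why the question's version fell through to the default case, here is a small sketch. The object name WhyOriginalFailed and the helper groupCount are made up for illustration. It suggests two causes: the old bucket pattern has five capture groups, so a case with two or three variables can never match it, and the old timestamp pattern does not allow a space after the comma.

```scala
import scala.util.matching.Regex

object WhyOriginalFailed {
  // Patterns copied verbatim from the question.
  val oldTimestamp: Regex = """(year|month|day|hour)\(([a-zA-Z_]+)[,]?([a-zA-Z_]*)\)""".r
  val oldBucket: Regex = """(bucket)\((.+)(,)(.+)[,]?(.*)\)""".r

  // Number of capture groups in a pattern. A `case P(a, b)` only matches
  // when the number of variables equals this count.
  def groupCount(r: Regex): Int = r.pattern.matcher("").groupCount()

  // True when the whole input matches the pattern.
  def fullyMatches(r: Regex, input: String): Boolean =
    r.pattern.matcher(input).matches()

  def main(args: Array[String]): Unit = {
    println(groupCount(oldBucket))                                  // 5
    println(fullyMatches(oldTimestamp, "year(timestamp, _MY_YEAR)")) // false: space after the comma
    println(fullyMatches(oldTimestamp, "year(timestamp,_MY_YEAR)"))  // true: no space
  }
}
```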
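Here are the changes I made:
- Added ?: to the groups made up of the comma and whitespace so that they are non-capturing. The separator stays optional but no longer shows up as a capture group of its own. Note that (?:,\s+)? expects at least one whitespace character after the comma; writing (?:,\s*)? instead would also accept input such as year(timestamp,_MY_YEAR).
- Removed the () around bucket, so it is no longer a capturing group.
- Because the last group is optional and may be empty, the cases that take fewer items now match an empty string "" in the last position, and they are listed before the general cases. If the last column is absent, its capture group is the empty string.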
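Since the original goal was to create objects rather than print, the same patterns can feed a small parser. This is only a sketch under assumptions: the types Transform, TimeTransform and BucketTransform and the object TransformParser are hypothetical (the question does not show the real classes), and toIntOption assumes Scala 2.13 or later. The empty capture group for a missing target column is turned into None.

```scala
import scala.util.matching.Regex

// Hypothetical model types; the question does not show the real classes.
sealed trait Transform
final case class TimeTransform(unit: String, sourceColumn: String, targetColumn: Option[String]) extends Transform
final case class BucketTransform(sourceColumn: String, numBuckets: Int, targetColumn: Option[String]) extends Transform

object TransformParser {
  // Same patterns as in the answer above.
  private val TimestampPattern: Regex = """(year|month|day|hour)\((\w+)(?:,\s+)?(\w*)\)""".r
  private val BucketPattern: Regex = """bucket\((\w+),(?:\s+)?(\w+)(?:,\s+)?(\w*)\)""".r

  // An empty capture group means the optional target column was omitted.
  private def optional(s: String): Option[String] = Option(s).filter(_.nonEmpty)

  // Returns None for input that matches neither pattern,
  // or whose bucket count is not a valid Int.
  def parse(input: String): Option[Transform] = input match {
    case TimestampPattern(unit, source, target) =>
      Some(TimeTransform(unit, source, optional(target)))
    case BucketPattern(source, buckets, target) =>
      buckets.toIntOption.map(n => BucketTransform(source, n, optional(target)))
    case _ => None
  }
}
```

With this sketch, TransformParser.parse("bucket(id, 32, _MY_BUCKET)") yields Some(BucketTransform(id,32,Some(_MY_BUCKET))), and an unsupported string such as "week(timestamp)" yields None. Using Option for the target column keeps the "missing" case out of the pattern match itself, so one case per pattern is enough.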