bash - 使用 UNIX 工具将带有注释的键值文件转换为 JSON 文档
问题描述
我在 YAML 的子集中有一个文件,其中包含如下数据:
# This is a comment
# This is another comment
spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'
我需要将其转换为如下所示的 JSON 文档(请注意,包含布尔值和整数的字符串仍然是字符串):
{
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true",
"yarn:yarn.nodemanager.log.retain-seconds", "259200"
}
我得到的最接近的是:
cat << EOF > ./file.yaml
> # This is a comment
> # This is another comment
>
>
> spark:spark.ui.enabled: 'false'
> spark:spark.sql.adaptive.enabled: 'true'
> yarn:yarn.nodemanager.log.retain-seconds: '259200'
> EOF
echo {$(cat file.yaml | grep -o '^[^#]*' | sed '/^$/d' | awk -F": " '{sub($1, "\"&\""); print}' | paste -sd "," - )}
除了看起来相当粗糙并没有给出正确答案之外,它还返回:
{"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'}
其中,如果我通过管道传输到jq
会导致解析错误。
我希望我错过了一种更简单的方法,但我无法弄清楚。任何人都可以帮忙吗?
解决方案
纯实现jq
(使用 1.6 版测试):
#!/usr/bin/env bash
jq_script=$(cat <<'EOF'
def content_for_line:
"^[[:space:]]*([#]|$)" as $ignore_re | # regex for comments, blank lines
"^(?<key>.*): (?<value>.*)$" as $content_re | # regex for actual k/v pairs
"^'(?<value>.*)'$" as $quoted_re | # regex for values in single quotes
if test($ignore_re) then {} else # empty lines add nothing to the data
if test($content_re) then ( # non-empty: match against $content_re
capture($content_re) as $content | # ...and put the groups into $content
$content.key as $key | # string before ": " becomes $key
(if ($content.value | test($quoted_re)) then # if value contains literal quotes...
($content.value | capture($quoted_re)).value # ...take string from inside quotes
else
$content.value # no quotes to strip
end) as $value | # result of the above block becomes $value
{"\($key)": "\($value)"} # and return a map from one key to one value
) else
# we get here if a line didn't match $ignore_re *or* $content_re
error("Line \(.) is not recognized as a comment, empty, or valid content")
end
end;
# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)
# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json
与不知道语法的工具不同,这永远不会生成不是有效 JSON 的输出;通过添加一个额外的过滤器阶段来检查和修改content_for_line
.
推荐阅读
- pentaho - Pentaho 数据集成:数据仓库维度:一项工作的转换
- stata - 多重插补数据的 Stata 相关 P 值
- java - 给定算法分析的“main”java.lang.StackOverflowError
- android - 如何限制谷歌地图的缩小?
- java - Selenium 与 java 脚本
- android - 禁用触发 ActivityforResult 的按钮
- java - 如何编写具有在 Java 中实现接口的参数的方法
- java - 错误 Gradle XML Android Studio
- python - 使用 self 时缺少 1 个必需的位置参数
- c# - 在剃刀视图中从 ActionResult 获取 html