首页 > 解决方案 > 使用 UNIX 工具将带有注释的键值文件转换为 JSON 文档

问题描述

我在 YAML 的子集中有一个文件,其中包含如下数据:

# This is a comment
# This is another comment


spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'


我需要将其转换为如下所示的 JSON 文档(请注意,包含布尔值和整数的字符串仍然是字符串):

{
  "spark:spark.ui.enabled": "false",
  "spark:spark.sql.adaptive.enabled": "true",
  "yarn:yarn.nodemanager.log.retain-seconds", "259200"
}

我得到的最接近的是:

cat << EOF > ./file.yaml
> # This is a comment
> # This is another comment
> 
> 
> spark:spark.ui.enabled: 'false'
> spark:spark.sql.adaptive.enabled: 'true'
> yarn:yarn.nodemanager.log.retain-seconds: '259200'
> EOF
echo {$(cat file.yaml | grep -o '^[^#]*' | sed '/^$/d' | awk -F": " '{sub($1, "\"&\""); print}' | paste -sd "," -  )}

除了看起来相当粗糙并没有给出正确答案之外,它还返回:

{"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'}

其中,如果我通过管道传输到jq会导致解析错误。

我希望我错过了一种更简单的方法,但我无法弄清楚。任何人都可以帮忙吗?

标签: bashshellawkjq

解决方案


纯实现jq(使用 1.6 版测试):

#!/usr/bin/env bash

jq_script=$(cat <<'EOF'
def content_for_line:
  "^[[:space:]]*([#]|$)" as $ignore_re |           # regex for comments, blank lines
  "^(?<key>.*): (?<value>.*)$" as $content_re |    # regex for actual k/v pairs
  "^'(?<value>.*)'$" as $quoted_re |               # regex for values in single quotes
  if test($ignore_re) then {} else                 # empty lines add nothing to the data
    if test($content_re) then (                    # non-empty: match against $content_re
      capture($content_re) as $content |           # ...and put the groups into $content
      $content.key as $key |                       # string before ": " becomes $key
      (if ($content.value | test($quoted_re)) then # if value contains literal quotes...
         ($content.value | capture($quoted_re)).value # ...take string from inside quotes
       else
         $content.value                               # no quotes to strip
       end) as $value |                     # result of the above block becomes $value
      {"\($key)": "\($value)"}              # and return a map from one key to one value
    ) else
      # we get here if a line didn't match $ignore_re *or* $content_re
      error("Line \(.) is not recognized as a comment, empty, or valid content")
    end
  end;

# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)

# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json

与不知道语法的工具不同,这永远不会生成不是有效 JSON 的输出;通过添加一个额外的过滤器阶段来检查和修改content_for_line.


推荐阅读