Extracting values from key-value pairs and converting them to CSV

Question

I have the following dataset:

data:[{'name': 'cable',  'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'}
{'name': 'samsung',  'status': 'none'}],       location:[{'place': 'chennai', 'distance': '100km'}, 
{'place': 'bangalore', 'distance': '200km'}]

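I am trying to extract the values and convert them to CSV. I am running into trouble because the data is a multidimensional array. Any suggestions would be helpful.

If my data were just {'name': 'cable', 'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'}, I could use the awk below to get it: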

awk -F " = " -v OFS="," '
    BEGIN { print "name","status","mode","place","distance" }
    function printline() {
        print data["name"], data["status"], data["mode"]
    }
    {data[$1] = $2}
    NF == 0 {printline(); delete data}
    END {printline()}
'

But I cannot get it to work with my original dataset.

Original data:

data:[{'name': 'cable',  'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'}
{'name': 'samsung',  'status': 'none'}],       location:[{'place': 'chennai', 'distance': '100km'}, 
{'place': 'bangalore', 'distance': '200km'}]

Expected result:

name        status       mode        place       distance
cable       none         null        chennai     100km  
laptop      loaded       high        bangalore   200km 
samsung     none         null        null        null 

Tags: unix, awk, sed, export-to-csv, key-value

Answer


Here is a start on a step-by-step approach that works with any awk in any shell on every UNIX machine:

$ cat tst.awk
{ rec = (NR>1 ? rec " " : "") $0 }
END {
    # Identify from rec:
    #   1) [{'name': 'cable',  'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'} {'name': 'samsung',  'status': 'none'}]
    #   2) [{'place': 'chennai', 'distance': '100km'}, {'place': 'bangalore', 'distance': '200km'}]

    str = rec
    while ( match(str,/\[[^]]+/) ) {
        val = substr(str,RSTART+1,RLENGTH-1)
        level1vals[++numLevel1vals] = val
        str = substr(str,RSTART+RLENGTH)
    }

    for (level1valNr=1; level1valNr<=numLevel1vals; level1valNr++) {
        level1val = level1vals[level1valNr]

        # Identify from level1vals[1]:
        #   1) 'name': 'cable',  'status': 'none'
        #   2) 'name': 'laptop', 'status': 'loaded', 'mode': 'high'
        #   3) 'name': 'samsung',  'status': 'none'
        # and from level1vals[2]:
        #   4) 'place': 'chennai', 'distance': '100km'
        #   5) 'place': 'bangalore', 'distance': '200km'

        level2valNr = 0
        str = level1val
        while ( match(str,/{[^}]+/) ) {
            val = substr(str,RSTART+1,RLENGTH-1)
            ++level2valNr
            level2vals[level2valNr] = level2vals[level2valNr] " " val
            numLevel2vals = (level2valNr > numLevel2vals ? level2valNr : numLevel2vals)
            str = substr(str,RSTART+RLENGTH)
        }
    }

    # NOTE: delete these print loops when done testing/debugging
    for (level1valNr=1; level1valNr<=numLevel1vals; level1valNr++) {
        print "level1vals[" level1valNr "] = <" level1vals[level1valNr] ">"
    }
    print ""
    for (level2valNr=1; level2valNr<=numLevel2vals; level2valNr++) {
        print "level2vals[" level2valNr "] = <" level2vals[level2valNr] ">"
    }
}


$ awk -f tst.awk file
level1vals[1] = <{'name': 'cable',  'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'} {'name': 'samsung',  'status': 'none'}>
level1vals[2] = <{'place': 'chennai', 'distance': '100km'},  {'place': 'bangalore', 'distance': '200km'}>

level2vals[1] = < 'name': 'cable',  'status': 'none' 'place': 'chennai', 'distance': '100km'>
level2vals[2] = < 'name': 'laptop', 'status': 'loaded', 'mode': 'high' 'place': 'bangalore', 'distance': '200km'>
level2vals[3] = < 'name': 'samsung',  'status': 'none'>

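Add another round of looping that uses match($0,/\047[^\047]+/) to identify each 'foo' string, store those strings in an array, and then loop through that final array in the appropriate order to print the CSV.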
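
The remaining step described above could be sketched roughly as follows. This is one possible completion, not necessarily what the answer's author had in mind. It makes several assumptions: the script name csv.awk is hypothetical, the column order is hard-coded from the expected result in the question, missing values are printed as "null", and no key or value is empty or contains a single quote. The regexp also includes the closing quote (/\047[^\047]+\047/) so that each match consumes a whole 'foo' string, and [{] is used in place of a bare { to stay clear of interval-expression handling that differs between awks.

```shell
# Sample input from the question, saved as "file".
cat > file <<'EOF'
data:[{'name': 'cable',  'status': 'none'}, {'name': 'laptop', 'status': 'loaded', 'mode': 'high'}
{'name': 'samsung',  'status': 'none'}],       location:[{'place': 'chennai', 'distance': '100km'}, 
{'place': 'bangalore', 'distance': '200km'}]
EOF

# csv.awk is a hypothetical name for the completed script.
cat > csv.awk <<'EOF'
# Join all input lines into one record.
{ rec = (NR>1 ? rec " " : "") $0 }
END {
    # Level 1: the contents of each [...] list.
    str = rec
    while ( match(str,/\[[^]]+/) ) {
        lists[++numLists] = substr(str,RSTART+1,RLENGTH-1)
        str = substr(str,RSTART+RLENGTH)
    }

    # Level 2: the Nth {...} of every list is appended to output row N.
    for (listNr=1; listNr<=numLists; listNr++) {
        rowNr = 0
        str = lists[listNr]
        while ( match(str,/[{][^}]+/) ) {
            ++rowNr
            rows[rowNr] = rows[rowNr] " " substr(str,RSTART+1,RLENGTH-1)
            numRows = (rowNr > numRows ? rowNr : numRows)
            str = substr(str,RSTART+RLENGTH)
        }
    }

    # Level 3: the quoted strings in a row alternate key, value, key, value...
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        n = 0
        str = rows[rowNr]
        while ( match(str,/\047[^\047]+\047/) ) {
            quoted[++n] = substr(str,RSTART+1,RLENGTH-2)
            str = substr(str,RSTART+RLENGTH)
        }
        for (k=1; k<n; k+=2) {
            vals[rowNr,quoted[k]] = quoted[k+1]
        }
    }

    # Column order is an assumption taken from the expected result.
    numCols = split("name,status,mode,place,distance",cols,",")
    for (c=1; c<=numCols; c++) {
        printf "%s%s", cols[c], (c<numCols ? "," : "\n")
    }
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (c=1; c<=numCols; c++) {
            val = ((rowNr,cols[c]) in vals ? vals[rowNr,cols[c]] : "null")
            printf "%s%s", val, (c<numCols ? "," : "\n")
        }
    }
}
EOF

awk -f csv.awk file
```

With the sample input this prints a header line followed by cable,none,null,chennai,100km, then laptop,loaded,high,bangalore,200km, then samsung,none,null,null,null. Storing values under the compound index vals[rowNr,key] avoids having to clear an array between rows, which keeps the sketch within what any POSIX awk supports.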

