json - JSON 不会在 Unix 中用 jq 转换
问题描述
转换此 JSON 时遇到困难。它是多行的,类似于下面的内容。底部的示例数据是解压缩后按原样读取的数据。
已尝试的示例:
jq -r '(([["user_id","server_received_time","app","device_carrier","$schema","city","uuid","event_time","platform","os_version","amplitude_id","processed_time","user_creation_time","version_name","ip_address","paying","dma","group_properties","user_properties","client_upload_time","$insert_id","event_type","library","amplitude_attribution_ids","device_type","device_manufacturer","start_version","location_lng","server_upload_time","event_id","location_lat","os_name","amplitude_event_type","device_brand","groups","event_properties","data","device_id","language","device_model","country","region","is_attribution_event","adid","session_id","device_family","sample_rate","idfa","client_event_time"]]) + [(.table.All[] | [.user_id,.server_received_time,.app,.device_carrier,.$schema,.city,.uuid,.event_time,.platform,.os_version,.amplitude_id,.processed_time,.user_creation_time,.version_name,.ip_address,.paying,.dma,.group_properties,.user_properties,.client_upload_time,.$insert_id,.event_type,.library,.amplitude_attribution_ids,.device_type,.device_manufacturer,.start_version,.location_lng,.server_upload_time,.event_id,.location_lat,.os_name,.amplitude_event_type,.device_brand,.groups,.event_properties,.data,.device_id,.language,.device_model,.country,.region,.is_attribution_event,.adid,.session_id,.device_family,.sample_rate,.idfa,.client_event_time])])[]|@csv' test.json > test.csv
以及其他一些 jq 选项。无论值如何,我都需要每一列,并且值原样。有人对我们为什么会遇到问题有想法吗?我们得到的一个错误是:
jq: error: try .["field"] instead of .field for unusually named fields at <top-level>, line 1:
其他 jq 行给出了以下错误:
string (...) cannot be csv-formatted, only array
这是其中一个 JSON 文件的摘录:
{"groups":{},"country":"United States","device_id":"3d-88c-45-b6-ed81277eR","is_attribution_event":false,"server_received_time":"2019-12-17 17:29:11.113000","language":"English","event_time":"2019-12-17 17:27:49.047000","user_creation_time":"2019-11-08 13:15:32.919000","city":"Sure","uuid":"someID","device_model":"Windows","amplitude_event_type":null,"client_upload_time":"2019-12-17 17:29:21.958000","data":{},"library":"amplitude-js\/5.2.2","device_manufacturer":null,"dma":"Washington, DC (Townville, USA)","version_name":null,"region":"Virginia","group_properties":{},"location_lng":null,"device_family":"Windows","paying":null,"client_event_time":"2019-12-17 17:27:59.892000","$schema":12,"device_brand":null,"user_id":"email@gmail.com","event_properties":{"title":"Name","id":"1-253251","applicationName":"SomeName"},"os_version":"18","device_carrier":null,"server_upload_time":"2019-12-17 17:29:11.135000","session_id":1576603675620,"app":231165,"amplitude_attribution_ids":null,"event_type":"CHANGE_PERSPECTIVE","user_properties":{},"adid":null,"device_type":"Windows","$insert_id":"e308c923-d8eb-48c6-8ea5-600","event_id":24,"amplitude_id":515,"processed_time":"2019-12-17 17:29:12.760372","platform":"Web","idfa":null,"os_name":"Edge","location_lat":null,"ip_address":"123.456.78.90","sample_rate":null,"start_version":null}
谢谢!
解决方案
您的尝试有几个问题。
首先,名称中带有“$”的键不能使用缩写.foo
语法指定;你可以.["$foo"]
改用。
其次,@csv
需要一个原子值数组。因此,必须特别处理以 JSON 对象为值的键。
第三,“+”不正确。这里的相关连接符是“,”。
使用您的示例 JSON,以下内容将起作用:
(["user_id","server_received_time","app","device_carrier","$schema","city","uuid","event_time","platform","os_version","amplitude_id","processed_time","user_creation_time","version_name","ip_address","paying","dma","group_properties","user_properties","client_upload_time","$insert_id","event_type","library","amplitude_attribution_ids","device_type","device_manufacturer","start_version","location_lng","server_upload_time","event_id","location_lat","os_name","amplitude_event_type","device_brand","groups","event_properties","data","device_id","language","device_model","country","region","is_attribution_event","adid","session_id","device_family","sample_rate","idfa","client_event_time"]),
([.user_id,.server_received_time,.app,.device_carrier,.["$schema"],.city,.uuid,.event_time,.platform,.os_version,.amplitude_id,.processed_time,.user_creation_time,.version_name,.ip_address,.paying,.dma,.group_properties,.user_properties,.client_upload_time,.["$insert_id"],.event_type,.library,.amplitude_attribution_ids,.device_type,.device_manufacturer,.start_version,.location_lng,.server_upload_time,.event_id,.location_lat,.os_name,.amplitude_event_type,.device_brand,.groups,.event_properties,.data,.device_id,.language,.device_model,.country,.region,.is_attribution_event,.adid,.session_id,.device_family,.sample_rate,.idfa,.client_event_time]
| map(if type=="object"
then to_entries
| map( "\(.key):\(.value)" )
| join(";")
else . end))
| @csv
不易出错的解决方案
指定长长的键列表两次会使上述解决方案容易出错。最好只指定一次键,然后以编程方式生成行。
这是一个可用于此目的的实用程序函数:
def toa($headers):
. as $in | $headers | map($in[.]);
或者你可以处理里面的对象值键toa
:
def toa($headers):
def flat:
if type == "object" or type == "array"
then to_entries | map( "\(.key):\(.value)" ) | join(";")
else .
end;
. as $in | $headers | map($in[.] | flat);
JSONL
如果输入是问题中所示类型的 JSON 对象流,则有效的解决方案将inputs
与 -n 命令行选项一起使用。这可能是这样的:
print_header,
(inputs | print_row)
推荐阅读
- excel - 从 VBA 中的多个表中清除数据
- jenkins - 从 Linux 在 IIS 上发布 NetCore 2.2
- javascript - location.reload(true); 不在ie11中
- google-apps-script - 如何创建可以创建新 Google Doc 的 Google Web App,然后在浏览器选项卡中打开它?
- tensorflow - 如何在 tensorflow.js 模型中添加图像并针对给定的图像标签训练模型
- terminal - 启动终端时applescript超时
- sql - 如何检查连接表中的完全匹配
- google-analytics - 谷歌分析点击呼叫事件到 AMP
- css - 更改primeng日历中时间选择器的样式
- api - 使用 ECDSA 签署 api 请求