jq - Print an entry once as the primary key, followed by its array of associated entries, as CSV, dropping empties
Problem description
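I have records like the ones below, which sometimes contain duplicate srcPath entries, each with different references.
For example, /content/dam/foo/about-bar/photos/rayDavis.PNG appears 3 times in one record, with different references.
I would like each unique srcPath printed once, together with its associated references.
I also have empty records: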
{
"pages": []
}
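I don't want to see those.
What I would really like is a CSV: srcPath, perhaps another field such as published, then first reference, second reference, third reference, and so on, with the associated references array as consecutive comma-separated values on the same line, for example: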
"/content/dam/foo/about-bar/pdf/theplan.pdf", true, "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/image/link", "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/textboximg/boxFtr", "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content1/textboximg/text"
"/content/dam/foo/about-bar/photos/rayDavis.PNG", true, "/content/foo/en/about-bar/jcr:content/content1B/promos_1/image/fileReference", "/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis/jcr:content/content1/textboximg/fileReference", "/content/foo/en/about-bar/monkey-development/tales-of-giving/jcr:content/content1/textboximg_2/fileReference"
"/content/dam/foo/about-bar/pdf/foo_19thNewsletter.pdf", true, "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg/text"
"/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf", true, "/content/foo/en/gremlins/jcr:content/content2C/textboximg_114671747/text", "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf", "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg_0/text"
In other words, each unique srcPath entry with its associated references.
I suppose that if I also wanted path, I could no longer have a unique line per srcPath in the CSV?
Data:
{
"pages": [
{
"srcPath": "/content/dam/foo/about-bar/pdf/theplan.pdf",
"srcTitle": "theplan.pdf",
"path": "/content/foo/en/about-bar/the-plan-and-vision",
"title": "the Plan and Vision",
"references": [
"/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/image/link",
"/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/textboximg/boxFtr",
"/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content1/textboximg/text"
],
"published": false,
"isPage": "true"
}
]
}
{
"pages": []
}
{
"pages": []
}
{
"pages": [
{
"srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
"srcTitle": "rayDavis.PNG",
"path": "/content/foo/en/about-bar",
"title": "About bar",
"references": [
"/content/foo/en/about-bar/jcr:content/content1B/promos_1/image/fileReference"
],
"published": true,
"isPage": "true"
},
{
"srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
"srcTitle": "rayDavis.PNG",
"path": "/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis",
"title": "ray moose Davis",
"references": [
"/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis/jcr:content/content1/textboximg/fileReference"
],
"published": true,
"isPage": "true"
},
{
"srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
"srcTitle": "rayDavis.PNG",
"path": "/content/foo/en/about-bar/monkey-development/tales-of-giving",
"title": "tales of Giving",
"references": [
"/content/foo/en/about-bar/monkey-development/tales-of-giving/jcr:content/content1/textboximg_2/fileReference"
],
"published": true,
"isPage": "true"
}
]
}
{
"pages": [
{
"srcPath": "/content/dam/foo/about-bar/pdf/foo_19thNewsletter.pdf",
"srcTitle": "foo_19thNewsletter.pdf",
"path": "/content/foo/en/gremlins/stay-tuned",
"title": "Stay tuned",
"references": [
"/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg/text"
],
"published": true,
"isPage": "true"
}
]
}
{
"pages": [
{
"srcPath": "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf",
"srcTitle": "barNews_fall1617.pdf",
"path": "/content/foo/en/gremlins",
"title": "gremlins",
"references": [
"/content/foo/en/gremlins/jcr:content/content2C/textboximg_114671747/text"
],
"published": true,
"isPage": "true"
},
{
"srcPath": "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf",
"srcTitle": "barNews_fall1617.pdf",
"path": "/content/foo/en/gremlins/stay-tuned",
"title": "Stay tuned",
"references": [
"/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg_0/text"
],
"published": true,
"isPage": "true"
}
]
}
Solution
You can use the following:
jq --raw-output '.pages | group_by(.srcPath)[] | [.[0].srcPath, .[0].published, .[].references[]] | @csv'
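We group the pages by srcPath and map each group to an array containing the srcPath and published values of the group's first element, followed by the references of every element in the group. Each of these arrays becomes one row of the CSV result. Records whose pages array is empty produce no groups, so they emit no rows.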
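To see the grouping in action, here is a minimal sketch that runs the same filter on a reduced sample. The shortened paths (/dam/a.png, /en/x/ref1, /en/y/ref2) and the temporary file are hypothetical stand-ins for the question's data, not values from it.

```shell
#!/bin/sh
# Reduced, hypothetical sample: one empty record, then one record
# in which the same srcPath appears twice with different references.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"pages": []}
{"pages": [
  {"srcPath": "/dam/a.png", "published": true, "references": ["/en/x/ref1"]},
  {"srcPath": "/dam/a.png", "published": true, "references": ["/en/y/ref2"]}
]}
EOF

# Same filter as above: group the pages of each record by srcPath,
# then emit one CSV row per group.
out=$(jq --raw-output '.pages | group_by(.srcPath)[] | [.[0].srcPath, .[0].published, .[].references[]] | @csv' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

This should print the single row "/dam/a.png",true,"/en/x/ref1","/en/y/ref2" and nothing for the empty record. Note that @csv writes no space after the commas, unlike the hand-written example in the question. The filter groups within each input record, which matches the data shown; if the same srcPath could be spread across several records, one option would be to read the input with --slurp and collect all pages before grouping.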