arrays - 雪花 JSON 解析在数组(列表)中没有“键”的情况下展平项目
问题描述
我正在尝试解析来自 AWS 文本的 JSON 提取。
{
"BlockType": "CELL",
"ColumnIndex": 2,
"ColumnSpan": 1,
"Confidence": 99.87053680419922,
"Geometry": {
"BoundingBox": {
"Height": 0.023682409897446632,
"Left": 0.10847073048353195,
"Top": 0.6578078269958496,
"Width": 0.18180404603481293
},
"Polygon": [
{
"X": 0.10847073048353195,
"Y": 0.6578078269958496
},
{
"X": 0.2902747690677643,
"Y": 0.6578078269958496
},
{
"X": 0.2902747690677643,
"Y": 0.6814902424812317
},
{
"X": 0.10847073048353195,
"Y": 0.6814902424812317
}
]
},
"Id": "8f1d94e9-f936-4ea9-bae8-b3e0347c5411",
"Page": 1,
"Relationships": [
{
"Ids": [
"5b7b930f-5715-4fa6-b85b-9e2253e2b389",
"52e9d462-5aa2-4df2-9a33-3b59e03ea256",
"92d278f8-2d44-41ce-8cfd-35feccfba29d",
"69121af9-3270-488b-9335-70359859f9d6",
"827011b6-fc8c-4749-89bd-91f376eaa554",
"dd001bd6-f146-41e0-b5b3-6afb2bf37f17",
"a4286c8f-f7a2-4b7d-84b0-e38b7cd93225"
],
"Type": "CHILD"
}
],
"RowIndex": 1,
"RowSpan": 1
}
当我尝试使用以下方法进行展平时:
lateral flatten(input => SOURCE_CONTENTS:Relationships[0]:Ids , MODE=> 'BOTH') where SOURCE_CONTENTS:BlockType = 'CELL'
where 语句返回 Id 的完整列表。所有 7 人每行重复 7 次。我想得到数组中的第一个、第二个、第三个项目。每行 1 个。
26
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
27
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
28
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
29
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
30
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
31
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
32
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
预期输出:
'5b7b930f-5715-4fa6-b85b-9e2253e2b389'
'52e9d462-5aa2-4df2-9a33-3b59e03ea256'
'92d278f8-2d44-41ce-8cfd-35feccfba29d'
'69121af9-3270-488b-9335-70359859f9d6'
'827011b6-fc8c-4749-89bd-91f376eaa554'
编辑: 既然你问了输出。我们真正希望它能够重新创建从 Snowflake 中的 AWS 文本提取的表。这只是第一部分。获取单元格内容以返回与单元格关联的 ID。然后我们可以查找单元格的内容并重建表格。如您所见,“CELL”块包含列和行引用以及指向单元格内容的子关系。
我试图把这个输出放在一个包含这个孩子列表的 where 语句中,这样我们就可以得到与他们相关的单词。那可能是错误的做法。
例如:如果子值 '861db385-0764-4fef-a4c1-f354a5e08fce' 采用以下格式。我可以提取单元格的内容,包括与单元格关联的单词。
SELECT SOURCE_CONTENTS, SOURCE_CONTENTS:Text, SOURCE_CONTENTS:Id ,SNIPPET_ORDER
FROM temp_table
WHERE SOURCE_CONTENTS:Id IN ('5b7b930f-5715-4fa6-b85b-9e2253e2b389',
'52e9d462-5aa2-4df2-9a33-3b59e03ea256',
'92d278f8-2d44-41ce-8cfd-35feccfba29d',
'69121af9-3270-488b-9335-70359859f9d6',
'827011b6-fc8c-4749-89bd-91f376eaa554',
'dd001bd6-f146-41e0-b5b3-6afb2bf37f17',
'a4286c8f-f7a2-4b7d-84b0-e38b7cd93225' ) // Child values
;
解决方案
您没有为其他人提供足够的上下文(表定义、准确的输出等)来提供全面的解决方案。我相信这(或类似的东西)可能是你正在寻找的东西,但不确定......
WITH MY_TABLE AS (
SELECT PARSE_JSON($1) AS SOURCE_CONTENTS
FROM VALUES ($$
{
"BlockType": "CELL",
"ColumnIndex": 2,
"ColumnSpan": 1,
"Confidence": 99.87053680419922,
"Geometry": {
"BoundingBox": {
"Height": 0.023682409897446632,
"Left": 0.10847073048353195,
"Top": 0.6578078269958496,
"Width": 0.18180404603481293
},
"Polygon": [
{
"X": 0.10847073048353195,
"Y": 0.6578078269958496
},
{
"X": 0.2902747690677643,
"Y": 0.6578078269958496
},
{
"X": 0.2902747690677643,
"Y": 0.6814902424812317
},
{
"X": 0.10847073048353195,
"Y": 0.6814902424812317
}
]
},
"Id": "8f1d94e9-f936-4ea9-bae8-b3e0347c5411",
"Page": 1,
"Relationships": [
{
"Ids": [
"5b7b930f-5715-4fa6-b85b-9e2253e2b389",
"52e9d462-5aa2-4df2-9a33-3b59e03ea256",
"92d278f8-2d44-41ce-8cfd-35feccfba29d",
"69121af9-3270-488b-9335-70359859f9d6",
"827011b6-fc8c-4749-89bd-91f376eaa554",
"dd001bd6-f146-41e0-b5b3-6afb2bf37f17",
"a4286c8f-f7a2-4b7d-84b0-e38b7cd93225"
],
"Type": "CHILD"
}
],
"RowIndex": 1,
"RowSpan": 1
}
$$)
)
SELECT ARRAY_AGG(F.VALUE) AS Relationship_Ids
FROM MY_TABLE T
,LATERAL FLATTEN(SOURCE_CONTENTS:"Relationships"[0]:"Ids") F
WHERE SOURCE_CONTENTS:"BlockType"::VARCHAR = 'CELL'
GROUP BY F.SEQ
;
推荐阅读
- wordpress - 调整 .htaccess 链接后导致重定向过多
- dart - 如何在飞镖中将字符串转换为Uuid
- firebase-authentication - Apple 登录 - 检索姓名和电子邮件
- reactjs - 如何在玩笑测试中共享设置和拆卸?
- database - 我可以删除显示无法连接的 postgres 数据库吗?
- python - 匹配 Pandas DataFrame 时出现“KeyError: True”
- perl - 如何获取已知 Unicode 字符名称的列表
- firebase - 异步函数:未来
在颤振中布尔? - r - 循环回归和存储截距、斜率等
- python - PermissionError:[错误 13] 权限被拒绝:scr.png