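sql - Athena/Presto: unnest 2 arrays with a left join
Problem description
I have 2 JSON arrays that I need to unnest and then join on a key inside the JSON structure. In theory this is easy, but without a "left join unnest" feature everything gets messy.
I have achieved what I want by grouping the results, but I am worried that the query performs 2 cross joins. In the live environment that effectively generates thousands of redundant rows, which are then filtered out again.
So my question is really about finding a more efficient strategy for the same logic. I'm well aware that my Presto experience and knowledge are still in their infancy!
Thanks for any guidance!
How it works:
Basic logic: every item in the 'left' array has a $.id value. Some 'left' items have a matching 'right' item whose $.a.id value is the same.
Example:
- The first SQL and result below show the setup, though not the desired result.
- The second set shows my current solution.
(1) Raw result of the cross join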
with cte as (
Select
123 as record_id,
'[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
'[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
)
select
record_id,
l.i as "left",
r.i as "right",
json_extract(l.i, '$.id') as left_id,
json_extract(r.i, '$.a.id') as right_id
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i), -- left array
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
Output:
record_id | left | right | left_id | right_id |
---|---|---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "01" | "02" |
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "01" | "01" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "02" | "02" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "02" | "01" |
123 | {"id":"03","key1":["val3"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "03" | "02" |
123 | {"id":"03","key1":["val3"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "03" | "01" |
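The left_id and right_id columns above keep their double quotes because json_extract returns a JSON value, not a varchar. A minimal illustration, assuming Athena's Presto/Trino JSON functions, of the difference between json_extract and json_extract_scalar on one element of the sample left array:

```sql
-- Compare json_extract (returns JSON) with json_extract_scalar (returns varchar).
-- The literal is one element of the sample "left" array from the question.
select
  json_extract(json_parse('{"id":"01","key1":["val1"]}'), '$.id')        as id_json, -- JSON value, printed with quotes
  json_extract_scalar(json_parse('{"id":"01","key1":["val1"]}'), '$.id') as id_text  -- plain varchar 01
```

Comparing the varchar form on both sides is usually the simpler choice for a join key, although comparing the two JSON values also produced matches in the queries shown here.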
(2) Current solution
select
record_id,
l.i as "left",
max(if(json_extract(l.i, '$.id') = json_extract(r.i, '$.a.id'), json_format(r.i), null)) as match
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i), -- left array
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
group by
record_id,
l.i
record_id | left | match |
---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} |
123 | {"id":"03","key1":["val3"]} | NULL |
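One way to avoid the second cross join, sketched here as an alternative that is not part of the original thread: unnest only the left array and pick the matching right element with the filter and element_at array functions. This assumes the Athena engine accepts lambdas that reference the current row (l.i), and that only the first match per id matters, because element_at(..., 1) keeps just the first element of the filtered array.

```sql
with cte as (
  select
    123 as record_id,
    '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
    '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
)
select
  record_id,
  l.i as "left",
  -- keep the right elements whose $.a.id equals this left element's $.id,
  -- then take the first one; element_at returns NULL if none matched
  element_at(
    filter(
      cast(json_parse("right") as array(json)),
      r -> json_extract_scalar(r, '$.a.id') = json_extract_scalar(l.i, '$.id')
    ),
    1
  ) as match
from
  cte
  cross join unnest(cast(json_parse("left") as array(json))) as l(i) -- only the left array is unnested
```

For arrays, element_at returns NULL when the index is larger than the array length, so a left item without a match (id 03) still yields one row with a NULL match, which mirrors left-join semantics without generating the extra cross-join rows.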
Solution
Unnest each array in its own CTE and left join the two CTEs. This eliminates the cross join, at the cost of somewhat longer code:
with cte as (
Select
123 as record_id,
'[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
'[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
),
"left" as (
select
record_id,
l.i as "left",
json_extract(l.i, '$.id') as left_id
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i) -- left array
),
"right" as (
select
record_id,
r.i as "right",
json_extract(r.i, '$.a.id') as right_id
from
cte,
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
)
select
l.record_id,
l."left",
r."right",
l.left_id,
r.right_id
from
"left" l left join "right" r on l.record_id=r.record_id and l.left_id=r.right_id
Result:
record_id | left | right | left_id | right_id |
---|---|---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "01" | "01" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "02" | "02" |
123 | {"id":"03","key1":["val3"]} | NULL | "03" | NULL |
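A variant of the same idea, again a hedged sketch rather than part of the original answer: build an id-to-element map from the right array once per record, then unnest only the left array and look each id up with element_at. The names right_map and by_id are hypothetical. The map function raises an error on duplicate or NULL keys, so this only suits data where every right element has a distinct, non-null $.a.id.

```sql
with cte as (
  select
    123 as record_id,
    '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
    '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
),
right_map as (
  -- hypothetical step: one map per record, keyed by $.a.id (as varchar)
  select
    record_id,
    "left",
    map(
      transform(cast(json_parse("right") as array(json)), x -> json_extract_scalar(x, '$.a.id')),
      cast(json_parse("right") as array(json))
    ) as by_id
  from cte
)
select
  record_id,
  l.i as "left",
  -- element_at on a map returns NULL when the key is absent, giving left-join behaviour
  element_at(by_id, json_extract_scalar(l.i, '$.id')) as "right"
from
  right_map
  cross join unnest(cast(json_parse("left") as array(json))) as l(i) -- only the left array is unnested
```

Because the map is built in the right_map step, before the left array is unnested, each lookup should be a map access rather than a scan of the whole right array.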