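postgresql - How can I reduce the load time of a query with multiple subqueries and joins?
Problem description
I have a query that finds which v_address belongs to which v_group and when each v_address was created. However, each v_group is sometimes active and sometimes inactive, and each status change has a date. So I wrote the query to find the periods between these changes.
My problem is that the query takes too long to run because it has multiple subqueries and joins.
Does anyone have a better idea for reducing the time it takes to load the data?
My query is attached below. The filter identifier = 81 is only there to run a smaller test. For the final result, I need to retrieve data for more than 100k IDs.
I tried changing the inner joins to left joins, but that loses the null values and still takes a long time to retrieve the data.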
with v_address as (
select id, v_group_id, created_at
from prod.venue_address --different schema and table. v_address.v_group_id can have multiple v_address.id
group by 1,2,3
order by 3 asc
),
v_group as (
select identifier, final_status, created_at
from dwh.venue_address_archive
where identifier = 81
group by 1,2,3
order by 3 asc
),
filtering as (
select identifier, created_at,
case when sum(case when final_status = 'active' then 1 else 0 end) > 0 then 'active' else 'inactive' end as filtered_status --'active' if any row at this timestamp is active, otherwise 'inactive'
from v_group
group by 1,2
order by 2 asc
),
prev as (
select identifier, created_at, filtered_status,
lag(case when filtered_status = 'active' then 'active' else 'inactive' end) over (partition by identifier order by created_at) = filtered_status as is_prev_status
from filtering
group by 1,2,3
),
periods as (
select identifier, filtered_status, created_at, is_prev_status,
sum(case when is_prev_status = true then 0 else 1 end) over (order by identifier, created_at) as period
from prev
group by 1,2,3,4
),
islands_gaps_start as (
select identifier, period, min(created_at) as start_at
from periods
group by 2,1
),
islands_gaps as (
select identifier, period, start_at,
lead(start_at) over (partition by identifier order by period) as end_at
from islands_gaps_start
)
select vg.identifier as "vg_id", p.created_at as "vg_created_at", p.filtered_status as "status", p.is_prev_status, p.period, va.id as "va_id", g.start_at, g.end_at
from v_address va
left join v_group vg
on va.v_group_id = vg.identifier
inner join filtering f
on vg.identifier = f.identifier
inner join prev pr
on pr.filtered_status = f.filtered_status
inner join periods p
on p.filtered_status = pr.filtered_status
inner join islands_gaps_start gs
on p.period = gs.period
inner join islands_gaps g
on gs.start_at = g.start_at
group by 6,1,2,3,4,5,7,8
order by 2 asc
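I already have the output for the example identifier = 81, but I have to run this query for more than 100k identifiers, so I am looking for any suggestions to shorten the query's run time.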
Solution
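One possible direction, offered as a sketch rather than a tested answer: several of the final joins in the question match rows on filtered_status alone instead of on identifier, so every 'active' row can pair with every other 'active' row, and the row count multiplies before the closing group by collapses it again. The version below computes the status periods once per identifier with window functions and joins to the addresses a single time. Table and column names are taken from the question. The join key (v_group_id = identifier), the use of final_status as the status column, and the choice of an inner join are assumptions.

```sql
-- Sketch only. Assumes final_status is the status column and that
-- prod.venue_address.v_group_id matches dwh.venue_address_archive.identifier.
with status_per_ts as (
    -- One row per (identifier, created_at); 'active' wins if any row is active.
    select identifier,
           created_at,
           case when bool_or(final_status = 'active')
                then 'active' else 'inactive' end as filtered_status
    from dwh.venue_address_archive
    -- where identifier = 81          -- uncomment for a small test run
    group by identifier, created_at
),
flagged as (
    -- Mark rows whose status differs from the previous row of the same identifier.
    -- The first row of each identifier counts as a change (lag returns null).
    select identifier,
           created_at,
           filtered_status,
           case when lag(filtered_status) over w is distinct from filtered_status
                then 1 else 0 end as is_change
    from status_per_ts
    window w as (partition by identifier order by created_at)
),
numbered as (
    -- A running sum of the change flags gives a period number per identifier.
    select identifier,
           created_at,
           filtered_status,
           sum(is_change) over (partition by identifier order by created_at) as period
    from flagged
),
periods as (
    -- Collapse each period to one row; the next period's start is this one's end.
    select identifier,
           period,
           filtered_status,
           min(created_at) as start_at,
           lead(min(created_at)) over (partition by identifier order by period) as end_at
    from numbered
    group by identifier, period, filtered_status
)
select p.identifier      as vg_id,
       p.filtered_status as status,
       p.period,
       va.id             as va_id,
       va.created_at     as va_created_at,
       p.start_at,
       p.end_at          -- null for the most recent (still open) period
from prod.venue_address va
join periods p
  on p.identifier = va.v_group_id
order by p.identifier, p.period, va.id;
```

Because every window is partitioned by identifier, the same statement should work for the full set of identifiers once the test filter is removed. If the archive table is large, an index on dwh.venue_address_archive (identifier, created_at) may help the window sorts; comparing EXPLAIN (ANALYZE, BUFFERS) output before and after would show whether it actually does.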