sql - 最低条目总和的条件分组
问题描述
运行 PostgreSQL 10。我有一个有效的查询,但它的成本估计explain analyze
甚至在生产数据上运行之前就很大:
GroupAggregate (cost=2054181.54..2054532.69 rows=2554 width=44) (actual time=401.096..401.116 rows=12 loops=1)
匿名查询是:
select rudolph_id, peter_id, leroy_or_other, sum(randys) as total_randys
from ( -- sums_with_combined_peter
select rudolph_id, peter_id, randys,
case
when put_in_other then null
else leroy_id
end as leroy_or_other
from ( -- sums_with_criteria
select rudolph_id, peter_id, leroy_id, sum_within_leroy, randys,
row_number() over (
partition by rudolph_id, peter_id
order by rudolph_id, peter_id, sum_within_leroy desc, leroy_id
) > 6 or sum_within_leroy = 0 as put_in_other
from ( -- sums
select r.id as rudolph_id,
p.id as peter_id,
l.id as leroy_id,
count(cp.randy_id) as randys,
sum(count(cp.randy_id)) over (
partition by r.id, l.id
) as sum_within_leroy
from peter p
cross join rudolph r
join leroy l using(oscar_id)
left join leroy_adam la on la.leroy_id = l.id
left join adam a on a.rudolph_id = r.id
and a.id = la.id
left join chosen_peter cp on cp.randy_id = a.randy_id
and cp.rudolph_id = r.id
and cp.peter_id = p.id
group by r.id, p.id, l.id
) sums
) sums_with_criteria
) sums_with_combined_peter
where rudolph_id=4
group by rudolph_id, peter_id, leroy_or_other;
就规模而言,peter、rudolph、leroy 小,adam 和 selected_peter 大。查询的目的是进行条件分组 - 这order by sum_within_leroy desc
是一个总和,在六个条目之后或如果条目为零,则被归为“其他”。此“其他”表示为 anull
而不是leroy
ID。
我已经避免使用 CTE。有一些索引列,但可能不够。
省略内部视图部分的查询计划是:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=2054181.54..2054532.69 rows=2554 width=44) (actual time=349.482..349.502 rows=12 loops=1)
Group Key: sums_with_criteria.rudolph_id, sums_with_criteria.peter_id, (CASE WHEN sums_with_criteria.put_in_other THEN NULL::integer ELSE sums_with_criteria.leroy_id END)
-> Sort (cost=2054181.54..2054245.38 rows=25538 width=20) (actual time=349.438..349.442 rows=56 loops=1)
Sort Key: sums_with_criteria.peter_id, (CASE WHEN sums_with_criteria.put_in_other THEN NULL::integer ELSE sums_with_criteria.leroy_id END)
Sort Method: quicksort Memory: 27kB
-> Subquery Scan on sums_with_criteria (cost=2051162.90..2052312.11 rows=25538 width=20) (actual time=349.241..349.290 rows=56 loops=1)
-> WindowAgg (cost=2051162.90..2052056.73 rows=25538 width=53) (actual time=349.238..349.278 rows=56 loops=1)
-> Sort (cost=2051162.90..2051226.74 rows=25538 width=52) (actual time=349.159..349.163 rows=56 loops=1)
Sort Key: sums.peter_id, sums.sum_within_leroy DESC, sums.leroy_id
Sort Method: quicksort Memory: 29kB
-> Subquery Scan on sums (cost=2048527.33..2049293.47 rows=25538 width=52) (actual time=348.883..348.937 rows=56 loops=1)
-> WindowAgg (cost=2048527.33..2049038.09 rows=25538 width=52) (actual time=348.882..348.927 rows=56 loops=1)
-> Sort (cost=2048527.33..2048591.18 rows=25538 width=20) (actual time=348.725..348.730 rows=56 loops=1)
Sort Key: l.id
Sort Method: quicksort Memory: 29kB
-> GroupAggregate (cost=2046019.45..2046657.90 rows=25538 width=20) (actual time=348.396..348.517 rows=56 loops=1)
Group Key: r.id, p.id, l.id
-> Sort (cost=2046019.45..2046083.30 rows=25538 width=16) (actual time=348.378..348.411 rows=448 loops=1)
Sort Key: p.id, l.id
Sort Method: quicksort Memory: 46kB
-> Hash Left Join (cost=2043576.65..2044150.03 rows=25538 width=16) (actual time=347.088..347.846 rows=448 loops=1)
Hash Cond: ((r.id = cp.rudolph_id) AND (a.randy_id = cp.randy_id) AND (p.id = cp.peter_id))
-> Nested Loop (cost=23.61..395.87 rows=25538 width=16) (actual time=0.583..1.147 rows=448 loops=1)
-> Seq Scan on peter p (cost=0.00..32.60 rows=2260 width=4) (actual time=0.037..0.039 rows=4 loops=1)
-> Materialize (cost=23.61..52.55 rows=11 width=12) (actual time=0.136..0.256 rows=112 loops=4)
-> Nested Loop Left Join (cost=23.61..52.50 rows=11 width=12) (actual time=0.498..0.881 rows=112 loops=1)
Join Filter: (a.rudolph_id = r.id)
-> Hash Right Join (cost=23.33..25.71 rows=11 width=12) (actual time=0.381..0.445 rows=112 loops=1)
Hash Cond: (la.leroy_id = l.id)
-> Seq Scan on leroy_adam la (cost=0.00..2.00 rows=100 width=8) (actual time=0.018..0.031 rows=100 loops=1)
-> Hash (cost=23.19..23.19 rows=11 width=8) (actual time=0.187..0.187 rows=14 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Nested Loop (cost=4.39..23.19 rows=11 width=8) (actual time=0.143..0.148 rows=14 loops=1)
-> Index Scan using rudolph_pkey on rudolph r (cost=0.15..8.17 rows=1 width=8) (actual time=0.113..0.113 rows=1 loops=1)
Index Cond: (id = 4)
-> Bitmap Heap Scan on leroy l (cost=4.24..14.91 rows=11 width=8) (actual time=0.015..0.017 rows=14 loops=1)
Recheck Cond: (oscar_id = r.oscar_id)
Heap Blocks: exact=1
-> Bitmap Index Scan on leroy_oscar_id_idx (cost=0.00..4.24 rows=11 width=0) (actual time=0.006..0.006 rows=14 loops=1)
Index Cond: (oscar_id = r.oscar_id)
-> Index Scan using adam_pkey on adam a (cost=0.28..2.42 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=112)
Index Cond: (id = la.id)
Filter: (rudolph_id = 4)
-> Hash (cost=2043462.97..2043462.97 rows=5147 width=12) (actual time=345.651..345.651 rows=100 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 69kB
-> Subquery Scan on cp (cost=2043154.14..2043462.97 rows=5147 width=12) (actual time=345.443..345.540 rows=100 loops=1)
[complex view]
Planning time: 57.899 ms
Execution time: 355.805 ms
(133 rows)
还值得一提的是,这chosen_peter
是一个昂贵的观点;使用应用程序典型where
产量对其进行选择
Subquery Scan on chosen_peter (cost=2042804.14..2043112.97 rows=5147 width=69) (actual time=352.702..352.798 rows=100 loops=1)
可以做些什么来加快这个查询的执行速度?
解决方案
过度消费的原因在这里:
-> Hash Left Join (cost=2043576.65..2044150.03 rows=25538 width=16) (actual time=347.088..347.846 rows=448 loops=1)
Hash Cond: ((r.id = cp.rudolph_id) AND (a.randy_id = cp.randy_id) AND (p.id = cp.peter_id))
尝试这个:
select rudolph_id, peter_id, leroy_or_other, sum(randys) as total_randys
from ( -- sums_with_combined_peter
select rudolph_id, peter_id, randys,
case
when put_in_other then null
else leroy_id
end as leroy_or_other
from ( -- sums_with_criteria
select rudolph_id, peter_id, leroy_id, sum_within_leroy, randys,
row_number() over (
partition by rudolph_id, peter_id
order by rudolph_id, peter_id, sum_within_leroy desc, leroy_id
) > 6 or sum_within_leroy = 0 as put_in_other
from ( -- sums
select r.id as rudolph_id,
p.id as peter_id,
l.id as leroy_id,
count(cp.randy_id) as randys,
sum(count(cp.randy_id)) over (
partition by r.id, l.id
) as sum_within_leroy
from peter p
cross join rudolph r
join leroy l using(oscar_id)
left join leroy_adam la on la.leroy_id = l.id
left join adam a on a.rudolph_id = r.id
and a.id = la.id
left join chosen_peter cp on cp.randy_id = a.randy_id
and cp.rudolph_id = r.id
and cp.peter_id = p.id
and cp.rudolph_id = 4
where r.id=4
group by r.id, p.id, l.id
) sums
) sums_with_criteria
) sums_with_combined_peter
group by rudolph_id, peter_id, leroy_or_other;
推荐阅读
- c# - 如果 Task 被取消固定次数,则重试 Web API 调用
- database-design - 自定义通知系统数据库提高发送给所有用户时的效率
- amazon-web-services - 我想通过 AWS IoT 规则检索主题名称
- ruby-on-rails - ruby on rails 如何动态定义 `find_by_id`、`find_by_name` 方法?
- angular - 表单提交Angular 6上的空对象
- android - 如何删除 CardView 的多余空间角?
- java - 尝试调用另一个类中的方法以在 IF 语句中使用
- swift - 如何使用 RxSwift 从 UITableView 中删除一行而不会出现任何不一致错误?
- c# - 如何在c#中使用函数将参数传递给oracle存储过程
- c# - 运行并行 C# 的多个“任务”