首页 > 解决方案 > 最低条目总和的条件分组

问题描述

运行 PostgreSQL 10。我有一个有效的查询,但它的成本估计explain analyze甚至在生产数据上运行之前就很大:

 GroupAggregate  (cost=2054181.54..2054532.69 rows=2554 width=44) (actual time=401.096..401.116 rows=12 loops=1)

匿名查询是:

select rudolph_id, peter_id, leroy_or_other, sum(randys) as total_randys
from ( -- sums_with_combined_peter
  select rudolph_id, peter_id, randys,
         case
           when put_in_other then null
           else leroy_id
         end as leroy_or_other
  from ( -- sums_with_criteria
    select rudolph_id, peter_id, leroy_id, sum_within_leroy, randys,
           row_number() over (
               partition by rudolph_id, peter_id
               order by rudolph_id, peter_id, sum_within_leroy desc, leroy_id
           ) > 6 or sum_within_leroy = 0 as put_in_other
    from ( -- sums
      select r.id as rudolph_id,
             p.id as peter_id,
             l.id as leroy_id,
             count(cp.randy_id) as randys,
             sum(count(cp.randy_id)) over (
                 partition by r.id, l.id
             ) as sum_within_leroy
      from            peter      p
           cross join rudolph    r
           join       leroy      l  using(oscar_id)
           left join  leroy_adam la on la.leroy_id = l.id
           left join  adam       a  on a.rudolph_id = r.id
                                   and a.id = la.id
           left join chosen_peter cp on cp.randy_id = a.randy_id
                                    and cp.rudolph_id = r.id
                                    and cp.peter_id = p.id
      group by r.id, p.id, l.id
    ) sums
  ) sums_with_criteria
) sums_with_combined_peter
where rudolph_id=4
group by rudolph_id, peter_id, leroy_or_other;

就规模而言,peter、rudolph、leroy 小,adam 和 selected_peter 大。查询的目的是进行条件分组 - 这order by sum_within_leroy desc是一个总和,在六个条目之后或如果条目为零,则被归为“其他”。此“其他”表示为 anull而不是leroyID。

我已经避免使用 CTE。有一些索引列,但可能不够。

省略内部视图部分的查询计划是:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2054181.54..2054532.69 rows=2554 width=44) (actual time=349.482..349.502 rows=12 loops=1)
   Group Key: sums_with_criteria.rudolph_id, sums_with_criteria.peter_id, (CASE WHEN sums_with_criteria.put_in_other THEN NULL::integer ELSE sums_with_criteria.leroy_id END)
   ->  Sort  (cost=2054181.54..2054245.38 rows=25538 width=20) (actual time=349.438..349.442 rows=56 loops=1)
         Sort Key: sums_with_criteria.peter_id, (CASE WHEN sums_with_criteria.put_in_other THEN NULL::integer ELSE sums_with_criteria.leroy_id END)
         Sort Method: quicksort  Memory: 27kB
         ->  Subquery Scan on sums_with_criteria  (cost=2051162.90..2052312.11 rows=25538 width=20) (actual time=349.241..349.290 rows=56 loops=1)
               ->  WindowAgg  (cost=2051162.90..2052056.73 rows=25538 width=53) (actual time=349.238..349.278 rows=56 loops=1)
                     ->  Sort  (cost=2051162.90..2051226.74 rows=25538 width=52) (actual time=349.159..349.163 rows=56 loops=1)
                           Sort Key: sums.peter_id, sums.sum_within_leroy DESC, sums.leroy_id
                           Sort Method: quicksort  Memory: 29kB
                           ->  Subquery Scan on sums  (cost=2048527.33..2049293.47 rows=25538 width=52) (actual time=348.883..348.937 rows=56 loops=1)
                                 ->  WindowAgg  (cost=2048527.33..2049038.09 rows=25538 width=52) (actual time=348.882..348.927 rows=56 loops=1)
                                       ->  Sort  (cost=2048527.33..2048591.18 rows=25538 width=20) (actual time=348.725..348.730 rows=56 loops=1)
                                             Sort Key: l.id
                                             Sort Method: quicksort  Memory: 29kB
                                             ->  GroupAggregate  (cost=2046019.45..2046657.90 rows=25538 width=20) (actual time=348.396..348.517 rows=56 loops=1)
                                                   Group Key: r.id, p.id, l.id
                                                   ->  Sort  (cost=2046019.45..2046083.30 rows=25538 width=16) (actual time=348.378..348.411 rows=448 loops=1)
                                                         Sort Key: p.id, l.id
                                                         Sort Method: quicksort  Memory: 46kB
                                                         ->  Hash Left Join  (cost=2043576.65..2044150.03 rows=25538 width=16) (actual time=347.088..347.846 rows=448 loops=1)
                                                               Hash Cond: ((r.id = cp.rudolph_id) AND (a.randy_id = cp.randy_id) AND (p.id = cp.peter_id))
                                                               ->  Nested Loop  (cost=23.61..395.87 rows=25538 width=16) (actual time=0.583..1.147 rows=448 loops=1)
                                                                     ->  Seq Scan on peter p  (cost=0.00..32.60 rows=2260 width=4) (actual time=0.037..0.039 rows=4 loops=1)
                                                                     ->  Materialize  (cost=23.61..52.55 rows=11 width=12) (actual time=0.136..0.256 rows=112 loops=4)
                                                                           ->  Nested Loop Left Join  (cost=23.61..52.50 rows=11 width=12) (actual time=0.498..0.881 rows=112 loops=1)
                                                                                 Join Filter: (a.rudolph_id = r.id)
                                                                                 ->  Hash Right Join  (cost=23.33..25.71 rows=11 width=12) (actual time=0.381..0.445 rows=112 loops=1)
                                                                                       Hash Cond: (la.leroy_id = l.id)
                                                                                       ->  Seq Scan on leroy_adam la  (cost=0.00..2.00 rows=100 width=8) (actual time=0.018..0.031 rows=100 loops=1)
                                                                                       ->  Hash  (cost=23.19..23.19 rows=11 width=8) (actual time=0.187..0.187 rows=14 loops=1)
                                                                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                                             ->  Nested Loop  (cost=4.39..23.19 rows=11 width=8) (actual time=0.143..0.148 rows=14 loops=1)
                                                                                                   ->  Index Scan using rudolph_pkey on rudolph r  (cost=0.15..8.17 rows=1 width=8) (actual time=0.113..0.113 rows=1 loops=1)
                                                                                                         Index Cond: (id = 4)
                                                                                                   ->  Bitmap Heap Scan on leroy l  (cost=4.24..14.91 rows=11 width=8) (actual time=0.015..0.017 rows=14 loops=1)
                                                                                                         Recheck Cond: (oscar_id = r.oscar_id)
                                                                                                         Heap Blocks: exact=1
                                                                                                         ->  Bitmap Index Scan on leroy_oscar_id_idx  (cost=0.00..4.24 rows=11 width=0) (actual time=0.006..0.006 rows=14 loops=1)
                                                                                                               Index Cond: (oscar_id = r.oscar_id)
                                                                                 ->  Index Scan using adam_pkey on adam a  (cost=0.28..2.42 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=112)
                                                                                       Index Cond: (id = la.id)
                                                                                       Filter: (rudolph_id = 4)
                                                               ->  Hash  (cost=2043462.97..2043462.97 rows=5147 width=12) (actual time=345.651..345.651 rows=100 loops=1)
                                                                     Buckets: 8192  Batches: 1  Memory Usage: 69kB
                                                                     ->  Subquery Scan on cp  (cost=2043154.14..2043462.97 rows=5147 width=12) (actual time=345.443..345.540 rows=100 loops=1)
                                                                            [complex view]
 Planning time: 57.899 ms
 Execution time: 355.805 ms
(133 rows)

还值得一提的是,这chosen_peter是一个昂贵的观点;使用应用程序典型where产量对其进行选择

Subquery Scan on chosen_peter  (cost=2042804.14..2043112.97 rows=5147 width=69) (actual time=352.702..352.798 rows=100 loops=1)

可以做些什么来加快这个查询的执行速度?

标签: sqlpostgresqloptimizationquery-optimization

解决方案


过度消费的原因在这里:

 ->  Hash Left Join  (cost=2043576.65..2044150.03 rows=25538 width=16) (actual time=347.088..347.846 rows=448 loops=1)
 Hash Cond: ((r.id = cp.rudolph_id) AND (a.randy_id = cp.randy_id) AND (p.id = cp.peter_id))

尝试这个:

select rudolph_id, peter_id, leroy_or_other, sum(randys) as total_randys
from ( -- sums_with_combined_peter
  select rudolph_id, peter_id, randys,
         case
           when put_in_other then null
           else leroy_id
         end as leroy_or_other
  from ( -- sums_with_criteria
    select rudolph_id, peter_id, leroy_id, sum_within_leroy, randys,
           row_number() over (
               partition by rudolph_id, peter_id
               order by rudolph_id, peter_id, sum_within_leroy desc, leroy_id
           ) > 6 or sum_within_leroy = 0 as put_in_other
    from ( -- sums
      select r.id as rudolph_id,
             p.id as peter_id,
             l.id as leroy_id,
             count(cp.randy_id) as randys,
             sum(count(cp.randy_id)) over (
                 partition by r.id, l.id
             ) as sum_within_leroy
      from            peter      p
           cross join rudolph    r
           join       leroy      l  using(oscar_id)
           left join  leroy_adam la on la.leroy_id = l.id
           left join  adam       a  on a.rudolph_id = r.id
                                   and a.id = la.id
           left join chosen_peter cp on cp.randy_id = a.randy_id
                                    and cp.rudolph_id = r.id
                                    and cp.peter_id = p.id
                                    and cp.rudolph_id = 4
           where r.id=4
      group by r.id, p.id, l.id
    ) sums
  ) sums_with_criteria
) sums_with_combined_peter
group by rudolph_id, peter_id, leroy_or_other;

推荐阅读