首页 > 解决方案 > PostgreSQL:使用窗口函数计数(*)超过()时按主键排序缓慢

问题描述

我的表,它们存在于查询中

create table sku
(
    id               bigserial                                          primary key,
    item_id          bigint                                             not null
        constraint sku_item_id_fkey
            references items,
    seller_id        bigint                                             not null,
    updated_at       timestamp with time zone default now()             not null,
    price            numeric(19, 2)                                     not null,
    weight           numeric(19, 2),
    quantity         integer                                            not null,
    min_order_size   integer                  default 0                 not null,
    pack_type_id     bigint
        constraint sku_pack_type_id_fkey
            references pack_types,
    unit_id          bigint
        constraint sku_unit_id_fkey
            references units,
    legal_seller     boolean                                            not null,
    created_at       timestamp with time zone default now()             not null,
    title            jsonb                    default '{}'::jsonb       not null,
    seller_sku_id    varchar                  default gen_random_uuid() not null,
    original_price   numeric(19, 2)                                     not null,
    max_order_size   integer,
    stock_address_id bigint,
    constraint seller_id_sku_id_constraint
        unique (seller_id, seller_sku_id),
    constraint sku_check
        check (original_price >= price)
);
create table sku_characteristics
(
    id                       bigserial primary key,
    sku_id                   bigint    not null,
    item_id                  bigint    not null,
    characteristic_id        bigint    not null
        constraint sku_characteristics_characteristic_id_fkey
            references characteristics,
    characteristic_option_id bigint
        constraint sku_characteristics_characteristic_option_id_fkey
            references characteristic_options,
    numeric_value            numeric(19, 6),
    string_value             jsonb,
    constraint sku_unique_characteristics
        unique (sku_id, characteristic_id),
    constraint sku_characteristics_item_id_fkey
        foreign key (sku_id, item_id) references sku (id, item_id)
            on update cascade on delete cascade
);

在此查询中,我试图获取具有所有特征的分页 SKU 列表。为了找出构建分页菜单的 SKU 数量而不进行单独查询,我使用了窗口函数count (*) over ()。我按降序排序id

explain analyse select
        count(*) over ()                as total_count,
       sku.item_id,
       sku.id,
       sku.seller_sku_id,
       sku.updated_at,
       sku.price,
       sku.original_price,
       sku.weight,
       sku.title,
       sku.seller_id,
       sku.legal_seller,
       sku.quantity,
       sku.min_order_size,
       sku.max_order_size,
       sku.pack_type_id,
       sku.unit_id,
       sku.stock_address_id,
       characteristic_ids,
       characteristics_options_ids,
       numeric_values,
       string_values
from sku
         LEFT JOIN (SELECT sc.sku_id,
                           array_agg(sc.characteristic_id)        as characteristic_ids,
                           array_agg(sc.characteristic_option_id) as characteristics_options_ids,
                           array_agg(sc.numeric_value)            as numeric_values,
                           array_agg(sc.string_value)             as string_values
                    FROM sku_characteristics sc
                    GROUP BY sc.sku_id) sc ON sc.sku_id = sku.id

order by sku.id desc
LIMIT 10 OFFSET 0;

在我的测试数据库中,这个查询大约需要 9-10 秒。如果我尝试按created_at列排序,查询将立即加速并花费大约 100 毫秒。

为什么会这样?毕竟,按主键排序肯定会更快。此外,如果没有窗口函数,我们按哪个字段排序也没关系。

created_at按列排序时的查询计划

Limit  (cost=4107.78..4107.80 rows=10 width=269) (actual time=90.529..90.534 rows=10 loops=1)
  ->  Sort  (cost=4107.78..4137.17 rows=11759 width=269) (actual time=90.528..90.531 rows=10 loops=1)
        Sort Key: sku.created_at DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  WindowAgg  (cost=504.87..3853.67 rows=11759 width=269) (actual time=71.593..79.641 rows=11759 loops=1)
              ->  Hash Right Join  (cost=504.87..3706.68 rows=11759 width=261) (actual time=10.056..53.764 rows=11759 loops=1)
                    Hash Cond: (sc.sku_id = sku.id)
                    ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.029..31.687 rows=11729 loops=1)
                          Group Key: sc.sku_id
                          ->  Index Scan using sku_unique_characteristics on sku_characteristics sc  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.017..7.991 rows=35751 loops=1)
                    ->  Hash  (cost=357.59..357.59 rows=11759 width=133) (actual time=9.968..9.968 rows=11759 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 2058kB
                          ->  Seq Scan on sku  (cost=0.00..357.59 rows=11759 width=133) (actual time=0.006..3.999 rows=11759 loops=1)
Planning Time: 0.211 ms
Execution Time: 91.218 ms

排序时id

Limit  (cost=0.57..1748.03 rows=10 width=261) (actual time=9309.041..9309.050 rows=10 loops=1)
  ->  WindowAgg  (cost=0.57..2054832.71 rows=11759 width=261) (actual time=9309.040..9309.047 rows=10 loops=1)
        ->  Nested Loop Left Join  (cost=0.57..2054685.72 rows=11759 width=253) (actual time=42.877..9282.697 rows=11759 loops=1)
              Join Filter: (sc.sku_id = sku.id)
              Rows Removed by Join Filter: 69130726
              ->  Index Scan Backward using sku_pkey on sku  (cost=0.29..654.69 rows=11759 width=125) (actual time=0.056..9.662 rows=11759 loops=1)
              ->  Materialize  (cost=0.29..3229.71 rows=11628 width=136) (actual time=0.000..0.256 rows=5880 loops=11759)
                    ->  Subquery Scan on sc  (cost=0.29..3171.57 rows=11628 width=136) (actual time=0.046..32.190 rows=11729 loops=1)
                          ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.044..30.268 rows=11729 loops=1)
                                Group Key: sc_1.sku_id
                                ->  Index Scan using sku_unique_characteristics on sku_characteristics sc_1  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.022..7.330 rows=35751 loops=1)
Planning Time: 0.204 ms
Execution Time: 9310.547 ms

查询计划完全不同。为什么 postgres 针对这两种情况选择不同的策略呢?

我正在使用最新的 Postgresql 13.2

编辑

计划什么时候sort by id asc

Limit  (cost=4.10..7.63 rows=10 width=261) (actual time=57.561..57.570 rows=10 loops=1)
"  Output: (count(*) OVER (?)), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
  ->  WindowAgg  (cost=0.57..4148.00 rows=11759 width=261) (actual time=57.543..57.562 rows=20 loops=1)
"        Output: count(*) OVER (?), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
        ->  Merge Left Join  (cost=0.57..4001.01 rows=11759 width=253) (actual time=0.063..43.164 rows=11759 loops=1)
"              Output: sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
              Inner Unique: true
              Merge Cond: (sku.id = sc.sku_id)
              ->  Index Scan using sku_pkey on public.sku  (cost=0.29..654.69 rows=11759 width=125) (actual time=0.027..2.909 rows=11759 loops=1)
"                    Output: sku.id, sku.item_id, sku.seller_id, sku.updated_at, sku.price, sku.weight, sku.quantity, sku.min_order_size, sku.pack_type_id, sku.unit_id, sku.legal_seller, sku.created_at, sku.title, sku.seller_sku_id, sku.original_price, sku.max_order_size, sku.stock_address_id"
              ->  GroupAggregate  (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.026..31.711 rows=11729 loops=1)
"                    Output: sc.sku_id, array_agg(sc.characteristic_id), array_agg(sc.characteristic_option_id), array_agg(sc.numeric_value), array_agg(sc.string_value)"
                    Group Key: sc.sku_id
                    ->  Index Scan using sku_unique_characteristics on public.sku_characteristics sc  (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.014..6.786 rows=35751 loops=1)
"                          Output: sc.id, sc.sku_id, sc.item_id, sc.characteristic_id, sc.characteristic_option_id, sc.numeric_value, sc.string_value"
Planning Time: 0.230 ms
Execution Time: 58.503 ms

标签: sqlpostgresqlquery-optimization

解决方案


优化器错误地认为使用索引扫描和嵌套循环连接比显式排序更便宜。

您可以通过修改子句来阻止它选择该策略,ORDER BY以便无法使用该策略:

... ORDER BY sku-id + 0

推荐阅读