sql - PostgreSQL:使用窗口函数计数(*)超过()时按主键排序缓慢
问题描述
我的表,它们存在于查询中
create table sku
(
id bigserial primary key,
item_id bigint not null
constraint sku_item_id_fkey
references items,
seller_id bigint not null,
updated_at timestamp with time zone default now() not null,
price numeric(19, 2) not null,
weight numeric(19, 2),
quantity integer not null,
min_order_size integer default 0 not null,
pack_type_id bigint
constraint sku_pack_type_id_fkey
references pack_types,
unit_id bigint
constraint sku_unit_id_fkey
references units,
legal_seller boolean not null,
created_at timestamp with time zone default now() not null,
title jsonb default '{}'::jsonb not null,
seller_sku_id varchar default gen_random_uuid() not null,
original_price numeric(19, 2) not null,
max_order_size integer,
stock_address_id bigint,
constraint seller_id_sku_id_constraint
unique (seller_id, seller_sku_id),
constraint sku_check
check (original_price >= price)
);
create table sku_characteristics
(
id bigserial primary key,
sku_id bigint not null,
item_id bigint not null,
characteristic_id bigint not null
constraint sku_characteristics_characteristic_id_fkey
references characteristics,
characteristic_option_id bigint
constraint sku_characteristics_characteristic_option_id_fkey
references characteristic_options,
numeric_value numeric(19, 6),
string_value jsonb,
constraint sku_unique_characteristics
unique (sku_id, characteristic_id),
constraint sku_characteristics_item_id_fkey
foreign key (sku_id, item_id) references sku (id, item_id)
on update cascade on delete cascade
);
在此查询中,我试图获取具有所有特征的分页 SKU 列表。为了找出构建分页菜单的 SKU 数量而不进行单独查询,我使用了窗口函数count (*) over ()
。我按降序排序id
。
explain analyse select
count(*) over () as total_count,
sku.item_id,
sku.id,
sku.seller_sku_id,
sku.updated_at,
sku.price,
sku.original_price,
sku.weight,
sku.title,
sku.seller_id,
sku.legal_seller,
sku.quantity,
sku.min_order_size,
sku.max_order_size,
sku.pack_type_id,
sku.unit_id,
sku.stock_address_id,
characteristic_ids,
characteristics_options_ids,
numeric_values,
string_values
from sku
LEFT JOIN (SELECT sc.sku_id,
array_agg(sc.characteristic_id) as characteristic_ids,
array_agg(sc.characteristic_option_id) as characteristics_options_ids,
array_agg(sc.numeric_value) as numeric_values,
array_agg(sc.string_value) as string_values
FROM sku_characteristics sc
GROUP BY sc.sku_id) sc ON sc.sku_id = sku.id
order by sku.id desc
LIMIT 10 OFFSET 0;
在我的测试数据库中,这个查询大约需要 9-10 秒。如果我尝试按created_at
列排序,查询将立即加速并花费大约 100 毫秒。
为什么会这样?毕竟,按主键排序肯定会更快。此外,如果没有窗口函数,我们按哪个字段排序也没关系。
created_at
按列排序时的查询计划
Limit (cost=4107.78..4107.80 rows=10 width=269) (actual time=90.529..90.534 rows=10 loops=1)
-> Sort (cost=4107.78..4137.17 rows=11759 width=269) (actual time=90.528..90.531 rows=10 loops=1)
Sort Key: sku.created_at DESC
Sort Method: top-N heapsort Memory: 27kB
-> WindowAgg (cost=504.87..3853.67 rows=11759 width=269) (actual time=71.593..79.641 rows=11759 loops=1)
-> Hash Right Join (cost=504.87..3706.68 rows=11759 width=261) (actual time=10.056..53.764 rows=11759 loops=1)
Hash Cond: (sc.sku_id = sku.id)
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.029..31.687 rows=11729 loops=1)
Group Key: sc.sku_id
-> Index Scan using sku_unique_characteristics on sku_characteristics sc (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.017..7.991 rows=35751 loops=1)
-> Hash (cost=357.59..357.59 rows=11759 width=133) (actual time=9.968..9.968 rows=11759 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 2058kB
-> Seq Scan on sku (cost=0.00..357.59 rows=11759 width=133) (actual time=0.006..3.999 rows=11759 loops=1)
Planning Time: 0.211 ms
Execution Time: 91.218 ms
排序时id
Limit (cost=0.57..1748.03 rows=10 width=261) (actual time=9309.041..9309.050 rows=10 loops=1)
-> WindowAgg (cost=0.57..2054832.71 rows=11759 width=261) (actual time=9309.040..9309.047 rows=10 loops=1)
-> Nested Loop Left Join (cost=0.57..2054685.72 rows=11759 width=253) (actual time=42.877..9282.697 rows=11759 loops=1)
Join Filter: (sc.sku_id = sku.id)
Rows Removed by Join Filter: 69130726
-> Index Scan Backward using sku_pkey on sku (cost=0.29..654.69 rows=11759 width=125) (actual time=0.056..9.662 rows=11759 loops=1)
-> Materialize (cost=0.29..3229.71 rows=11628 width=136) (actual time=0.000..0.256 rows=5880 loops=11759)
-> Subquery Scan on sc (cost=0.29..3171.57 rows=11628 width=136) (actual time=0.046..32.190 rows=11729 loops=1)
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.044..30.268 rows=11729 loops=1)
Group Key: sc_1.sku_id
-> Index Scan using sku_unique_characteristics on sku_characteristics sc_1 (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.022..7.330 rows=35751 loops=1)
Planning Time: 0.204 ms
Execution Time: 9310.547 ms
查询计划完全不同。为什么 postgres 针对这两种情况选择不同的策略呢?
我正在使用最新的 Postgresql 13.2
编辑
计划什么时候sort by id asc
Limit (cost=4.10..7.63 rows=10 width=261) (actual time=57.561..57.570 rows=10 loops=1)
" Output: (count(*) OVER (?)), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
-> WindowAgg (cost=0.57..4148.00 rows=11759 width=261) (actual time=57.543..57.562 rows=20 loops=1)
" Output: count(*) OVER (?), sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
-> Merge Left Join (cost=0.57..4001.01 rows=11759 width=253) (actual time=0.063..43.164 rows=11759 loops=1)
" Output: sku.item_id, sku.id, sku.seller_sku_id, sku.updated_at, sku.price, sku.original_price, sku.weight, sku.title, sku.seller_id, sku.legal_seller, sku.quantity, sku.min_order_size, sku.max_order_size, sku.pack_type_id, sku.unit_id, sku.stock_address_id, (array_agg(sc.characteristic_id)), (array_agg(sc.characteristic_option_id)), (array_agg(sc.numeric_value)), (array_agg(sc.string_value))"
Inner Unique: true
Merge Cond: (sku.id = sc.sku_id)
-> Index Scan using sku_pkey on public.sku (cost=0.29..654.69 rows=11759 width=125) (actual time=0.027..2.909 rows=11759 loops=1)
" Output: sku.id, sku.item_id, sku.seller_id, sku.updated_at, sku.price, sku.weight, sku.quantity, sku.min_order_size, sku.pack_type_id, sku.unit_id, sku.legal_seller, sku.created_at, sku.title, sku.seller_sku_id, sku.original_price, sku.max_order_size, sku.stock_address_id"
-> GroupAggregate (cost=0.29..3055.29 rows=11628 width=136) (actual time=0.026..31.711 rows=11729 loops=1)
" Output: sc.sku_id, array_agg(sc.characteristic_id), array_agg(sc.characteristic_option_id), array_agg(sc.numeric_value), array_agg(sc.string_value)"
Group Key: sc.sku_id
-> Index Scan using sku_unique_characteristics on public.sku_characteristics sc (cost=0.29..2375.84 rows=35751 width=45) (actual time=0.014..6.786 rows=35751 loops=1)
" Output: sc.id, sc.sku_id, sc.item_id, sc.characteristic_id, sc.characteristic_option_id, sc.numeric_value, sc.string_value"
Planning Time: 0.230 ms
Execution Time: 58.503 ms
解决方案
优化器错误地认为使用索引扫描和嵌套循环连接比显式排序更便宜。
您可以通过修改子句来阻止它选择该策略,ORDER BY
以便无法使用该策略:
... ORDER BY sku-id + 0
推荐阅读
- python - 根据鼠标按下事件更改图表条的颜色
- jquery - 如果它与当前网址匹配,则将类添加到当前链接?
- json - 如何在使用 Scala-Play Json 框架解析 Json 时获取键和值?
- qt - 在我的项目中单击运行时,我收到“无法启动程序。路径或权限错误?”
- oracle - webutil如何将word文件读入oracle表单数据块
- linux - 如何在单个 linux 服务器中安装多个 CouchBase 实例?
- c# - 如何启用 CAS 仪表板(“/status/dashboard”端点)
- elasticsearch - 在弹性搜索中添加操作数和到相位前缀查询类型
- java - 分配给 ThreadPoolExecutor 的静态变量
- sql - 使用 SQL 查询使用逗号分隔值的 Like 运算符