postgresql - 为什么 Postgres 多列分区修剪比这更聪明?
问题描述
在 Postgres (v12) 中,我们有一些按“国家”和“部门”分区的大表,但我们发现修剪行为仅对确定每个确切值的查询是最佳的,当 2或更多的值被指定。此外,修剪逻辑中以某种方式偏爱“国家”列,大概是因为它是分区键中的第一列......
例子
CREATE TABLE part.partitioned_table
(
country character varying NOT NULL,
sector character varying NOT NULL,
a_value integer NOT NULL,
other_value integer,
CONSTRAINT partitioned_table_pkey PRIMARY KEY (country, sector, a_value)
) PARTITION BY RANGE (country, sector, a_value);
CREATE TABLE part.partitioned_table_gb_alpha PARTITION OF part.partitioned_table
FOR VALUES FROM ('GB', 'ALPHA', MINVALUE) TO ('GB', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_gb_beta PARTITION OF part.partitioned_table
FOR VALUES FROM ('GB', 'BETA', MINVALUE) TO ('GB', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_gb_gamma PARTITION OF part.partitioned_table
FOR VALUES FROM ('GB', 'GAMMA', MINVALUE) TO ('GB', 'GAMMA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_alpha PARTITION OF part.partitioned_table
FOR VALUES FROM ('FR', 'ALPHA', MINVALUE) TO ('FR', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_beta PARTITION OF part.partitioned_table
FOR VALUES FROM ('FR', 'BETA', MINVALUE) TO ('FR', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_gamma PARTITION OF part.partitioned_table
FOR VALUES FROM ('FR', 'GAMMA', MINVALUE) TO ('FR', 'GAMMA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_alpha PARTITION OF part.partitioned_table
FOR VALUES FROM ('DE', 'ALPHA', MINVALUE) TO ('DE', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_beta PARTITION OF part.partitioned_table
FOR VALUES FROM ('DE', 'BETA', MINVALUE) TO ('DE', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_gamma PARTITION OF part.partitioned_table
FOR VALUES FROM ('DE', 'GAMMA', MINVALUE) TO ('DE', 'GAMMA', MAXVALUE);
INSERT INTO part.partitioned_table(country, sector, a_value, other_value) VALUES
('GB', 'ALPHA', 10, 1000),
('GB', 'BETA', 10, 1000),
('GB', 'GAMMA', 10, 1000),
('FR', 'ALPHA', 10, 1000),
('FR', 'BETA', 10, 1000),
('FR', 'GAMMA', 10, 1000),
('DE', 'ALPHA', 10, 1000),
('DE', 'BETA', 10, 1000),
('DE', 'GAMMA', 10, 1000);
-- query plan for this statement shows that only a single partition is scanned as expected (partitioned_table_gb_beta)
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country = 'GB' and sector = 'BETA';
-- adding a sector to the where clause causes ALL 'GB' partitions to be scanned
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country = 'GB' and sector in ('BETA', 'GAMMA');
-- instead adding a country to the where clause causes ALL 'GB' and 'FR' partitions to be scanned!
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country in ('GB', 'FR') and sector = 'BETA';
请注意,即使我使用“OR”逻辑而不是“IN”,也会扫描相同的分区。如果需要,我可以添加explain analyse
输出。
那么首先,为什么仅仅因为指定了多个扇区就扫描所有“GB”分区?
其次,也许更奇怪的是,如果我坚持一个扇区(在我的示例中为“BETA”)并添加第二个国家,而不是扫描所有“BETA”分区,它现在扫描每个指定国家的所有分区。
显然,现实世界的表是数百个国家和部门。我们有使用案例来查询跨多个国家/地区的单个扇区(例如,假设 20 个),并最终扫描数百个分区(这 20 个国家/地区的所有扇区分区),此时只需要扫描 20 个分区,这是“显式"在查询中。
我们是否需要创建另一个主要按扇区分区的表版本来克服这个问题,还是我们在这里遗漏了其他东西?
解决方案
子分区的使用使查询计划器的行为更加明智。由于这也允许我们partition by list
,因此无需人为地将整数列包含在分区键中以进行范围:
CREATE TABLE part.sub_partitioned_table
(
country character varying NOT NULL,
sector character varying NOT NULL,
a_value integer NOT NULL,
other_value integer,
CONSTRAINT sub_partitioned_table_pkey PRIMARY KEY (country, sector, a_value)
) PARTITION BY LIST (country);
CREATE TABLE part.sub_partitioned_table_de PARTITION OF part.sub_partitioned_table
FOR VALUES IN ('DE')
PARTITION BY LIST (sector);
CREATE TABLE part.sub_partitioned_table_gb PARTITION OF part.sub_partitioned_table
FOR VALUES IN ('GB')
PARTITION BY LIST (sector);
CREATE TABLE part.sub_partitioned_table_fr PARTITION OF part.sub_partitioned_table
FOR VALUES IN ('FR')
PARTITION BY LIST (sector);
CREATE TABLE part.sub_partitioned_table_gb_alpha PARTITION OF part.sub_partitioned_table_gb
FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_gb_beta PARTITION OF part.sub_partitioned_table_gb
FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_gb_gamma PARTITION OF part.sub_partitioned_table_gb
FOR VALUES IN ('GAMMA');
CREATE TABLE part.sub_partitioned_table_fr_alpha PARTITION OF part.sub_partitioned_table_fr
FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_fr_beta PARTITION OF part.sub_partitioned_table_fr
FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_fr_gamma PARTITION OF part.sub_partitioned_table_fr
FOR VALUES IN ('GAMMA');
CREATE TABLE part.sub_partitioned_table_de_alpha PARTITION OF part.sub_partitioned_table_de
FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_de_beta PARTITION OF part.sub_partitioned_table_de
FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_de_gamma PARTITION OF part.sub_partitioned_table_de
FOR VALUES IN ('GAMMA');
INSERT INTO part.sub_partitioned_table(country, sector, a_value, other_value) VALUES
('GB', 'ALPHA', 10, 1000),
('GB', 'BETA', 10, 1000),
('GB', 'GAMMA', 10, 1000),
('FR', 'ALPHA', 10, 1000),
('FR', 'BETA', 10, 1000),
('FR', 'GAMMA', 10, 1000),
('DE', 'ALPHA', 10, 1000),
('DE', 'BETA', 10, 1000),
('DE', 'GAMMA', 10, 1000);
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and sector = 'BETA';
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and sector in ('ALPHA', 'BETA');
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and (sector = 'ALPHA' or sector = 'BETA');
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country in ('GB', 'FR') and sector = 'BETA';
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE (country = 'GB' or country = 'FR') and sector = 'BETA';
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE
((sector = 'BETA' AND country = 'GB') OR
(sector = 'BETA' AND country = 'FR'))
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE
((sector = 'BETA' AND country = 'GB') OR
(sector = 'ALPHA' AND country = 'GB'))
在所有情况下,查询计划都显示仅扫描相关(子)分区。
推荐阅读
- fastapi - Python Fastapi。重定向后没有标题
- arduino - 使用 IFTTT 和 Arduino 物联网云
- r - 跨多个栅格层计算每个像素的分位数概率 0.05
- unix - 将 *nix 命令的结果通过管道传输到 Vim 的打开文件集中
- sql - 在 postgresql 中更新具有行号的表
- nginx - 如何使用 fastcgi_wrap 配置 Nginx?
- javascript - Alpine.js:如何使 ref 元素可见并专注于点击?
- .htaccess - 我无法使用 .htaccess 美化我网站的 URL
- r - 将类 4 转换为 R 中的数据框
- reactjs - 输入类型日期不更新 React