首页 > 解决方案 > 为什么 Postgres 多列分区修剪比这更聪明?

问题描述

在 Postgres (v12) 中,我们有一些按“国家”和“部门”分区的大表,但我们发现修剪行为仅对确定每个确切值的查询是最佳的,当 2或更多的值被指定。此外,修剪逻辑中以某种方式偏爱“国家”列,大概是因为它是分区键中的第一列......

例子

CREATE TABLE part.partitioned_table
(
    country character varying NOT NULL,
    sector character varying NOT NULL,
    a_value integer NOT NULL,
    other_value integer,
    CONSTRAINT partitioned_table_pkey PRIMARY KEY (country, sector, a_value)
) PARTITION BY RANGE (country, sector, a_value);

CREATE TABLE part.partitioned_table_gb_alpha PARTITION OF part.partitioned_table
    FOR VALUES FROM ('GB', 'ALPHA', MINVALUE) TO ('GB', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_gb_beta PARTITION OF part.partitioned_table
    FOR VALUES FROM ('GB', 'BETA', MINVALUE) TO ('GB', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_gb_gamma PARTITION OF part.partitioned_table
    FOR VALUES FROM ('GB', 'GAMMA', MINVALUE) TO ('GB', 'GAMMA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_alpha PARTITION OF part.partitioned_table
    FOR VALUES FROM ('FR', 'ALPHA', MINVALUE) TO ('FR', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_beta PARTITION OF part.partitioned_table
    FOR VALUES FROM ('FR', 'BETA', MINVALUE) TO ('FR', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_fr_gamma PARTITION OF part.partitioned_table
    FOR VALUES FROM ('FR', 'GAMMA', MINVALUE) TO ('FR', 'GAMMA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_alpha PARTITION OF part.partitioned_table
    FOR VALUES FROM ('DE', 'ALPHA', MINVALUE) TO ('DE', 'ALPHA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_beta PARTITION OF part.partitioned_table
    FOR VALUES FROM ('DE', 'BETA', MINVALUE) TO ('DE', 'BETA', MAXVALUE);
CREATE TABLE part.partitioned_table_de_gamma PARTITION OF part.partitioned_table
    FOR VALUES FROM ('DE', 'GAMMA', MINVALUE) TO ('DE', 'GAMMA', MAXVALUE);

INSERT INTO part.partitioned_table(country, sector, a_value, other_value) VALUES
    ('GB', 'ALPHA', 10, 1000),
    ('GB', 'BETA', 10, 1000),
    ('GB', 'GAMMA', 10, 1000),
    ('FR', 'ALPHA', 10, 1000),
    ('FR', 'BETA', 10, 1000),
    ('FR', 'GAMMA', 10, 1000),
    ('DE', 'ALPHA', 10, 1000),
    ('DE', 'BETA', 10, 1000),
    ('DE', 'GAMMA', 10, 1000);

-- query plan for this statement shows that only a single partition is scanned as expected (partitioned_table_gb_beta)
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country = 'GB' and sector = 'BETA';

-- adding a sector to the where clause causes ALL 'GB' partitions to be scanned
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country = 'GB' and sector in ('BETA', 'GAMMA');

-- instead adding a country to the where clause causes ALL 'GB' and 'FR' partitions to be scanned!
EXPLAIN ANALYSE SELECT * FROM part.partitioned_table WHERE country in ('GB', 'FR') and sector = 'BETA';

请注意,即使我使用“OR”逻辑而不是“IN”,也会扫描相同的分区。如果需要,我可以添加explain analyse输出。

那么首先,为什么仅仅因为指定了多个扇区就扫描所有“GB”分区?

其次,也许更奇怪的是,如果我坚持一个扇区(在我的示例中为“BETA”)并添加第二个国家,而不是扫描所有“BETA”分区,它现在扫描每个指定国家的所有分区。

显然,现实世界的表是数百个国家和部门。我们有使用案例来查询跨多个国家/地区的单个扇区(例如,假设 20 个),并最终扫描数百个分区(这 20 个国家/地区的所有扇区分区),此时只需要扫描 20 个分区,这是“显式"在查询中。

我们是否需要创建另一个主要按扇区分区的表版本来克服这个问题,还是我们在这里遗漏了其他东西?

标签: postgresqlpartitioningdatabase-partitioning

解决方案


子分区的使用使查询计划器的行为更加明智。由于这也允许我们partition by list,因此无需人为地将整数列包含在分区键中以进行范围:

CREATE TABLE part.sub_partitioned_table
(
    country character varying NOT NULL,
    sector character varying NOT NULL,
    a_value integer NOT NULL,
    other_value integer,
    CONSTRAINT sub_partitioned_table_pkey PRIMARY KEY (country, sector, a_value)
) PARTITION BY LIST (country);

CREATE TABLE part.sub_partitioned_table_de PARTITION OF part.sub_partitioned_table
    FOR VALUES IN ('DE')
    PARTITION BY LIST (sector);
CREATE TABLE part.sub_partitioned_table_gb PARTITION OF part.sub_partitioned_table
    FOR VALUES IN ('GB')
    PARTITION BY LIST (sector);
CREATE TABLE part.sub_partitioned_table_fr PARTITION OF part.sub_partitioned_table
    FOR VALUES IN ('FR')
    PARTITION BY LIST (sector);

CREATE TABLE part.sub_partitioned_table_gb_alpha PARTITION OF part.sub_partitioned_table_gb
    FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_gb_beta PARTITION OF part.sub_partitioned_table_gb
     FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_gb_gamma PARTITION OF part.sub_partitioned_table_gb
    FOR VALUES IN ('GAMMA');
CREATE TABLE part.sub_partitioned_table_fr_alpha PARTITION OF part.sub_partitioned_table_fr
    FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_fr_beta PARTITION OF part.sub_partitioned_table_fr
     FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_fr_gamma PARTITION OF part.sub_partitioned_table_fr
    FOR VALUES IN ('GAMMA');    
CREATE TABLE part.sub_partitioned_table_de_alpha PARTITION OF part.sub_partitioned_table_de
    FOR VALUES IN ('ALPHA');
CREATE TABLE part.sub_partitioned_table_de_beta PARTITION OF part.sub_partitioned_table_de
     FOR VALUES IN ('BETA');
CREATE TABLE part.sub_partitioned_table_de_gamma PARTITION OF part.sub_partitioned_table_de
    FOR VALUES IN ('GAMMA');

INSERT INTO part.sub_partitioned_table(country, sector, a_value, other_value) VALUES
    ('GB', 'ALPHA', 10, 1000),
    ('GB', 'BETA', 10, 1000),
    ('GB', 'GAMMA', 10, 1000),
    ('FR', 'ALPHA', 10, 1000),
    ('FR', 'BETA', 10, 1000),
    ('FR', 'GAMMA', 10, 1000),
    ('DE', 'ALPHA', 10, 1000),
    ('DE', 'BETA', 10, 1000),
    ('DE', 'GAMMA', 10, 1000);
    
    
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and sector = 'BETA';

EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and sector in ('ALPHA', 'BETA');
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country = 'GB' and (sector = 'ALPHA' or sector = 'BETA');

EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE country in ('GB', 'FR') and sector = 'BETA';
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE (country = 'GB' or country = 'FR') and sector = 'BETA';

EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE
((sector = 'BETA' AND country = 'GB') OR
 (sector = 'BETA' AND country = 'FR'))
 
EXPLAIN ANALYSE SELECT * FROM part.sub_partitioned_table WHERE
((sector = 'BETA' AND country = 'GB') OR
 (sector = 'ALPHA' AND country = 'GB'))

在所有情况下,查询计划都显示仅扫描相关(子)分区。


推荐阅读