首页 > 解决方案 > PostgreSQL 数据库分区没有按我的预期工作

问题描述

我有一个包含 Ip Location 数据库的表。这是结构。

结构

Here is create script;

-- public."IpLocation" definition

-- Drop table

-- DROP TABLE public."IpLocation";

CREATE TABLE public."IpLocation" (
    "Id" int4 NOT NULL DEFAULT nextval('"IpLocation_Id_seq"'::regclass),
    "IpFrom" float8 NOT NULL,
    "IpTo" float8 NOT NULL,
    "CountryCode" text NULL,
    "CountryName" text NULL,
    "RegionName" text NULL,
    "CityName" text NULL,
    "CreatedAt" timestamp NULL,
    "DeletedAt" timestamp NULL,
    "UpdatedAt" timestamp NULL,
    CONSTRAINT "IpLocation_pkey" PRIMARY KEY ("Id","IpFrom")
)
PARTITION BY RANGE ("IpFrom");
CREATE INDEX "IX_IpLocation_IpFrom" ON public."IpLocation" ("IpFrom");
CREATE INDEX "IX_IpLocation_IpFrom_IpTo" ON public."IpLocation" ("IpFrom","IpFrom");
CREATE INDEX "IX_IpLocation_IpTo" ON public."IpLocation" ("IpFrom");

-- public."IpLocation" foreign keys

我在这张表中有 3.045.203 条记录。一些查询运行缓慢。这就是为什么我决定对表进行分区。

这是我用于分区的sql;

CREATE TABLE public.iplocationpartition PARTITION OF public."IpLocation" default;
CREATE TABLE public.iplocationpartition1 PARTITION OF public."IpLocation" FOR VALUES  from (-1) to (751619277);
CREATE TABLE public.iplocationpartition2 PARTITION OF public."IpLocation" FOR VALUES  from (751619278) to (1503238554);
CREATE TABLE public.iplocationpartition3 PARTITION OF public."IpLocation" FOR VALUES  from (1503238555) to (2254857831);
CREATE TABLE public.iplocationpartition4 PARTITION OF public."IpLocation" FOR VALUES  from (2254857832) to (3006477108);
CREATE TABLE public.iplocationpartition5 PARTITION OF public."IpLocation" FOR VALUES  from (3006477109) to (MAXVALUE);

但是当我运行这个查询得到下面的结果

EXPLAIN analyse select * from "IpLocation" where "IpFrom"<=1503395841 and "IpTo">=1503395841 limit 1;

Limit  (cost=0.00..204.15 rows=1 width=107) (actual time=125.634..125.636 rows=1 loops=1)
  ->  Append  (cost=0.00..34705.86 rows=170 width=107) (actual time=125.633..125.634 rows=1 loops=1)
        ->  Seq Scan on iplocationpartition1  (cost=0.00..9093.47 rows=33 width=83) (actual time=32.500..32.500 rows=0 loops=1)
              Filter: (("IpFrom" <= '1503395841'::double precision) AND ("IpTo" >= '1503395841'::double precision))
              Rows Removed by Filter: 325098
        ->  Seq Scan on iplocationpartition2  (cost=0.00..25587.10 rows=92 width=83) (actual time=93.117..93.117 rows=0 loops=1)
              Filter: (("IpFrom" <= '1503395841'::double precision) AND ("IpTo" >= '1503395841'::double precision))
              Rows Removed by Filter: 915340
        ->  Index Scan using "iplocationpartition3_IpFrom_idx1" on iplocationpartition3  (cost=0.42..8.45 rows=1 width=80) (actual time=0.012..0.013 rows=1 loops=1)
              Index Cond: ("IpFrom" <= '1503395841'::double precision)
              Filter: ("IpTo" >= '1503395841'::double precision)
        ->  Seq Scan on iplocationpartition  (cost=0.00..16.00 rows=44 width=172) (never executed)
              Filter: (("IpFrom" <= '1503395841'::double precision) AND ("IpTo" >= '1503395841'::double precision))
Planning Time: 0.301 ms
Execution Time: 125.663 ms

我正在使用相同的数据运行相同的查询,但没有分区表。

EXPLAIN analyse select * from "IpLocationF" where "IpFrom"<=1503395841 and "IpTo">=1503395841 limit 1;

Limit  (cost=0.00..0.11 rows=1 width=82) (actual time=0.081..0.082 rows=1 loops=1)
  ->  Seq Scan on "IpLocationF"  (cost=0.00..84328.04 rows=738534 width=82) (actual time=0.081..0.081 rows=1 loops=1)
        Filter: (("IpFrom" <= '1503395841'::double precision) AND ("IpTo" >= '1503395841'::double precision))
        Rows Removed by Filter: 747
Planning Time: 0.087 ms
Execution Time: 0.094 ms

如您所见,非分区表更快。

它怎么会发生?我做错了什么?

标签: sqlpostgresqldatabase-partitioning

解决方案


怎么会发生?倒霉的运气。您对未分区表的查询在仅搜索 747 行后找到了它正在寻找的行,然后由于 LIMIT 1 而可以停止。没有解释(您已与我们分享)为什么它应该是这种方式。所以这似乎只是运气。

分区表没有那么幸运。但是搜索不同的参数会显示不同程度的运气。

您可以通过在“IpTo”上添加索引来极大地改进分区表的情况。假设所有存储的范围都很窄,这应该能够通过使用该索引在其中找到零个可能匹配的行来非常快速地处理分区 1 和 2。

但实际上,您可能根本不应该为此使用分区。您可以使用 GiST 索引直接索引范围。您可以将范围直接存储在表中,或者您可以使用功能索引根据您已经存储的内容来索引范围。当它们不应该存储为浮点数时,您会有额外的复杂性,因此这使它变得更加有趣。

create index on "IpLocationF" using gist (int8range("IpFrom"::bigint,"IpTo"::bigint,'[]'));
select * from "IpLocationF" where int8range("IpFrom"::int8,"IpTo"::int8,'[]') @> 1503395841::bigint limit 1;

推荐阅读