首页 > 解决方案 > 小表上的仅索引扫描非常慢

问题描述

我正在尝试优化以下 PostgreSQL 查询:

SELECT DISTINCT route.id,
                route.ref_origine,
                route.platform
FROM route
         INNER JOIN level ON (route.id IN (
    SELECT level_route.route_id
    FROM level_route
    WHERE level.id = level_route.level_id))
         CROSS JOIN site
WHERE level.ref_site = site.id
  AND site.ref_client = 1
  AND site.visible = TRUE;

为了清楚起见,我重命名了表和字段,但它是由 QueryDSL 生成的。

分析此查询时,我可以看到:

我想提高这个特定查询的性能,因为我们有很多这样的查询要执行,这使我们的工作运行时间过长。

当输入我的一个解释分析结果时,PostgreSQL Explain Visualizer 为我提供了以下信息:

再次运行 EXPLAIN ANALYZE(PEV 没有 JSON 格式,所以它有点不同但类似的计划)给了我以下输出:

HashAggregate  (cost=582383.52..582444.12 rows=6060 width=17) (actual time=2054.753..2054.984 rows=1088 loops=1)
"  Group Key: route.id, route.ref_origine, route.platform"
  ->  Nested Loop  (cost=3.99..581516.26 rows=115635 width=17) (actual time=95.786..2053.788 rows=1230 loops=1)
        Join Filter: (SubPlan 1)
        Rows Removed by Join Filter: 268440
        ->  Seq Scan on route  (cost=0.00..50.30 rows=3030 width=17) (actual time=0.006..1.104 rows=3030 loops=1)
        ->  Materialize  (cost=3.99..9.15 rows=76 width=8) (actual time=0.000..0.007 rows=89 loops=3030)
              ->  Hash Join  (cost=3.99..8.77 rows=76 width=8) (actual time=0.029..0.070 rows=89 loops=1)
                    Hash Cond: (level.ref_site = site.id)
                    ->  Seq Scan on level  (cost=0.00..3.47 rows=147 width=16) (actual time=0.002..0.018 rows=148 loops=1)
                    ->  Hash  (cost=3.65..3.65 rows=27 width=8) (actual time=0.021..0.021 rows=28 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 10kB
                          ->  Seq Scan on site  (cost=0.00..3.65 rows=27 width=8) (actual time=0.005..0.017 rows=28 loops=1)
                                Filter: (visible AND (ref_client = 1))
                                Rows Removed by Filter: 25
        SubPlan 1
          ->  Index Only Scan using level_route_pkey on level_route  (cost=0.28..4.68 rows=23 width=8) (actual time=0.002..0.005 rows=14 loops=269670)
                Index Cond: (level_id = level.id)
                Heap Fetches: 0
Planning time: 0.221 ms
Execution time: 2055.094 ms

据我所知,问题似乎出在此复制/粘贴计划的最后一部分:表 level_route 的索引仅扫描。

确实,level_route 是一个有 3030 行的表,只包含两个字段:level_id 和 route_id。常规关联表。主键 level_route_pkey 是 (level_id, route_id) 所以我们在这个字段组合上有一个索引,它用于仅索引扫描。虽然这看起来合乎逻辑且聪明(避免了无用的堆提取),但为什么对包含两个基本字段的 3030 行表的仅索引扫描需要超过一秒的时间才能执行?

这里有一些我不明白的地方。寻找任何建议/建议以改进此查询执行时间和性能。

以下是表和相关索引定义:

路由表

id          bigint                not null,
ref_origine bigint,
platform    boolean default false not null,
constraint route_pkey
    primary key (id),
constraint fk6o50xseq5lpmpw8mo8de5nylu
    foreign key (ref_origine) references point

水平表

id          bigint                      not null,
altitude    integer           default 0 not null,
elevation   integer                     not null,
name        varchar(255)                not null,
svg         text                        not null,
ref_site    bigint,
description varchar(10485760) default ''::character varying,
type        varchar(50)       default NULL::character varying,
constraint level_pkey
    primary key (id),
constraint unique_elevation_for_level_in_site
    unique (ref_site, elevation),
constraint fkl4mlang0ahx72vnrp1slop579
    foreign key (ref_site) references site

level_route 表

level_id bigint not null,
route_id bigint not null,
constraint level_route_pkey
    primary key (level_id, route_id),
constraint fkd0kfn8u2jfqcg46li5aw2mp5n
    foreign key (level_id) references level,
constraint fkm1to01mwy537dxb7vu8qk9ut1
    foreign key (route_id) references route

create index if not exists level_route_level_id_index
    on level_route (level_id);

站点表

id          bigint                                          not null,
description varchar(255),
name        varchar(255)                                    not null,
naoid       bigint,
naotoken    varchar(255),
version     integer      default 0                          not null,
ref_client  bigint,
uuid        varchar(255) default ''::character varying      not null,
deployement varchar(50)  default 'Never'::character varying not null,
deployedat  timestamp,
visible     boolean      default false                      not null,
constraint site_pkey
    primary key (id),
constraint fkhqoxjhhr0ud7i5cvu9nhwc62k
    foreign key (ref_client) references client

非常感谢任何愿意花时间在这里帮助我的人。

标签: postgresqlquery-optimization

解决方案


按照您编写查询的方式,PostgreSQL 必须为连接中的每一对行执行子查询。仅索引扫描执行得非常快,但过于频繁。

以合理的方式重写查询:

SELECT DISTINCT
       route.id,
       route.ref_origine,
       route.platform
FROM route
   INNER JOIN FROM level_route
      ON route.id = level_route.route_id
   INNER JOIN level
      ON level.id = level_route.level_id
   INNER JOIN site
      ON level.ref_site = site.id
WHERE site.ref_client = 1
  AND site.visible;

推荐阅读