首页 > 解决方案 > 为什么 postresql 不使用多列索引中的所有列?

问题描述

我正在使用扩展程序

CREATE EXTENSION btree_gin;

我有一个看起来像这样的索引...

create index boundaries2 on rets USING GIN(source, isonlastsync, status, (geoinfo::jsonb->'boundaries'), ctcvalidto, searchablePrice, ctcSortOrder);

在我开始弄乱它之前,索引看起来像这样,与我将要分享的结果相同,所以索引定义中的微小变化似乎没有什么区别:

create index boundaries on rets USING GIN((geoinfo::jsonb->'boundaries'), source, status, isonlastsync, ctcvalidto, searchablePrice, ctcSortOrder);

我给 pgsql 11 这个查询:

explain analyze select id from rets where ((geoinfo::jsonb->'boundaries' ?| array['High School: Torrey Pines']) AND source='SDMLS'
          AND searchablePrice>=800000 AND searchablePrice<=1200000 AND YrBlt>=2000 AND EstSF>=2300
         AND Beds>=3 AND FB>=2 AND ctcSortOrder>'2019-07-05 16:02:54 UTC' AND Status IN ('ACTIVE','BACK ON MARKET')
         AND ctcvalidto='9999-12-31 23:59:59 UTC' AND isonlastsync='true') order by LstDate desc, ctcSortOrder desc LIMIT 3000;

结果...

 Limit  (cost=120.06..120.06 rows=1 width=23) (actual time=472.849..472.850 rows=1 loops=1)
   ->  Sort  (cost=120.06..120.06 rows=1 width=23) (actual time=472.847..472.848 rows=1 loops=1)
         Sort Key: lstdate DESC, ctcsortorder DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on rets  (cost=116.00..120.05 rows=1 width=23) (actual time=472.748..472.841 rows=1 loops=1)
               Recheck Cond: ((source = 'SDMLS'::text) AND (((geoinfo)::jsonb -> 'boundaries'::text) ?| '{"High School: Torrey Pines"}'::text[]) AND (ctcvalidto = '9999-12-31 23:59:59+00'::timestamp with time zone) AND (searchableprice >= 800000) AND (searchableprice <= 1200000) AND (ctcsortorder > '2019-07-05 16:02:54+00'::timestamp with time zone))
               Rows Removed by Index Recheck: 93
               Filter: (isonlastsync AND (yrblt >= 2000) AND (estsf >= 2300) AND (beds >= 3) AND (fb >= 2) AND (status = ANY ('{ACTIVE,"BACK ON MARKET"}'::text[])))
               Rows Removed by Filter: 10
               Heap Blocks: exact=102
               ->  Bitmap Index Scan on boundaries2  (cost=0.00..116.00 rows=1 width=0) (actual time=471.762..471.762 rows=104 loops=1)
                     Index Cond: ((source = 'SDMLS'::text) AND (((geoinfo)::jsonb -> 'boundaries'::text) ?| '{"High School: Torrey Pines"}'::text[]) AND (ctcvalidto = '9999-12-31 23:59:59+00'::timestamp with time zone) AND (searchableprice >= 800000) AND (searchableprice <= 1200000) AND (ctcsortorder > '2019-07-05 16:02:54+00'::timestamp with time zone))
 Planning Time: 0.333 ms
 Execution Time: 474.311 ms
(14 rows)

问题

为什么是列statusisonlastsync不是被Bitmap Index Scan on boundaries2

标签: postgresqlindexingdatabase-performancepostgresql-11

解决方案


如果它预测过滤掉这些列会更快,它可以这样做。如果列的基数非常低并且您将获取所有行的足够大的部分,通常会出现这种情况;这适用于booleanlikeisonlastsync并且通常适用于只有几个不同值的状态列。

Rows Removed by Filter: 10这很少需要过滤掉,或者因为您的表不包含大量行,或者其中大多数符合您为这两列指定的条件。您可以尝试在该表中生成更多数据或选择具有罕见状态的行。

我建议做部分索引(使用 WHERE 条件),至少对于boolean值并删除这两列以使该索引更轻量级。


推荐阅读