首页 > 解决方案 > 分区和索引

问题描述

我有一个为每个季度分区的表。表名是data。在表中有几列,但也有datedate是一个创建了索引的字段: create index on data (date); 现在我正在尝试查询表:

justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';

                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=355709.66..355709.67 rows=1 width=32) (actual time=577.072..577.072 rows=1 loops=1)
   ->  Gather  (cost=355709.44..355709.65 rows=2 width=32) (actual time=577.005..578.418 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=354709.44..354709.45 rows=1 width=32) (actual time=573.255..573.256 rows=1 loops=3)
               ->  Append  (cost=0.42..352031.07 rows=1071346 width=8) (actual time=15.286..524.604 rows=837204 loops=3)
                     ->  Parallel Index Scan using data_date_idx on data  (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=3)
                           Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
                     ->  Parallel Seq Scan on data_y2018q4  (cost=0.00..352022.64 rows=1071345 width=8) (actual time=15.282..465.859 rows=837204 loops=3)
                           Filter: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
                           Rows Removed by Filter: 1479844
 Planning time: 1.437 ms
 Execution time: 578.465 ms
(13 rows)

我们可以看到有Parallel Seq Scan on data_y2018q4。事实上,这对我来说很正常。我有四分之一的分区。我正在查询整个分区的第三部分,所以我有 seq 扫描,太好了。但是现在让我们直接查询分区表:

justpremium=> EXPLAIN analyze SELECT sum(col_1) FROM data_y2018q4 WHERE "date" BETWEEN '2018-12-01' AND '2018-12-31';
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=286475.38..286475.39 rows=1 width=32) (actual time=277.830..277.830 rows=1 loops=1)
   ->  Gather  (cost=286475.16..286475.37 rows=2 width=32) (actual time=277.760..279.194 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=285475.16..285475.17 rows=1 width=32) (actual time=275.950..275.950 rows=1 loops=3)
               ->  Parallel Index Scan using data_y2018q4_date_idx on data_y2018q4  (cost=0.43..282796.80 rows=1071345 width=8) (actual time=0.022..227.687 rows=837204 loops=3)
                     Index Cond: ((date >= '2018-12-01'::date) AND (date <= '2018-12-31'::date))
 Planning time: 0.187 ms
 Execution time: 279.233 ms
(9 rows)

现在我有Index Scan using data_y2018q4_date_idx并且整个查询的时间也快了两倍:279.233 ms578.465 ms. 对此有何解释?data查询表时如何强制规划器使用索引扫描。如何实现两倍更好的计时?

标签: postgresqlindexingdatabase-partitioningpostgresql-10

解决方案


推荐阅读