首页 > 解决方案 > Planner 不使用索引顺序使用 CTE 对记录进行排序

问题描述

我试图将一些 ids 传递到排序索引上的子句中,但查询计划器在执行索引搜索后显式地对数据进行排序。以下是我的查询。

  1. 生成临时表。

    SELECT a.n/20 as n, md5(a.n::TEXT) as b INTO temp_table 
    From generate_series(1, 100000) as a(n);
    
  2. 创建索引

    CREATE INDEX idx_temp_table ON temp_table(n ASC, b ASC);
    
  3. 在下面的查询中,规划器使用索引排序并且没有明确地对数据进行排序。(预期)

    EXPLAIN ANALYSE
    SELECT * from 
    temp_table WHERE n = 10
    ORDER BY  n, b
    limit 5;
    

查询计划

QUERY PLAN Limit  (cost=0.42..16.07 rows=5 width=36) (actual time=0.098..0.101 rows=5 loops=1)   
          ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..1565.17 rows=500 width=36) (actual time=0.095..0.098 rows=5 loops=1)
            Index Cond: (n = 10)
            Heap Fetches: 5 Planning time: 0.551 ms Execution time: 0.128 ms
  1. 但是当我使用来自 cte 的一个或多个 id 并将它们传递给子句时,规划器只使用索引来获取值,但之后显式地对它们进行排序(不是预期的)。

    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10))
    SELECT * from temp_table 
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    

    然后规划器使用下面的查询计划

查询计划

QUERY PLAN
Limit  (cost=85.18..85.20 rows=5 width=37) (actual time=0.073..0.075 rows=5 loops=1)
  CTE cte
    ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
  ->  Sort  (cost=85.16..85.26 rows=40 width=37) (actual time=0.072..0.073 rows=5 loops=1)
        Sort Key: temp_table.n, temp_table.b
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop  (cost=0.47..84.50 rows=40 width=37) (actual time=0.037..0.056 rows=40 loops=1)
              ->  Unique  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                    ->  Sort  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                          Sort Key: cte.x
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on cte  (cost=0.00..0.04 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
              ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.012..0.018 rows=20 loops=2)
                    Index Cond: (n = cte.x)
                    Heap Fetches: 40
Planning time: 0.166 ms
Execution time: 0.101 ms
  1. 我尝试在 where 子句中传递 id 时进行显式排序,以便保持 id 中的排序顺序,但仍显式排序规划器

    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10))
    SELECT * from temp_table 
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    

查询计划

QUERY PLAN
Limit  (cost=42.62..42.63 rows=5 width=37) (actual time=0.042..0.044 rows=5 loops=1)
  CTE cte
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
  ->  Sort  (cost=42.61..42.66 rows=20 width=37) (actual time=0.042..0.042 rows=5 loops=1)
        Sort Key: temp_table.n, temp_table.b
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop  (cost=0.46..42.28 rows=20 width=37) (actual time=0.025..0.033 rows=20 loops=1)
              ->  HashAggregate  (cost=0.05..0.06 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                    Group Key: cte.x
                    ->  Sort  (cost=0.03..0.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                          Sort Key: cte.x
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
              ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.014..0.020 rows=20 loops=1)
                    Index Cond: (n = cte.x)
                    Heap Fetches: 20
Planning time: 0.167 ms
Execution time: 0.074 ms

谁能解释为什么规划器对数据使用显式排序?有没有办法绕过这个并使计划者使用索引排序顺序,以便可以保存对记录的额外排序。在生产中,我们有类似的情况,但我们选择的大小太大,但只有少数记录需要分页获取。感谢期待!

标签: postgresqlsortingquery-optimizationcommon-table-expressionpostgresql-9.6

解决方案


这实际上是计划者做出的决定,在更大的集合中values(),Postgres 将切换到更智能的计划,并在合并之前完成排序。


select version();

\echo +++++ Original

EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10))
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo +++++ TEN Values
EXPLAIN ANALYSE
WITH cte(x) AS (VALUES (10),(11),(12),(13),(14),(15),(16),(17),(18),(19)
        )
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo ++++++++ one row from table

EXPLAIN ANALYSE
WITH cte(x) AS (SELECT n FROM temp_table WHERE n = 10)
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

\echo ++++++++ one row from table TWO ctes

EXPLAIN ANALYSE
WITH val(x) AS (VALUES (10))
,  cte(x) AS (
        SELECT n FROM temp_table WHERE n IN (select x from val)
        )
SELECT * from temp_table
WHERE n IN ( SELECT x from cte)
ORDER BY  n, b
limit 5;

结果计划:


                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)

+++++ Original
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.72..13.73 rows=5 width=37) (actual time=0.197..0.200 rows=5 loops=1)
   CTE cte
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
   ->  Sort  (cost=13.71..13.76 rows=20 width=37) (actual time=0.194..0.194 rows=5 loops=1)
         Sort Key: temp_table.n, temp_table.b
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.44..13.37 rows=20 width=37) (actual time=0.083..0.097 rows=20 loops=1)
               ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
                     Group Key: cte.x
                     ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
               ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..13.14 rows=20 width=37) (actual time=0.058..0.068 rows=20 loops=1)
                     Index Cond: (n = cte.x)
                     Heap Fetches: 20
 Planning Time: 1.328 ms
 Execution Time: 0.360 ms
(15 rows)

+++++ TEN Values
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.91..89.11 rows=5 width=37) (actual time=0.179..0.183 rows=5 loops=1)
   CTE cte
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.12 rows=10 width=4) (actual time=0.001..0.007 rows=10 loops=1)
   ->  Merge Semi Join  (cost=0.78..3528.72 rows=200 width=37) (actual time=0.178..0.181 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.030..0.123 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.37..0.39 rows=10 width=4) (actual time=0.023..0.023 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.20 rows=10 width=4) (actual time=0.003..0.013 rows=10 loops=1)
 Planning Time: 0.197 ms
 Execution Time: 0.226 ms
(13 rows)
++++++++ one row from table
                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14.39..58.52 rows=5 width=37) (actual time=0.168..0.173 rows=5 loops=1)
   CTE cte
     ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.010..0.020 rows=20 loops=1)
           Index Cond: (n = 10)
           Heap Fetches: 20
   ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.167..0.170 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.025..0.101 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.039..0.039 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.012..0.031 rows=20 loops=1)
 Planning Time: 0.243 ms
 Execution Time: 0.211 ms
(15 rows)

++++++++ one row from table TWO ctes
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14.63..58.76 rows=5 width=37) (actual time=0.224..0.229 rows=5 loops=1)
   CTE val
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
   CTE cte
     ->  Nested Loop  (cost=0.44..13.37 rows=20 width=4) (actual time=0.038..0.052 rows=20 loops=1)
           ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                 Group Key: val.x
                 ->  CTE Scan on val  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
           ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.029..0.038 rows=20 loops=1)
                 Index Cond: (n = val.x)
                 Heap Fetches: 20
   ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.223..0.226 rows=5 loops=1)
         Merge Cond: (temp_table.n = cte.x)
         ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.038..0.114 rows=204 loops=1)
               Heap Fetches: 204
         ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.082..0.082 rows=1 loops=1)
               Sort Key: cte.x
               Sort Method: quicksort  Memory: 25kB
               ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.040..0.062 rows=20 loops=1)
 Planning Time: 0.362 ms
 Execution Time: 0.313 ms
(21 rows)

当心 CTE!

对于规划器来说,CTE 或多或少是黑盒子,对预期的行数、统计分布或内部排序知之甚少。

在 CTE 导致一个糟糕的计划的情况下(最初的问题不是这种情况),CTE 通常可以被一个(临时)视图替换,计划者可以看到它完全赤裸裸的荣耀。

更新

从版本 11 开始,规划器对 CTE 的处理方式有所不同:如果它们没有副作用,则它们是与主查询合并的候选对象。(但检查您的查询计划仍然是一个好主意)


推荐阅读