首页 > 解决方案 > query gets very slow when :jsonb ?& operator is used

问题描述

I have the following SQL query that works fast

select
   phone_number.id,
   phone_number.phone_number,
   phone_number.account_id,
   phone_number.used AS used,
   (
      now() AT TIME ZONE account.timezone
   )
   ::time AS local_time 
from
   phone_number 
   INNER JOIN
      account 
      ON account.id = phone_number.account_id 
where
   phone_number.used = false 
   AND phone_number.account_id IN 
   (
      SELECT
         phone_number.account_id 
      FROM
         phone_number 
      WHERE
         insert_timestamp < (now() - interval '10 hours')
   )
   AND 
   (
      now() AT TIME ZONE account.timezone
   )
   ::time BETWEEN 
   CASE
      WHEN
         EXTRACT(DOW 
FROM
   now() AT TIME ZONE account.timezone) IN 
   (
      6,
      0
   )
THEN
   '15:30'::time 
ELSE
   '17:30'::time 
   END
AND '22:10'::time 
order by
   random() limit 1

But when I add this to it account.residence_details::jsonb ?& array['city', 'state', 'streetName'] making the full query into

select
   phone_number.id,
   phone_number.phone_number,
   phone_number.account_id,
   phone_number.used AS used,
   (
      now() AT TIME ZONE account.timezone
   )
   ::time AS local_time 
from
   phone_number 
   INNER JOIN
      account 
      ON account.id = phone_number.account_id 
where
   phone_number.used = false 
   AND phone_number.account_id IN 
   (
      SELECT
         phone_number.account_id 
      FROM
         phone_number 
      WHERE
         insert_timestamp < (now() - interval '10 hours')
   )
   AND 
   (
      now() AT TIME ZONE account.timezone
   )
   ::time BETWEEN 
   CASE
      WHEN
         EXTRACT(DOW 
FROM
   now() AT TIME ZONE account.timezone) IN 
   (
      6,
      0
   )
THEN
   '15:30'::time 
ELSE
   '17:30'::time 
   END
AND '22:10'::time 
   AND account.residence_details::jsonb ?& array['city', 'state', 'streetName'] 
order by
   random() limit 1

The query takes about 1 minute to complete

Below is EXPLAIN ANALYZE for the query WITHOUT account.residence_details::jsonb ?& array['city', 'state', 'streetName']

Limit  (cost=15795.97..15795.97 rows=1 width=45) (actual time=382.995..382.995 rows=0 loops=1)
  ->  Sort  (cost=15795.97..15796.18 rows=85 width=45) (actual time=382.993..382.993 rows=0 loops=1)
        Sort Key: (random())
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=8742.24..15795.54 rows=85 width=45) (actual time=382.640..382.640 rows=0 loops=1)
              Join Filter: (phone_number.account_id = account.id)
              ->  Hash Join  (cost=8741.96..15403.38 rows=850 width=37) (actual time=347.011..368.677 rows=2099 loops=1)
                    Hash Cond: (phone_number.account_id = phone_number_1.account_id)
                    ->  Seq Scan on phone_number  (cost=0.00..6649.74 rows=850 width=29) (actual time=14.499..33.591 rows=2453 loops=1)
                          Filter: (NOT used)
                          Rows Removed by Filter: 190152
                    ->  Hash  (cost=8629.44..8629.44 rows=9001 width=8) (actual time=332.368..332.369 rows=9581 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 503kB
                          ->  HashAggregate  (cost=8539.43..8629.44 rows=9001 width=8) (actual time=320.550..326.757 rows=9581 loops=1)
                                Group Key: phone_number_1.account_id
                                ->  Seq Scan on phone_number phone_number_1  (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.010..169.126 rows=191615 loops=1)
                                      Filter: (insert_timestamp < (now() - '10:00:00'::interval))
                                      Rows Removed by Filter: 990
              ->  Index Scan using account_id_idx on account  (cost=0.29..0.45 rows=1 width=25) (actual time=0.006..0.006 rows=0 loops=2099)
                    Index Cond: (id = phone_number_1.account_id)
                    Filter: (((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
                    Rows Removed by Filter: 1
Planning time: 2.025 ms
Execution time: 383.794 ms

Below is EXPLAIN ANALYZE for the query with account.residence_details::jsonb ?& array['city', 'state', 'streetName']

Limit  (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.686..258768.696 rows=1 loops=1)
  ->  Sort  (cost=15916.82..15916.83 rows=1 width=45) (actual time=258768.684..258768.685 rows=1 loops=1)
        Sort Key: (random())
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Nested Loop Semi Join  (cost=0.29..15916.81 rows=1 width=45) (actual time=495.076..258755.141 rows=1715 loops=1)
              Join Filter: (account.id = phone_number_1.account_id)
              Rows Removed by Join Filter: 167271743
              ->  Nested Loop  (cost=0.29..7634.96 rows=1 width=54) (actual time=65.620..229.670 rows=1737 loops=1)
                    ->  Seq Scan on phone_number  (cost=0.00..6649.74 rows=850 width=29) (actual time=59.234..98.326 rows=3772 loops=1)
                          Filter: (NOT used)
                          Rows Removed by Filter: 190333
                    ->  Index Scan using account_id_idx on account  (cost=0.29..1.16 rows=1 width=25) (actual time=0.029..0.029 rows=0 loops=3772)
                          Index Cond: (id = phone_number.account_id)
                          Filter: ((residence_details ?& '{city,state,streetName}'::text[]) AND ((timezone(timezone, now()))::time without time zone <= '22:10:00'::time without time zone) AND ((timezone(timezone, now()))::time without time zone >= CASE WHEN (date_part('dow'::text, timezone(timezone, now())) = ANY ('{6,0}'::double precision[])) THEN '15:30:00'::time without time zone ELSE '17:30:00'::time without time zone END))
                          Rows Removed by Filter: 1
              ->  Seq Scan on phone_number phone_number_1  (cost=0.00..8067.05 rows=188955 width=8) (actual time=0.004..87.357 rows=96300 loops=1737)
                    Filter: (insert_timestamp < (now() - '10:00:00'::interval))
                    Rows Removed by Filter: 21
Planning time: 1.712 ms
Execution time: 258768.781 ms

I can not figure out why it gets so slow after adding account.residence_details::jsonb ?& array['city', 'state', 'streetName']

标签: postgresqlpostgresql-performancepostgresql-10

解决方案


I'd say that the additional condition makes PostgreSQL underestimate the result count of the first join so badly that it chooses a nested loop for the second join by mistake, which is where all the time is spent.

Maybe an index on the expression will help to get better estimates:

CREATE INDEX ON account USING gin (residence_details::jsonb);
ANALYZE account;  -- to calculate statistics for the indexed expression

推荐阅读