sql - 为什么 Postgres EXPLAIN ANALYZE 报告与实际查询执行相比存在巨大的性能差异
问题描述
我的任务是在我有此查询的系统中重写一些低性能的 sql
select
"aggtable".id as t_id,
count(joined.packages)::integer as t_package_count,
sum(coalesce((joined.packages ->> 'weight'::text)::double precision, 0::double precision)) as t_total_weight
from
"aggtable"
join (
select
"unnested".myid, json_array_elements("jsontable".jsondata) as packages
from
(
select
distinct unnest("tounnest".arrayofid) as myid
from
"aggtable" "tounnest") "unnested"
join "jsontable" on
"jsontable".id = "unnested".myid) joined on
joined.myid = any("aggtable".arrayofid)
group by
"aggtable".id
EXPLAN ANALYZE 结果是
Sort Method: quicksort Memory: 611kB
-> Nested Loop (cost=30917.16..31333627.69 rows=27270 width=69) (actual time=4.028..2054.470 rows=3658 loops=1)
Join Filter: ((unnest(tounnest.arrayofid)) = ANY (aggtable.arrayofid))
Rows Removed by Join Filter: 9055436
-> ProjectSet (cost=30917.16..36645.61 rows=459000 width=48) (actual time=3.258..13.846 rows=3322 loops=1)
-> Hash Join (cost=30917.16..34316.18 rows=4590 width=55) (actual time=3.246..7.079 rows=1661 loops=1)
Hash Cond: ((unnest(tounnest.arrayofid)) = jsontable.id)
-> Unique (cost=30726.88..32090.38 rows=144700 width=16) (actual time=1.901..3.720 rows=1664 loops=1)
-> Sort (cost=30726.88..31408.63 rows=272700 width=16) (actual time=1.900..2.711 rows=1845 loops=1)
Sort Key: (unnest(tounnest.arrayofid))
Sort Method: quicksort Memory: 135kB
-> ProjectSet (cost=0.00..1444.22 rows=272700 width=16) (actual time=0.011..1.110 rows=1845 loops=1)
-> Seq Scan on aggtable tounnest (cost=0.00..60.27 rows=2727 width=30) (actual time=0.007..0.311 rows=2727 loops=1)
-> Hash (cost=132.90..132.90 rows=4590 width=55) (actual time=1.328..1.329 rows=4590 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 454kB
-> Seq Scan on jsontable (cost=0.00..132.90 rows=4590 width=55) (actual time=0.006..0.497 rows=4590 loops=1)
-> Materialize (cost=0.00..73.91 rows=2727 width=67) (actual time=0.000..0.189 rows=2727 loops=3322)
-> Seq Scan on aggtable (cost=0.00..60.27 rows=2727 width=67) (actual time=0.012..0.317 rows=2727 loops=1)
Planning Time: 0.160 ms
Execution Time: 2065.268 ms
我试图从头重写这个查询来分析性能并理解初衷
select
joined.joinid,
count(joined.packages)::integer as t_package_count,
sum(coalesce((joined.packages ->> 'weight'::text)::double precision, 0::double precision)) as t_total_weight
from
(
select
joinid ,
json_array_elements(jsondata) as packages
from
( (
select
distinct unnest(at2.arrayofid) as joinid, at2.id as rootid
from
aggtable at2) unnested
join jsontable jt on
jt.id = unnested.joinid)) joined
group by joined.joinid
EXPLAIN ANALYZE 返回的
HashAggregate (cost=873570.28..873572.78 rows=200 width=28) (actual time=18.379..18.741 rows=1661 loops=1)
Group Key: (unnest(at2.arrayofid))
-> ProjectSet (cost=44903.16..191820.28 rows=27270000 width=48) (actual time=3.019..14.684 rows=3658 loops=1)
-> Hash Join (cost=44903.16..53425.03 rows=272700 width=55) (actual time=3.010..4.999 rows=1829 loops=1)
Hash Cond: ((unnest(at2.arrayofid)) = jt.id)
-> Unique (cost=44712.88..46758.13 rows=272700 width=53) (actual time=1.825..2.781 rows=1845 loops=1)
-> Sort (cost=44712.88..45394.63 rows=272700 width=53) (actual time=1.824..2.135 rows=1845 loops=1)
Sort Key: (unnest(at2.arrayofid)), at2.id
Sort Method: quicksort Memory: 308kB
-> ProjectSet (cost=0.00..1444.22 rows=272700 width=53) (actual time=0.009..1.164 rows=1845 loops=1)
-> Seq Scan on aggtable at2 (cost=0.00..60.27 rows=2727 width=67) (actual time=0.005..0.311 rows=2727 loops=1)
-> Hash (cost=132.90..132.90 rows=4590 width=55) (actual time=1.169..1.169 rows=4590 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 454kB
-> Seq Scan on jsontable jt (cost=0.00..132.90 rows=4590 width=55) (actual time=0.007..0.462 rows=4590 loops=1)
Planning Time: 0.144 ms
Execution Time: 18.889 ms
根据 postgres 的评估,我发现查询性能(20 毫秒到 2000 毫秒)存在巨大差异。但是,真正的查询性能与该差异相差无几(快的大约是 500 毫秒,慢的大约是 1 秒)
我的问题
1/ EXPLAIN 在性能上产生巨大差异但在现实生活中没有这么大的差异是否正常?
2/ 第二个优化查询是否正确?第一个查询做错了什么?
我还提供了示例数据库的凭据,以便每个人都可以尝试查询
postgres://birylwwg:X6EM3Al9Jhqzz0w6EaSSx79pa4aXRBZq@arjuna.db.elephantsql.com:5432/birylwwg
密码是
X6EM3Al9Jhqzz0w6EaSSx79pa4aXRBZq
解决方案
推荐阅读
- .net - WhiteSource 不排除目录
- batch-file - 尝试从 .bat 文件执行 gcc (MinGW64) 时出现“没有这样的文件或目录”错误
- sql - 如何在 PostgreSQL 中进行数据透视?
- javascript - 将嵌套对象数组转换为对象数组的最佳方法
- javascript - 有条件地渲染到 div 高度
- python - 有效地遍历多个列表(GPS 坐标)
- octave - 八度查找据称不存在的文件
- python - 将函数返回作为参数传递给另一个函数
- pytorch - CNNLstm 模型的学习率查找器
- asp.net-core - 当有“this”关键字时,如何使用 xUnit 和 Moq 测试 web api 方法