postgresql - Speeding up a query with multiple inner and outer joins
Problem description
I'm having trouble with a slow PostgreSQL query. I have already made the standard postgresql.conf changes and verified that the referenced columns are indexed. Beyond that, I'm not sure what the next step would be. The query below takes just under 3 minutes to run. Any help is appreciated.
select distinct
    exp.assay_id as ASSAY_KEY,
    rest.result_type_id as RESULT_TYPE_ID,
    rest.name as RESULT_TYPE,
    rest.unit as REST_UNIT,
    dtrest.name as REST_DATA_TYPE,
    cont.condition_type_id as COND_TYPE_ID,
    cont.name as COND_TYPE,
    cont.unit as COND_UNIT,
    dtcont.name as COND_DATA_TYPE,
    expcon.unit as EXP_COND_UNIT
from
    public.experiment exp
    inner join public.experiment_result expr on expr.experiment_id = exp.experiment_id
    inner join public.result_type rest on rest.result_type_id = expr.result_type_id
    left outer join public.experiment_condition expcon on expcon.experiment_id = expr.experiment_id
    left outer join public.condition_type cont on cont.condition_type_id = expcon.condition_type_id
    left outer join public.data_type dtcont on dtcont.data_type_id = cont.data_type_id
    left outer join public.data_type dtrest on dtrest.data_type_ID = rest.data_type_ID
where
    exp.assay_id in (255)
EXPLAIN ANALYZE output:
Unique (cost=51405438.73..52671302.26 rows=50634541 width=1109) (actual time=123349.423..164779.863 rows=3 loops=1)
-> Sort (cost=51405438.73..51532025.09 rows=50634541 width=1109) (actual time=123349.421..157973.215 rows=29521242 loops=1)
Sort Key: rest.result_type_id, rest.name, rest.unit, dtrest.name, cont.condition_type_id, cont.name, cont.unit, dtcont.name, expcon.unit
Sort Method: external merge Disk: 3081440kB
-> Hash Left Join (cost=56379.88..1743073.05 rows=50634541 width=1109) (actual time=1307.931..26398.626 rows=29521242 loops=1)
Hash Cond: (rest.data_type_id = dtrest.data_type_id)
-> Hash Left Join (cost=56378.68..1547566.26 rows=50634541 width=799) (actual time=1307.894..21181.787 rows=29521242 loops=1)
Hash Cond: (expr.experiment_id = expcon.experiment_id)
-> Hash Join (cost=5096.61..572059.62 rows=15984826 width=47) (actual time=1002.697..11046.550 rows=9840414 loops=1)
Hash Cond: (expr.result_type_id = rest.result_type_id)
-> Hash Join (cost=5091.86..528637.07 rows=15984826 width=24) (actual time=44.062..7969.272 rows=9840414 loops=1)
Hash Cond: (expr.experiment_id = exp.experiment_id)
-> Seq Scan on experiment_result expr (cost=0.00..462557.70 rows=23232570 width=16) (actual time=0.080..4357.646 rows=23232570 loops=1)
-> Hash (cost=3986.11..3986.11 rows=88460 width=16) (actual time=43.743..43.744 rows=88135 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5156kB
-> Seq Scan on experiment exp (cost=0.00..3986.11 rows=88460 width=16) (actual time=0.016..24.426 rows=88135 loops=1)
Filter: (assay_id = 255)
Rows Removed by Filter: 40434
-> Hash (cost=3.22..3.22 rows=122 width=31) (actual time=958.617..958.618 rows=128 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 17kB
-> Seq Scan on result_type rest (cost=0.00..3.22 rows=122 width=31) (actual time=958.542..958.575 rows=128 loops=1)
-> Hash (cost=9509.53..9509.53 rows=382603 width=768) (actual time=294.654..294.658 rows=382553 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 1077kB
-> Hash Left Join (cost=2.67..9509.53 rows=382603 width=768) (actual time=0.074..176.040 rows=382553 loops=1)
Hash Cond: (cont.data_type_id = dtcont.data_type_id)
-> Hash Left Join (cost=1.47..8301.31 rows=382603 width=458) (actual time=0.048..117.994 rows=382553 loops=1)
Hash Cond: (expcon.condition_type_id = cont.condition_type_id)
-> Seq Scan on experiment_condition expcon (cost=0.00..7102.03 rows=382603 width=74) (actual time=0.016..48.704 rows=382553 loops=1)
-> Hash (cost=1.21..1.21 rows=21 width=392) (actual time=0.021..0.022 rows=24 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on condition_type cont (cost=0.00..1.21 rows=21 width=392) (actual time=0.012..0.014 rows=24 loops=1)
-> Hash (cost=1.09..1.09 rows=9 width=326) (actual time=0.015..0.016 rows=9 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on data_type dtcont (cost=0.00..1.09 rows=9 width=326) (actual time=0.008..0.010 rows=9 loops=1)
-> Hash (cost=1.09..1.09 rows=9 width=326) (actual time=0.018..0.019 rows=9 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on data_type dtrest (cost=0.00..1.09 rows=9 width=326) (actual time=0.012..0.014 rows=9 loops=1)
Planning Time: 5.997 ms
JIT:
Functions: 55
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 19.084 ms, Inlining 20.283 ms, Optimization 604.666 ms, Emission 332.835 ms, Total 976.868 ms
Execution Time: 165268.155 ms
Solution
The query has to churn through 30 million rows coming out of the joins, because your condition `exp.assay_id in (255)` is not very selective.
It just happens that most of those result rows are identical, so almost all of them collapse in the `DISTINCT`.
So there is no way to make this query lightning fast: it has to look at 30 million rows just to determine that there are only three distinct ones.
But the bulk of the execution time (132 of the 165 seconds) is spent in the sort, so it should be possible to make the query faster.
Some ideas to try:
Increase `work_mem` as much as you can afford; that will make the sort faster.
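For instance, at the session level (the value below is illustrative, not a recommendation; the plan's "Sort Method: external merge Disk: 3081440kB" shows the sort currently spills roughly 3 GB to disk, so any increase reduces the merge work):

```sql
-- Session-level only; pick a value your server's RAM can actually support.
-- The plan shows the sort spilling ~3 GB to disk ("external merge Disk: 3081440kB").
SET work_mem = '1GB';
-- ... run the query ...
RESET work_mem;
```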
PostgreSQL chooses an explicit sort because it doesn't know that so many of the rows are identical; otherwise it would choose a faster hash aggregate. Perhaps we can exploit that:
Try running
SET enable_sort = off;
before the query to see whether that makes PostgreSQL choose a hash aggregate. Also consider upgrading to PostgreSQL v13, which is smarter about hash aggregates and more willing to use them.
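A sketch of how to test this for the current session only (the planner setting is reset afterwards so other queries are unaffected):

```sql
-- Discourage explicit sorts for this session, then inspect the new plan.
-- enable_sort = off does not forbid sorts; it only makes the planner avoid
-- them when an alternative (here, hash aggregation for DISTINCT) exists.
SET enable_sort = off;
EXPLAIN (ANALYZE, BUFFERS)
-- ... the original SELECT DISTINCT query goes here ...
RESET enable_sort;
```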