postgresql - PostgreSQL:提高计算不同时的性能
问题描述
我目前正在努力提高我们数据库的性能。我需要你的帮助。我有一个像这样的表和它的索引
CREATE TABLE public.ar
(
id integer NOT NULL DEFAULT nextval('id_seq'::regclass),
user_id integer NOT NULL,
duration double precision,
is_idle boolean NOT NULL,
activity_id integer NOT NULL,
device_id integer NOT NULL,
calendar_id integer,
on_taskness integer,
week_id integer,
some_other_column_below,
CONSTRAINT id_ PRIMARY KEY (id),
CONSTRAINT a_unique_key UNIQUE (user_id, device_id, start_time_local, start_time_utc, end_time_local, end_time_utc)
)
CREATE INDEX ar_idx
ON public.ar USING btree
(week_id, calendar_id, user_id, activity_id, duration, on_taskness, is_idle)
TABLESPACE pg_default;
然后我试图运行这样的查询
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM (
SELECT ar.user_id
FROM ar
WHERE ar.user_id = ANY(array[some_data]) -- data size is 352
AND ROUND(ar.duration) >0 AND ar.is_idle = false
AND ar.week_id = ANY(ARRAY[some_data]) -- data size is 37
AND ar.calendar_id = ANY(array[some_data]) -- data size is 16716
GROUP by ar.user_id
) tmp;
下面是解释结果
Aggregate (cost=31389954.72..31389954.73 rows=1 width=8) (actual time=252020.695..252020.695 rows=1 loops=1)
-> Group (cost=31389032.69..31389922.37 rows=2588 width=4) (actual time=251089.270..252020.659 rows=351 loops=1)
Group Key: ar.user_id
-> Sort (cost=31389032.69..31389477.53 rows=177935 width=4) (actual time=251089.268..251776.202 rows=6993358 loops=1)
Sort Key: ar.user_id
Sort Method: external merge Disk: 95672kB
-> Bitmap Heap Scan on ar (cost=609015.18..31371079.88 rows=177935 width=4) (actual time=1670.413..248939.440 rows=6993358 loops=1)
Recheck Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
Rows Removed by Index Recheck: 2081028
Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (calendar_id = ANY ('{some_data}'::integer[])))
Rows Removed by Filter: 534017
Heap Blocks: exact=29551 lossy=313127
-> BitmapAnd (cost=609015.18..609015.18 rows=1357521 width=0) (actual time=1666.334..1666.334 rows=0 loops=1)
-> Bitmap Index Scan on test_index_only_scan_idx (cost=0.00..272396.77 rows=6970353 width=0) (actual time=614.366..614.366 rows=7269830 loops=1)
Index Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (is_idle = false))
-> Bitmap Index Scan on unique_key (cost=0.00..336529.20 rows=9948573 width=0) (actual time=1041.999..1041.999 rows=14959355 loops=1)
Index Cond: (user_id = ANY ('{some_data}'::integer[]))
Planning time: 25.563 ms
Execution time: 252029.237 ms
我也使用了 distinct ,结果是一样的。
所以我的问题如下。
ar_idx
contains ,但是在user_id
搜索行时,为什么它使用unique_key
而不是我创建的索引?- 我认为 group by 不会进行排序(这就是我没有选择的
distinct
原因),但是为什么在解释分析中会发生排序? - 运行时间很长(超过4分钟)。我怎样才能让它更快?索引错了吗?或者我能做的任何其他事情。
请注意,该ar
表包含 51585203 行。
任何帮助将不胜感激。谢谢。
- - - - - - - - - - - - - -更新 - - - - - - - - - - - ---- 在我创建了这个索引之后,现在一切都变得非常快了。我不明白为什么,谁能给我解释一下?
CREATE INDEX ar_1_idx
ON public.ar USING btree
(calendar_id, user_id)
TABLESPACE pg_default;
我将旧索引更改为
CREATE INDEX ar_idx
ON public.ar USING btree
(week_id, calendar, user_id, activity_id, duration, on_taskness, start_time_local, end_time_local) WHERE is_idle IS FALSE
TABLESPACE pg_default;
-----更新分析结果----------
Aggregate (cost=31216435.97..31216435.98 rows=1 width=8) (actual time=13206.941..13206.941 rows=1 loops=1)
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Group (cost=31215436.80..31216403.88 rows=2567 width=4) (actual time=12239.336..13206.894 rows=351 loops=1)
Group Key: ar.user_id
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Sort (cost=31215436.80..31215920.34 rows=193417 width=4) (actual time=12239.334..12932.801 rows=6993358 loops=1)
Sort Key: ar.user_id
Sort Method: external merge Disk: 95664kB
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Index Scan using ar_1_idx on activity_report ar (cost=0.56..31195807.48 rows=193417 width=4) (actual time=0.275..10387.051 rows=6993358 loops=1)
Index Cond: ((calendar_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (week_id = ANY ('{some_data}'::integer[])))
Rows Removed by Filter: 590705
Buffers: shared hit=25940518 read=430315
Planning time: 25.577 ms
Execution time: 13217.611 ms
解决方案
推荐阅读
- spring - Spring MVC 控制器 - 模型属性 - 可能的竞争条件?
- android - 无法运行显示无法安装应用程序的应用程序
- reactjs - 在 moxios 拒绝时不调用 catch
- python - Tensorflow:是否可以打印会话数?
- php - 是否有解决方案来比较 SQL 全文搜索中的特定字段?
- java - 使用 httpurlconnection 调用第三方 api
- mongodb - 来自 mongodb 模块的 Metricbeat 异常
- c# - 使用 SqlQuerySpec 对 Cosmos DB 运行两个查询的有效方法是什么?
- ftp - 如何在 jftp 脚本中建立活动连接?
- python - PyTorch:RuntimeError:函数 MulBackward0 在索引 0 处返回无效梯度 - 预期类型为 torch.cuda.FloatTensor 但得到了 torch.FloatTensor