postgresql - 什么决定了postgres中同一张表中每个索引的成本
问题描述
我正在使用 PostgreSQL 服务器 12 和 psql 12。我有一个非常大的表(大约 600 万个元组),它有几列。说就像
People(
bigint id,
varchar company_type,
bigint complany_id,
varchar department_type,
bigint department_id,
......
)
我有几个索引:
"people_pkey" PRIMARY KEY, btree (id),
"unique_person" UNIQUE, btree (company_type, company_id, department_type, department_id),
"company" btree (company_type, company_id),
"department" btree (department_type, department_id)
现在我有这个简单的查询
EXPLAIN ANALYZE SELECT array(
SELECT DISTINCT my_people.company_id
FROM people AS my_people
WHERE
"my_people"."company_type" = 'Some_company' AND
"my_people"."department_type" = 'Some_department' AND
"my_people"."department_id" = ANY(ARRAY[1,2,3,4,5,6,7])
) a
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=8.60..8.61 rows=1 width=32) (actual time=2.377..2.378 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Unique (cost=0.56..8.60 rows=1 width=8) (actual time=2.373..2.374 rows=0 loops=1)
-> Index Scan using company on people my_people (cost=0.56..8.60 rows=1 width=8) (actual time=2.373..2.373 rows=0 loops=1)
Index Cond: ((company_type)::text = 'Some_company'::text)
Filter: ((department_type)::text = 'Some_department'::text) AND (department_id = ANY ('{1,2,3,4,5,6,7}'::integer[])))
Rows Removed by Filter: 1189
Planning Time: 0.873 ms
Execution Time: 2.405 ms
如果它使用“部门”索引,查询时间将得到优化,但这样做,它使用“公司”。
我试过使用 pg_hint_plan 来强制它使用我想要的索引,它会比“公司”快得多。
/*+ IndexScan(my_people department) */ EXPLAIN ANALYZE SELECT array(
SELECT DISTINCT my_people.company_id
FROM people AS my_people
WHERE
"my_people"."company_type" = 'Some_company' AND
"my_people"."department_type" = 'Some_department' AND
"my_people"."department_id" = ANY(ARRAY[1,2,3,4,5,6,7])
) a
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=503435.96..503435.97 rows=1 width=32) (actual time=0.073..0.074 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Unique (cost=503435.95..503435.96 rows=1 width=8) (actual time=0.070..0.071 rows=0 loops=1)
-> Sort (cost=503435.95..503435.96 rows=1 width=8) (actual time=0.070..0.071 rows=0 loops=1)
Sort Key: my_people.id
Sort Method: quicksort Memory: 25kB
-> Index Scan using department on people my_people (cost=0.56..503435.94 rows=1 width=8) (actual time=0.066..0.067 rows=0 loops=1)
Index Cond: (((department_type)::text = 'Some_department'::text) AND (department_id = ANY ('{1,2,3,4,5,6,7}'::integer[])))
Filter: ((Company_type)::text = 'Some_company'::text)
Rows Removed by Filter: 1
Planning Time: 0.252 ms
Execution Time: 0.096 ms
(12 rows)
/*+ IndexScan(my_people unique_person) */ EXPLAIN ANALYZE SELECT array(
SELECT DISTINCT my_people.company_id
FROM people AS my_people
WHERE
"my_people"."company_type" = 'Some_company' AND
"my_people"."department_type" = 'Some_department' AND
"my_people"."department_id" = ANY(ARRAY[1,2,3,4,5,6,7])
) a
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=8.60..8.61 rows=1 width=32) (actual time=1.821..1.822 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Unique (cost=0.56..8.60 rows=1 width=8) (actual time=1.818..1.818 rows=0 loops=1)
-> Index Scan using unique_person on people my_people (cost=0.56..8.60 rows=1 width=8) (actual time=1.817..1.817 rows=0 loops=1)
Index Cond: (((company_type)::text = 'Some_company'::text) AND ((department_type)::text = 'Some_department'::text))
Filter: (department_id = ANY ('{1,2,3,4,5,6,7}'::integer[]))
Rows Removed by Filter: 994
Planning Time: 0.258 ms
Execution Time: 1.842 ms
(9 rows)
然后我想也许是它ScalarArrayOpExpr
使它效率低下。所以我把查询改成了这个。这明显更快,但我仍然需要提示 postgres 使用“部门”索引。
/*+ IndexScan(my_people department) */ EXPLAIN ANALYZE SELECT array(
SELECT a.*
FROM unnest(ARRAY[1,2,3,4,5,6,7]) as t(fid)
, LATERAL (
SELECT DISTINCT my_people.id
FROM people AS my_people
WHERE
"my_people"."company_type" = 'Some_company' AND
"my_people"."department_type" = 'Some_department' AND
"my_people"."target_id" = t.fid
) a
) b
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3658.94..3658.95 rows=1 width=32) (actual time=0.092..0.094 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Nested Loop (cost=522.67..3658.94 rows=7 width=8) (actual time=0.090..0.091 rows=0 loops=1)
-> Function Scan on unnest t (cost=0.00..0.07 rows=7 width=4) (actual time=0.005..0.006 rows=7 loops=1)
-> Unique (cost=522.67..522.68 rows=1 width=8) (actual time=0.011..0.012 rows=0 loops=7)
-> Sort (cost=522.67..522.67 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=7)
Sort Key: my_people.id
Sort Method: quicksort Memory: 25kB
-> Index Scan using department on people my_people (cost=0.56..522.66 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=7)
Index Cond: (((department_type)::text = 'Some_department'::text) AND (department_id = t.fid))
Filter: ((company_type)::text = 'Some_company'::text)
Rows Removed by Filter: 0
Planning Time: 0.248 ms
Execution Time: 0.120 ms
(14 rows)
当使用“company”时,索引扫描的成本最低,所以我认为这就是psql会使用这个索引的原因。但是,这种行为会大大减慢我的查询速度,我想知道 psql 如何确定要使用的索引。
更新1:
我试过VACUUM (VERBOSE, ANALYZE) people;
了,这是输出。似乎什么都没有改变,但现在我的查询使用了我需要的索引。
VACUUM (VERBOSE, ANALYZE) people;
INFO: vacuuming "people"
INFO: index "people_pkey" now contains 66865768 row versions in 183343 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.11 s, system: 0.41 s, elapsed: 3.24 s.
INFO: index "unique_person" now contains 66865768 row versions in 867318 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.54 s, system: 1.97 s, elapsed: 6.13 s.
INFO: index "department" now contains 66865768 row versions in 308674 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.21 s, system: 0.67 s, elapsed: 1.32 s.
INFO: "people": found 0 removable, 66865768 nonremovable row versions in 1943422 out of 1943422 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 85284
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 6.00 s, system: 13.11 s, elapsed: 30.15 s.
INFO: vacuuming "pg_toast.pg_toast_1418456"
INFO: index "pg_toast_1418456_index" now contains 2344 row versions in 9 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_1418456": found 0 removable, 2344 nonremovable row versions in 534 out of 534 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 85284
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: analyzing "people"
INFO: "people": scanned 30000 of 1943422 pages, containing 1032582 live rows and 0 dead rows; 30000 rows in sample, 66891419 estimated total rows
VACUUM
解决方案
您的标题非常笼统,因此我假设您正在寻找一般情况下解决此类问题的方法,而不是针对您的特定情况的固定答案。
计划者首选计划的 EXPLAIN ANALYZE 具有:
Rows Removed by Filter: 1189
但是,EXPLAIN 输出并没有告诉您它希望使用过滤器删除多少行,而只是告诉您它实际删除了多少行。(这种缺乏信息可以说是解释分析中的一个缺陷)。基于较低的总成本估算,它似乎没想到会移除那么多。你可以运行一个变体查询来查看它期望索引返回多少行(除了 1 之外它默默地期望过滤器删除):
explain select * from person where company_type = 'Some_company'
对于你真正希望它使用的索引,计划也有点莫名其妙:
-> Index Scan using department on people my_people (cost=0.56..503435.94 rows=1 width=8) (actual time=0.066..0.067 rows=0 loops=1)
Index Cond: (((department_type)::text = 'Some_department'::text) AND (department_id = ANY ('{1,2,3,4,5,6,7}'::integer[])))
Filter: ((Company_type)::text = 'Some_company'::text)
Rows Removed by Filter: 1
对于大量“由过滤器删除”而言,这里的成本似乎太小了,而对于过滤的行数少,它现在似乎太高了。但同样,我们只能看到它实际删除了多少行,而不是它预期的多少。要查看在过滤之前希望此索引扫描找到多少,您可以运行:
explain select * from person where department_type = 'Some_department' AND department_id = ANY ('{1,2,3,4,5,6,7}'::integer[])
推荐阅读
- html - 重新设计一个
- -元素
- reporting-services - 如何在批处理模式下调用输出菜单项并执行报表?
- c++ - QMouseEvent 用于 QWidget 上的单次移动
- python - 如何赋予价值并自动将其添加(增加)到其他变量,同时减少列表python中的另一个?
- jquery - 如果引导模式框单击无按钮,如何丢弃复选框值并显示从数据库获取的值
- database - MongoDB是否有可能立即在磁盘上写入而不将信息存储在内存中?
- sql - ts_vector 按月分组
- hybris - SAP HYBRIS [Y]:如何限制 customersupportagentrole 用户取消或退回订单
- java - 如何通过注释用 try-catch 包装方法?
- php - 如何转换 Swift 日期和 TimeInterval 以便我可以发送到 php 服务器?