首页 > 解决方案 > PostgreSQL 获取(有时)太慢

问题描述

我有一个有几百万行的表,我不断地插入新数据并获取数据。

结构是:

                ('orig_city', 'INTEGER NOT NULL'),
                ('dest_city', 'INTEGER NOT NULL'),
                ('dep_date', 'DATE NOT NULL'),
                ('orig_city_code', 'TEXT NOT NULL'),
                ('dest_city_code', 'TEXT NOT NULL'),
                ('trip_id', 'TEXT'),
                ('agent', 'TEXT'),
                ('source', 'TEXT'),
                ('price', 'INTEGER'),
                ('deep_link', 'TEXT'),
                ('ts', 'TIMESTAMP(0) WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP'),
                ('PRIMARY KEY', '(trip_id, agent, source)')

idx_trips_201809='orig_city, dest_city, dep_date'

它通常工作得很好,但我意识到有时有些SELECT查询太慢了。

我看到有几次甚至超过 40 秒。不幸的是,我无法为EXPLAIN命令复制那些。

所有命令都是通过python执行的。这是一个例子。

询问:

sql_query = """SELECT * FROM trips_201809 WHERE (orig_city = %s OR dest_city = %s) AND dep_date = %s"""
vals = (91, 279, 2018-09-21)

cur.execute(sql_query, vals)
fetched = cur.fetchall()

其中 2018-09-21 是一个datetime.date对象。

EXPLAIN ANALYZE显示:

('Bitmap Heap Scan on trips_201809  (cost=45576.77..50231.46 rows=2430 width=769) (actual time=1055.970..5993.580 rows=2597 loops=1)',)
("  Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
('  Heap Blocks: exact=1199',)
('  ->  BitmapOr  (cost=45576.77..45576.77 rows=2438 width=0) (actual time=1038.635..1038.635 rows=0 loops=1)',)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..593.63 rows=1824 width=0) (actual time=22.119..22.119 rows=1605 loops=1)',)
("              Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..44982.90 rows=614 width=0) (actual time=1016.514..1016.514 rows=1004 loops=1)',)
("              Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.927 ms',)
('Execution time: 5994.242 ms',)

-

('Bitmap Heap Scan on trips_201809  (cost=45576.77..50231.46 rows=2430 width=769) (actual time=2923.961..2984.264 rows=2597 loops=1)',)
("  Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
('  Heap Blocks: exact=1199',)
('  ->  BitmapOr  (cost=45576.77..45576.77 rows=2438 width=0) (actual time=2923.785..2923.785 rows=0 loops=1)',)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..593.63 rows=1824 width=0) (actual time=1.818..1.818 rows=1605 loops=1)',)
("              Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..44982.90 rows=614 width=0) (actual time=2921.966..2921.966 rows=1004 loops=1)',)
("              Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.731 ms',)
('Execution time: 2984.482 ms',)

-

('Bitmap Heap Scan on trips_201809  (cost=45576.77..50231.46 rows=2430 width=769) (actual time=231.375..233.616 rows=2598 loops=1)',)
("  Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
('  Heap Blocks: exact=1200',)
('  ->  BitmapOr  (cost=45576.77..45576.77 rows=2438 width=0) (actual time=231.222..231.222 rows=0 loops=1)',)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..593.63 rows=1824 width=0) (actual time=1.781..1.781 rows=1605 loops=1)',)
("              Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
('        ->  Bitmap Index Scan on idx_trips_201809  (cost=0.00..44982.90 rows=614 width=0) (actual time=229.440..229.440 rows=1005 loops=1)',)
("              Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.665 ms',)
('Execution time: 233.778 ms',)

我可以看到问题在于执行,而不是计划。但我无法找出究竟是什么让它变慢了。

为什么执行时间如此多变?

我怎么能避免有时需要这么长时间?


评论中要求的其他信息。

典型查询的选择性:

计划中提到的索引是我在问题开始时解释的索引。我对其进行了编辑以使其更清晰

在此表中(目前是关键表),将为特定日期选择大约 3% 的行。但是,我还有其他类似的表格,其中 % 也在 0.1 - 1% 之间。


编辑

添加索引dep_date, orig_city, dest_city提高了获取速度。但是,它仍然没有我预期的那么快。任何额外的建议都非常受欢迎。

('Bitmap Heap Scan on trips_201809  (cost=1326.19..13315.79 rows=6432 width=772) (actual time=553.464..1273.092 rows=6577 loops=1)',)
("  Recheck Cond: (((dep_date = '2018-09-21'::date) AND (orig_city = 353)) OR ((dep_date = '2018-09-21'::date) AND (dest_city = 106)))",)
('  Heap Blocks: exact=2010',)
('  ->  BitmapOr  (actual time=552.436..552.436 rows=0 loops=1)',)
('        ->  Bitmap Index Scan on idx_trp_201809  (cost=0.00..27.56 rows=2739 width=0) (actual time=0.300..0.300 rows=2136 loops=1)',)
("              Index Cond: ((dep_date = '2018-09-21'::date) AND (orig_city = 353))",)
('        ->  Bitmap Index Scan on idx_trp_201809  (cost=0.00..1297.99 rows=3771 width=0) (actual time=552.134..552.134 rows=4926 loops=1)',)
("              Index Cond: ((dep_date = '2018-09-21'::date) AND (dest_city = 106))",)
('Planning time: 100.031 ms',)
('Execution time: 1273.483 ms',)

标签: sqlpostgresql

解决方案


推荐阅读