sql - PostgreSQL 获取(有时)太慢
问题描述
我有一个有几百万行的表,我不断地插入新数据并获取数据。
结构是:
('orig_city', 'INTEGER NOT NULL'),
('dest_city', 'INTEGER NOT NULL'),
('dep_date', 'DATE NOT NULL'),
('orig_city_code', 'TEXT NOT NULL'),
('dest_city_code', 'TEXT NOT NULL'),
('trip_id', 'TEXT'),
('agent', 'TEXT'),
('source', 'TEXT'),
('price', 'INTEGER'),
('deep_link', 'TEXT'),
('ts', 'TIMESTAMP(0) WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP'),
('PRIMARY KEY', '(trip_id, agent, source)')
与idx_trips_201809
='orig_city, dest_city, dep_date'
它通常工作得很好,但我意识到有时有些SELECT
查询太慢了。
我看到有几次甚至超过 40 秒。不幸的是,我无法为EXPLAIN
命令复制那些。
所有命令都是通过python执行的。这是一个例子。
询问:
sql_query = """SELECT * FROM trips_201809 WHERE (orig_city = %s OR dest_city = %s) AND dep_date = %s"""
vals = (91, 279, 2018-09-21)
cur.execute(sql_query, vals)
fetched = cur.fetchall()
其中 2018-09-21 是一个datetime.date
对象。
EXPLAIN ANALYZE
显示:
('Bitmap Heap Scan on trips_201809 (cost=45576.77..50231.46 rows=2430 width=769) (actual time=1055.970..5993.580 rows=2597 loops=1)',)
(" Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
(' Heap Blocks: exact=1199',)
(' -> BitmapOr (cost=45576.77..45576.77 rows=2438 width=0) (actual time=1038.635..1038.635 rows=0 loops=1)',)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..593.63 rows=1824 width=0) (actual time=22.119..22.119 rows=1605 loops=1)',)
(" Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..44982.90 rows=614 width=0) (actual time=1016.514..1016.514 rows=1004 loops=1)',)
(" Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.927 ms',)
('Execution time: 5994.242 ms',)
-
('Bitmap Heap Scan on trips_201809 (cost=45576.77..50231.46 rows=2430 width=769) (actual time=2923.961..2984.264 rows=2597 loops=1)',)
(" Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
(' Heap Blocks: exact=1199',)
(' -> BitmapOr (cost=45576.77..45576.77 rows=2438 width=0) (actual time=2923.785..2923.785 rows=0 loops=1)',)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..593.63 rows=1824 width=0) (actual time=1.818..1.818 rows=1605 loops=1)',)
(" Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..44982.90 rows=614 width=0) (actual time=2921.966..2921.966 rows=1004 loops=1)',)
(" Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.731 ms',)
('Execution time: 2984.482 ms',)
-
('Bitmap Heap Scan on trips_201809 (cost=45576.77..50231.46 rows=2430 width=769) (actual time=231.375..233.616 rows=2598 loops=1)',)
(" Recheck Cond: (((orig_city = 91) AND (dep_date = '2018-09-21'::date)) OR ((dest_city = 279) AND (dep_date = '2018-09-21'::date)))",)
(' Heap Blocks: exact=1200',)
(' -> BitmapOr (cost=45576.77..45576.77 rows=2438 width=0) (actual time=231.222..231.222 rows=0 loops=1)',)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..593.63 rows=1824 width=0) (actual time=1.781..1.781 rows=1605 loops=1)',)
(" Index Cond: ((orig_city = 91) AND (dep_date = '2018-09-21'::date))",)
(' -> Bitmap Index Scan on idx_trips_201809 (cost=0.00..44982.90 rows=614 width=0) (actual time=229.440..229.440 rows=1005 loops=1)',)
(" Index Cond: ((dest_city = 279) AND (dep_date = '2018-09-21'::date))",)
('Planning time: 0.665 ms',)
('Execution time: 233.778 ms',)
我可以看到问题在于执行,而不是计划。但我无法找出究竟是什么让它变慢了。
为什么执行时间如此多变?
我怎么能避免有时需要这么长时间?
评论中要求的其他信息。
典型查询的选择性:
dep_date
: 3%orig_city
: 0.1 - 1%dest_city
: 0.1 - 1%
计划中提到的索引是我在问题开始时解释的索引。我对其进行了编辑以使其更清晰
在此表中(目前是关键表),将为特定日期选择大约 3% 的行。但是,我还有其他类似的表格,其中 % 也在 0.1 - 1% 之间。
编辑
添加索引dep_date, orig_city, dest_city
提高了获取速度。但是,它仍然没有我预期的那么快。任何额外的建议都非常受欢迎。
('Bitmap Heap Scan on trips_201809 (cost=1326.19..13315.79 rows=6432 width=772) (actual time=553.464..1273.092 rows=6577 loops=1)',)
(" Recheck Cond: (((dep_date = '2018-09-21'::date) AND (orig_city = 353)) OR ((dep_date = '2018-09-21'::date) AND (dest_city = 106)))",)
(' Heap Blocks: exact=2010',)
(' -> BitmapOr (actual time=552.436..552.436 rows=0 loops=1)',)
(' -> Bitmap Index Scan on idx_trp_201809 (cost=0.00..27.56 rows=2739 width=0) (actual time=0.300..0.300 rows=2136 loops=1)',)
(" Index Cond: ((dep_date = '2018-09-21'::date) AND (orig_city = 353))",)
(' -> Bitmap Index Scan on idx_trp_201809 (cost=0.00..1297.99 rows=3771 width=0) (actual time=552.134..552.134 rows=4926 loops=1)',)
(" Index Cond: ((dep_date = '2018-09-21'::date) AND (dest_city = 106))",)
('Planning time: 100.031 ms',)
('Execution time: 1273.483 ms',)
解决方案
推荐阅读
- c++ - C++:求最大值、最小值、总和和平均值
- java - 名称的结果与概率不匹配
- c# - 如何使用 ASP.NET Core 获取当前路由名称?
- python - 从控制台部署 gcloud 功能不会在 zip 文件中找到代码
- matlab - 如何将绘制的向量保存到矩阵中?
- algorithm - 哪种寻路算法可以获得从 A 到 B 和 B 到 A 的相同路径?
- kafka-consumer-api - 即使在启用了exact_once之后,kafka流也会获得重复的记录
- google-chrome-extension - Bootstrap JS 的替代品在 shadow dom 中不起作用
- qt - QML 上下文菜单大小根据其内容调整
- python - 将数字渲染到 n 个可变位置