sql - 如何通过对聚合函数结果进行排序来加速 PostgreSQL 查询?
问题描述
我有这个查询:
select agg.app_id, sum(downloads) as downloads, sum(revenue) as revenue from (
SELECT distinct app_id FROM sdk_modules_apps
WHERE sdk_module_id = 27
) agg inner join revenue_hist_sdk ON revenue_hist_sdk.app_id = agg.app_id
group by agg.app_id
order by app_id asc, revenue desc
limit 30
执行此查询大约需要 1.5 分钟。解释(分析,缓冲区)是这样的:
Limit (cost=8498544.92..8498544.99 rows=30 width=68) (actual time=32005.449..32136.719 rows=30 loops=1)
" Buffers: shared hit=6491 read=801721 written=312, temp read=1468232 written=1470609"
-> Sort (cost=8498544.92..8499032.40 rows=194994 width=68) (actual time=32005.448..32005.449 rows=30 loops=1)
" Sort Key: sdk_modules_apps.app_id, (sum(revenue_hist_sdk.revenue)) DESC"
Sort Method: top-N heapsort Memory: 27kB
" Buffers: shared hit=2409 read=266228, temp read=494935 written=495735"
-> Finalize GroupAggregate (cost=7574427.43..8492785.88 rows=194994 width=68) (actual time=20995.772..31969.342 rows=160329 loops=1)
Group Key: sdk_modules_apps.app_id
" Buffers: shared hit=2409 read=266228, temp read=494935 written=495735"
-> Gather Merge (cost=7574427.43..8484986.12 rows=389988 width=68) (actual time=20995.663..31752.532 rows=371741 loops=1)
Workers Planned: 2
Workers Launched: 2
" Buffers: shared hit=6491 read=801721 written=312, temp read=1468232 written=1470609"
-> Partial GroupAggregate (cost=7573427.40..8438971.80 rows=194994 width=68) (actual time=20532.363..30555.301 rows=123914 loops=3)
Group Key: sdk_modules_apps.app_id
" Buffers: shared hit=6491 read=801721 written=312, temp read=1468232 written=1470609"
-> Merge Join (cost=7573427.40..8149761.54 rows=38171380 width=20) (actual time=20532.301..28720.598 rows=16933962 loops=3)
Merge Cond: (revenue_hist_sdk.app_id = sdk_modules_apps.app_id)
" Buffers: shared hit=6491 read=801721 written=312, temp read=1468232 written=1470609"
-> Sort (cost=7499645.14..7595073.59 rows=38171380 width=24) (actual time=20428.849..24388.498 rows=30537105 loops=3)
Sort Key: revenue_hist_sdk.app_id
Sort Method: external merge Disk: 1025856kB
Worker 0: Sort Method: external merge Disk: 1073648kB
Worker 1: Sort Method: external merge Disk: 948424kB
" Buffers: shared hit=328 read=800986, temp read=1466846 written=1469211"
-> Parallel Seq Scan on revenue_hist_sdk (cost=0.00..1183013.80 rows=38171380 width=24) (actual time=0.030..4294.558 rows=30537105 loops=3)
Buffers: shared hit=314 read=800986
-> Unique (cost=73782.26..75108.27 rows=194994 width=4) (actual time=103.447..181.850 rows=267078 loops=3)
" Buffers: shared hit=6163 read=735 written=312, temp read=1386 written=1398"
-> Sort (cost=73782.26..74445.27 rows=265203 width=4) (actual time=103.446..137.035 rows=267078 loops=3)
Sort Key: sdk_modules_apps.app_id
Sort Method: external merge Disk: 3696kB
Worker 0: Sort Method: external merge Disk: 3696kB
Worker 1: Sort Method: external merge Disk: 3696kB
" Buffers: shared hit=6163 read=735 written=312, temp read=1386 written=1398"
-> Bitmap Heap Scan on sdk_modules_apps (cost=3147.76..47560.79 rows=265203 width=4) (actual time=16.269..53.601 rows=267525 loops=3)
Recheck Cond: (sdk_module_id = 27)
Heap Blocks: exact=1560
Buffers: shared hit=6149 read=735 written=312
-> Bitmap Index Scan on sdk_modules_apps_sdk_module_id_index (cost=0.00..3081.45 rows=265203 width=0) (actual time=16.084..16.084 rows=267525 loops=3)
Index Cond: (sdk_module_id = 27)
Buffers: shared hit=1469 read=735 written=312
Planning Time: 0.131 ms
Execution Time: 32388.741 ms
表收入_hist_sdk 包含 90000000 条记录,我需要通过加入 sdk_modules_apps 来按此表中的下载量或收入进行排序。来自 sdk_modules_apps 的记录数约为 250000。我真的不知道如何让它更快。我尝试创建具有不同排序的不同索引,但有时它变得更糟。
表模式:
sdk_modules_apps: sdk_module_id, app_id, installed, uninstalled
revenue_hist_sdk: app_id, utc_date, downloads, revenue
在大表中排序时如何处理查询执行时间长的问题?
解决方案
使用以下方式编写查询exists
:
SELECT app_id, sum(downloads) as downloads, sum(revenue) as revenue
FROM revenue_hist_sdk rhs
WHERE EXISTS (SELECT 1
FROM sdk_modules_apps a
WHERE a.app_id = rhs.app_id AND a.sdk_module_id = 27
)
GROUP BY app_id
ORDER BY downloads desc
LIMIT 30;
然后我会建议索引: sdk_modules_apps(app_id, sdk_module_id)
和revenue_hist_sdk(app_id)
. 第二个索引也可以具有downloads
和revenue
作为附加键。
推荐阅读
- sql - 选择具有最近日期的唯一值,按另一个值分组
- python - CUDA_ERROR_OUT_OF_MEMORY:内存不足:对于 tensorflow 2.1
- mysql - 需要使用一些文本从 sql 中获取数据
- mysql - 我正在尝试使用 vb 将数据插入数据库
- php - 尝试在 PHP 中使用 foreach 显示粘贴的体验
- selenium - Selenium IDE:无法单击 Vaadin 按钮
- angular - Ionic 5中列表第一项的双底线
- c++ - c ++如何创建包含大小未初始化字节的std :: string?
- android - 材质组件中没有 ImageView/ImageButton (Android)
- mysql - 自定义字段的数据库推荐