oracle - ROW_NUMBER 的性能
问题描述
我有一个使用 ROW_NUMBER() 的查询。我有这样的事情:
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
查询运行得非常快。但是,如果我添加对“rnum”列的任何引用,查询就会变慢。因此,看起来只有 ROW_NUMBER() 不是问题,但是当我在实际查询中使用“rnum”时,它会爬行 30 秒。
有什么想法吗?
作为参考,这里是查询:
WITH aquire AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT d.trans_id, d.source, 'AquireMedia' AS provider,
d.trans_time AS publish_date, '/research/get_news.php?id=' || d.trans_id AS story_link,
i.name AS industry_name, s.sector_name, d.headline AS subject, NULL AS teaser,
NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(d.trans_id,'AQUIREMEDIA') AS tickers,
ROW_NUMBER() OVER (PARTITION BY d.trans_id ORDER BY d.trans_time DESC) as rtnum
FROM story_descriptions_3m d, story_tickers_3m t, uber_master_mv m, industry i, ind_sector ix, sectors s, comp_ind c
WHERE d.trans_id = t.trans_id
AND t.m_ticker = m.m_ticker
AND t.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1),
partner AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT CAST(n.story_id AS VARCHAR2(20)) trans_id, n.provider AS source, 'Partner News' AS provider,
n.story_date AS publish_date, n.link AS story_link, i.name AS industry_name, s.sector_name, n.title AS subject,
CAST(substr(n.teaser,1,4000) AS VARCHAR2(4000)) AS teaser, NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(n.story_id,'OTHER') AS tickers,
ROW_NUMBER() OVER (PARTITION BY n.story_id ORDER BY n.story_date DESC) as rtnum
FROM news_stories_3m n, news_stories_lookup_3m t, comp_ind c, uber_master_mv m, industry i, ind_sector ix, sectors s
WHERE t.story_id = n.story_id
AND t.ticker = m.ticker
AND m.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1)
SELECT trans_id, source, provider,
TO_CHAR(publish_date,'MM/DD/YYYY HH24:MI:SS') AS publish_date,
UNIX_TIMESTAMP(publish_date) AS timestamp,
story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers,
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM aquire WHERE rtnum <= 5
UNION ALL
SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM partner WHERE rtnum <= 5))
WHERE rnum BETWEEN 1 AND 1 * 5;
解决方案
让我们在一个简单的示例上模拟您的查询,以演示和解释您遇到可预期的结果。
样本数据
create table tab1 as
select rownum id, lpad('x',3000,'y') pad from dual connect by level <= 1000000;
现在,如果您在 IDE 中运行以下查询,您将立即看到结果集的第一页。
请注意,您定义了row_number
但不使用它。
select id, pad from (
select id, pad,
row_number() over (order by id) as rnum
from tab1
)
答案在下面的执行计划中
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 2866M| 135K (1)| 00:00:06 |
| 1 | TABLE ACCESS FULL| TAB1 | 1000K| 2866M| 135K (1)| 00:00:06 |
--------------------------------------------------------------------------
您会看到没有执行任何排序和过滤,这row_number
是简单的忽略。
这(仅获取少数初始行且不进行排序)解释了查询执行的原因。
相反,如果您限制row_number
如下
SQL> select id, pad from (
2 select id, pad,
3 row_number() over (order by id) as rnum
4 from tab1
5 ) where rnum between 1 and 5
6 ;
Elapsed: 00:00:07.80
你观察到可观的经过时间。执行计划再次提供了答案。
请参阅此处如何获取execution plan
您的查询。
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 7640 | | 762K (1)| 00:00:30 |
|* 1 | VIEW | | 5 | 7640 | | 762K (1)| 00:00:30 |
|* 2 | WINDOW SORT PUSHED RANK| | 1000K| 2866M| 3906M| 762K (1)| 00:00:30 |
| 3 | TABLE ACCESS FULL | TAB1 | 1000K| 2866M| | 135K (1)| 00:00:06 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=1 AND "RNUM"<=5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "ID")<=5)
结果是现在您必须遍历所有记录(在您的情况下执行所有连接),这会破坏性能。
为了证明这一点,请使用选项或添加的子句运行简单的performat查询。您很可能会得到与第二个查询相同的不良结果。fetch all
order by
最后的评论
而不是ROW_NUMBER()
你可以使用row_limiting_clause
row_number
从inorder by
子句中传递排序列并使用offset
andfetch first
来限制结果。
select id, pad from (
select id, pad
from tab1
) order by id
fetch first 5 rows only;
在封面下,您将看到使用与WINDOW SORT PUSHED RANK
上述相同的执行计划。
推荐阅读
- arrays - 在没有 for 循环的情况下创建矩阵数组
- django - 如何覆盖 PasswordResetConfirmView?(django.contrib.auth.views)
- vue.js - 无法让 vue2-transitions 包工作
- groovy - 在 nifi 中如何使用 groovy 脚本调用休息服务
- angular - 角度表行包含使用反应形式动态地列的总和
- java - Java:如何用另一个drawable在ImageView中过度绘制现有的drawable?
- wordpress - 如何更改 wordpress 主题的所有按钮、图标和其他元素颜色?
- html - .router-link-active 在 vue css 中不起作用
- mysql - 数据流 Mysql 到 Bigquery:连接被拒绝
- linux - Linux 中的 IntelliJ 快捷方式约定