首页 > 解决方案 > 有一个包含活动期间的合同表,我如何计算每日有效合同数?

问题描述

我有两张桌子:

我需要计算每行的有效合同数量calendar,假设合同是有效的,如果contract.d1 <= calendar.d <= contract.d2

我现在正在寻找解决这个问题的最聪明的方法。

首先,我尝试了一种蛮力方法(以防万一):

select  d, count(*) 
from    data 
join    calendar on d between d1 and d2
group   by d;

这个查询显然非常繁重,因为它使 Oracle 使用一个临时表,其中每条记录都contract出现 N 次,其中 N 是该合同之间d1的天数。d2

另一种方法是避免在子选择中加入表和计算合约,但它也不是那么聪明:

select  d,
       (select count(*) from contract where d between d1 and d2)
from    calendar;

在这种情况下,Oracle 必须对calendar.

然后我决定将我的contract表格转换为描述单独事件的 CTE:

这个新的 CTE 允许我计算合同总量的每日和运行变化。很明显,每天的运行变化等于当天的合约总量,所以我可以立即将结果 LEFT 加入日历:

with 
    events (d,n) as (
        select d1,   +1 from contract -- new contracts
        union all
        select d2+1, -1 from contract -- terminated contracts
    ),
    counts_daily(d,n) as (
        select  d, sum(n)
        from    events   
        group   by d
    ),
    counts_running(d,n) as (
        select  d, sum(n) over (order by d)
        from    counts_daily
        order by d
    )
select  d, 
        nvl(n, lag(n ignore nulls) over(order by d)) n  
from    calendar 
left    join counts_running using(d)    
order   by d;

counts_running我只在某些合同开始或终止的日期使用左连接。生成的 select 语句中的nvl(...,lag(...))构造对于获取没有事件的日历日期的值是必要的,它从上一个有事件的日期获取值。

此查询比以前的查询要好得多,因为它在计算期间不会显着增加源数据的维度。

我还找到了进一步改进此查询并使其速度提高约 1.5..2 倍的方法。仍然牢记单独的事件而不是合同,我没有将它们放入单独的 CTE 中,而是立即执行聚合,以获取每日 +1 和 -1 事件的计数。其他一切几乎保持不变:

with 
    events_agg (d,n) as (
        select d1,    count(*) from contract  group by d1 -- new contracts
        union all
        select d2+1, -count(*) from contract group by d2 -- term. contracts
    ),
    counts_daily(d,n) as (
        select  d, sum(n)
        from    events_agg  
        group   by d
    ),
    counts_running(d,n) as (
        select  d, sum(n) over (order by d)
        from    counts_daily
        order by d
    )
select  d, 
        nvl(n, lag(n ignore nulls) over(order by d)) n  
from    calendar 
left    join counts_running using(d)    
order   by d;

所以我只是聚合了事件,让 oracle UNION 的数据集更小。

你会建议一些其他的方法吗?谢谢。

您可能希望使用以下 CTE 模拟测试数据:

with 
    --- TEST DATA EMULATION ---
    params as (
        select  trunc(sysdate) - 100  d0,                
                1e5                   number_of_contracts,
                100                   max_contract_term 
        from dual
    ),
    start_dates (d1) as(
        select  trunc(d0 + dbms_random.value(0, sysdate-d0)) d1 
                from params connect by level <= number_of_contracts
    ),
    contract (d1, d2) as ( 
        select  d1, 
                trunc(d1 + dbms_random.value(0, max_contract_term)) d2
        from    start_dates, params       
    ),
    calendar (d) as (
        select  d0 + level - 1 
        from    params
        connect by level <= sysdate - d0 + 1  
    ),
    --- END OF TEST DATA ---
    ------------------------
    ...

...甚至将其放入具有合适索引的物理calendar和表中。contract

标签: oracleoptimizationcommon-table-expression

解决方案


你的方法很好,我不认为有更好的基本解决方案。要获得所有合同交易,您必须考虑至少两次 contract表中的每条记录,您正在这样做。

一种微妙的不同可能性是,不是对表进行双重扫描,而是执行单次扫描并与包含激活和终止记录的两个记录表连接。

with two as (
select 1 n from dual union all
select -1 n from dual),
contract_transactions as (
select 
case when n = 1 then c.d1 else c.d2+1 end as d,
n
from contract c
cross join two d)
select d, sum(n) n 
from contract_transactions
group by d

而不是你的双FULL TABLE SCAN

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |  1528K|    13M|   107K  (1)| 00:25:06 |
|   1 |  UNION-ALL                  |          |       |       |            |          |
|   2 |   HASH GROUP BY             |          |   764K|  6715K| 53785   (1)| 00:12:33 |
|   3 |    TABLE ACCESS STORAGE FULL| CONTRACT |   764K|  6715K| 53767   (1)| 00:12:33 |
|   4 |   HASH GROUP BY             |          |   764K|  6715K| 53785   (1)| 00:12:33 |
|   5 |    TABLE ACCESS STORAGE FULL| CONTRACT |   764K|  6715K| 53767   (1)| 00:12:33 |
----------------------------------------------------------------------------------------

只有一个FULL TABLE SCAN,但后面有一个连接两个记录表

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |  1528K|    30M|   107K  (1)| 00:25:07 |
|   1 |  HASH GROUP BY               |          |  1528K|    30M|   107K  (1)| 00:25:07 |
|   2 |   MERGE JOIN CARTESIAN       |          |  1528K|    30M|   107K  (1)| 00:25:06 |
|   3 |    VIEW                      |          |     2 |     6 |     4   (0)| 00:00:01 |
|   4 |     UNION-ALL                |          |       |       |            |          |
|   5 |      FAST DUAL               |          |     1 |       |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL               |          |     1 |       |     2   (0)| 00:00:01 |
|   7 |    BUFFER SORT               |          |   764K|    13M|   107K  (1)| 00:25:07 |
|   8 |     TABLE ACCESS STORAGE FULL| CONTRACT |   764K|    13M| 53767   (1)| 00:12:33 |
-----------------------------------------------------------------------------------------

我不认为性能会有相关差异(相同的选项是 CBO;)。

对我来说,使用连接将contract记录拆分为两个事务更为原生。


推荐阅读