Merge/split overlapping date ranges with priority

Problem description

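I have three tables. The first tells me when we contracted with a particular facility. The second holds the base fee schedule we contract with all vendors. The third tells me whether a specific contract has a different contracted rate for one of those fees. The tables look like this: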

CREATE TABLE [dbo].[Facility](
    [FacilityID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProviderID] [varchar](50) NOT NULL,
    [VendorID] [bigint] NOT NULL,
    [FacilityName] [varchar](300) NOT NULL,
    [FacilityAddress1] [varchar](300) NOT NULL,
    [FacilityAddress2] [varchar](300) NOT NULL,
    [FacilityCity] [varchar](300) NOT NULL,
    [FacilityState] [char](2) NOT NULL,
    [FacilityZip] [varchar](10) NOT NULL,
    [ContractEffectiveDate] [date] NOT NULL,
    [ContractTermDate] [date] NOT NULL,
 CONSTRAINT [PK_Facility] PRIMARY KEY CLUSTERED 
(
    [FacilityID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[BaseFeeSchedule](
    [BaseFeeScheduleID] [int] IDENTITY(1,1) NOT NULL,
    [FeeCode] [varchar](10) NOT NULL,
    [Description] [varchar](800) NOT NULL,
    [Rate] [money] NOT NULL,
    [CategoryID] [int] NOT NULL,
    [RateEffectiveDate] [date] NOT NULL,
    [RateTermDate] [date] NOT NULL,
 CONSTRAINT [PK_BaseFeeSchedule] PRIMARY KEY CLUSTERED 
(
    [BaseFeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[OverrideFeeSchedule](
    [OverrideFeeScheduleID] [bigint] IDENTITY(1,1) NOT NULL,
    [FacilityID] [bigint] NOT NULL,
    [FeeCode] [varchar](10) NOT NULL,
    [OverrideRate] [money] NOT NULL,
    [RateEffectiveDate] [date] NOT NULL,
    [RateTermDate] [date] NOT NULL,
 CONSTRAINT [PK_OverrideFeeSchedule] PRIMARY KEY CLUSTERED 
(
    [OverrideFeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[OverrideFeeSchedule]  WITH CHECK ADD  CONSTRAINT [FK_OverrideFeeSchedule_Facility] FOREIGN KEY([FacilityID])
REFERENCES [dbo].[Facility] ([FacilityID])
GO

ALTER TABLE [dbo].[OverrideFeeSchedule] CHECK CONSTRAINT [FK_OverrideFeeSchedule_Facility]
GO

We have an existing system with a table that looks like this:

CREATE TABLE [dbo].[FeeSchedule](
    [FeeScheduleID] [int] IDENTITY(1,1) NOT NULL,
    [VendorID] [int] NULL,
    [FeeCd] [varchar](10) NOT NULL,
    [StartDate] [date] NOT NULL,
    [EndDate] [date] NOT NULL,
    [ContractedAmount] [money] NOT NULL,
    [ProgramTypeID] [int] NULL,
 CONSTRAINT [PK_FeeSchedule] PRIMARY KEY CLUSTERED 
(
    [FeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

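The code uses this table to determine the correct rate to pay each vendor. My job is to update this table, which has turned out to be problematic because different facilities were contracted on different dates. Every contract includes the base fee schedule. However, a contract can allow certain fees to be "overridden" by a different fee (usually lower than the normal contracted fee when there is a discount, and occasionally higher when a surcharge has to be added). The three tables above are the ones I built to store all the current data, and I have been using them to build the FeeSchedule table the software needs. Handling changes is easy, but I have been tasked with verifying that the data in the FeeSchedule table is accurate.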

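The FeeSchedule table contains not only new data (the only data I changed) but also older data. So the plan is to take the data in the three tables and run a query that merges the date ranges, with fees in the OverrideFeeSchedule table taking precedence over fees in the BaseFeeSchedule table.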
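For the verification step itself, once the merged ranges exist, a two-way `EXCEPT` is one way to diff them against `FeeSchedule`. A minimal sketch, assuming the merge output is loaded into a hypothetical temp table `#Expected` (filled here with the four expected rows from the example) and that `dbo.FeeSchedule` is restricted to the same vendors, since it also holds older data:

```sql
-- #Expected is a hypothetical temp table holding the merged ranges;
-- here it carries the four expected rows from the example.
CREATE TABLE #Expected
(
    VendorID         int         NOT NULL,
    FeeCd            varchar(10) NOT NULL,
    StartDate        date        NOT NULL,
    EndDate          date        NOT NULL,
    ContractedAmount money       NOT NULL
);

INSERT INTO #Expected (VendorID, FeeCd, StartDate, EndDate, ContractedAmount)
VALUES (1, '1', '20170101', '20170228', 120),
       (1, '1', '20170301', '20180531', 50),
       (1, '1', '20180601', '20180630', 140),
       (1, '1', '20180701', '99991231', 70);

-- Rows the merge produced that FeeSchedule is missing or has wrong.
SELECT VendorID, FeeCd, StartDate, EndDate, ContractedAmount FROM #Expected
EXCEPT
SELECT VendorID, FeeCd, StartDate, EndDate, ContractedAmount
FROM dbo.FeeSchedule
WHERE VendorID IN (SELECT VendorID FROM #Expected);

-- Rows FeeSchedule holds for these vendors that the merge did not produce.
SELECT VendorID, FeeCd, StartDate, EndDate, ContractedAmount
FROM dbo.FeeSchedule
WHERE VendorID IN (SELECT VendorID FROM #Expected)
EXCEPT
SELECT VendorID, FeeCd, StartDate, EndDate, ContractedAmount FROM #Expected;
```

If both queries return no rows, the stored schedule matches the merge for those vendors. Depending on how far back FeeSchedule goes, the second query may also need a date filter.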

An example:

INSERT INTO Facility(VendorID,ContractEffectiveDate,ContractTermDate,...) 
VALUES(1,'1/1/2017','12/31/9999',...) --Assume FacilityID=1

INSERT INTO BaseFeeSchedule(FeeCode,Rate,RateEffectiveDate,RateTermDate,...) 
VALUES('1',100,'1/1/2015','10/15/2016',...),
('1',120,'10/16/2016','4/5/2018',...),
('1',140,'4/6/2018','12/31/9999',...)

INSERT INTO OverrideFeeSchedule(FacilityID,FeeCode,OverrideRate,RateEffectiveDate,RateTermDate,...) 
VALUES(1,'1',50,'3/1/2017','5/31/2018',...),
(1,'1',70,'7/1/2018','12/31/9999',...)

And from this data, I would want:

INSERT INTO FeeSchedule(VendorID, FeeCd, StartDate,EndDate,ContractedAmount)
VALUES(1,'1','1/1/2017','2/28/2017',120), --From BaseFeeSchedule
(1,'1','3/1/2017','5/31/2018',50), --From OverrideFeeSchedule
(1,'1','6/1/2018','6/30/2018',140), --From BaseFeeSchedule
(1,'1','7/1/2018','12/31/9999',70) --From OverrideFeeSchedule

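I have verified that OverrideFeeSchedule has no overlapping rows for any single Facility/FeeCode combination, and that BaseFeeSchedule has no overlapping rows for any single FeeCode. My current solution takes forever. Here is what I'm doing: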

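Build a table containing every day since the first facility was contracted. (BigTable is just a table with about a million rows; I only need every day from the first date we contracted with a vendor through one year from today.) With a recursive approach, however, the recursion depth is limited (the MAXRECURSION hint accepts at most 32,767), so once the range from the first contracted vendor to one year from today exceeds that many days, I would get a maximum-recursion error. So I'm hoping for a different solution.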

DECLARE @MinDate date = (SELECT MIN(ContractEffectiveDate) FROM Facility);
DECLARE @MaxDate date = DATEADD(YEAR,1,CAST(GETDATE() AS date));

-- ROW_NUMBER() is not allowed in WHERE, so TOP limits the number of days instead
SELECT TOP (DATEDIFF(DAY,@MinDate,@MaxDate) + 1)
       DATEADD(DAY,ROW_NUMBER() OVER (ORDER BY A.TableID) - 1,@MinDate) CheckDate
INTO #DatesToCheck
FROM BigTable A
ORDER BY CheckDate;
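If the goal is to avoid both BigTable and recursion, the same list can come from a stacked-CTE tally: a ten-row `VALUES` list cross-joined with itself grows to a million rows without any base table. A minimal sketch; the CTE names are illustrative, and `@MinDate` is hard-coded where the real query would read `MIN(ContractEffectiveDate)` from `Facility`:

```sql
-- Recursion-free date list. @MinDate is hard-coded for illustration only;
-- in practice: SELECT MIN(ContractEffectiveDate) FROM Facility.
DECLARE @MinDate date = '20170101';
DECLARE @MaxDate date = DATEADD(YEAR, 1, CAST(GETDATE() AS date));

WITH Ten AS (SELECT n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
     Hundred     AS (SELECT 1 AS n FROM Ten AS a CROSS JOIN Ten AS b),             -- 100 rows
     TenThousand AS (SELECT 1 AS n FROM Hundred AS a CROSS JOIN Hundred AS b),     -- 10,000 rows
     Million     AS (SELECT 1 AS n FROM TenThousand AS a CROSS JOIN Hundred AS b), -- 1,000,000 rows
     Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS i
               FROM Million)
-- keep exactly one row per day from @MinDate through @MaxDate
SELECT TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
       DATEADD(DAY, i, @MinDate) AS CheckDate
INTO #DatesToCheck
FROM Tally
ORDER BY i;
```

A million rows covers well over 2,000 years of days; add another cross join if that is ever not enough.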

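Join this table with the other tables to build a huge table containing every day, every facility under contract on that day, every FeeCode chargeable on that day, and the specific rate for that day. I won't bother with the code for that join, but it isn't hard to write.

Next, I use the technique described here to merge the date ranges: StackOverflow

While this technique works, it is extremely slow. Is there a more direct way to produce the result set I'm looking for? Basically, I'm looking for how to modify the method in that link to account for potential overlaps with different priorities (base vs. override), as in the example I provided.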

Tags: sql-server, tsql, intervals, aggregation, schedule

Solution


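I hope I got this right...

First of all, you should implement a numbers/dates table. It's not strictly necessary, but it comes in very handy in many situations. You can follow this example...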
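The query below reads from a calendar table called `RunningNumbers` with a `CalendarDate` column, which is not defined anywhere on this page. A minimal sketch of such a table, assuming that name and column from the query, plus a hypothetical `Number` column and a range of 1900 through 2099:

```sql
-- Calendar table in the shape the answer's query expects: one row per day,
-- keyed on CalendarDate. Number and the 1900-2099 range are assumptions.
CREATE TABLE dbo.RunningNumbers
(
    Number       int  NOT NULL,
    CalendarDate date NOT NULL,
    CONSTRAINT PK_RunningNumbers PRIMARY KEY CLUSTERED (CalendarDate)
);

DECLARE @From date = '19000101', @To date = '20991231';

-- Any sufficiently large row source works; sys.all_columns cross-joined with
-- itself typically yields millions of rows, and TOP keeps one row per day.
INSERT INTO dbo.RunningNumbers (Number, CalendarDate)
SELECT TOP (DATEDIFF(DAY, @From, @To) + 1)
       t.n, DATEADD(DAY, t.n, @From)
FROM (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS n
      FROM sys.all_columns AS a
      CROSS JOIN sys.all_columns AS b) AS t
ORDER BY t.n;
```

Because the table is built once and kept, its clustered key on `CalendarDate` serves every range lookup the query does.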

With such a list in place, you could try something like this:

DECLARE @endDate DATE='20191231';  -- last day to generate rows for

WITH DailyBaseRate AS
(
    -- one row per day of each base-rate period
    SELECT CoveredDays.CalendarDate
          ,CONCAT('base ',bfs.RateEffectiveDate) AS RateKey
          ,bfs.FeeCode
          ,bfs.Rate
    FROM BaseFeeSchedule bfs
    CROSS APPLY(SELECT * FROM RunningNumbers rn WHERE rn.CalendarDate<=@endDate AND rn.CalendarDate>=bfs.RateEffectiveDate AND rn.CalendarDate<=bfs.RateTermDate) CoveredDays
)
,OverrideRates AS
(
    -- one row per day of each facility's override periods
    SELECT CoveredDays.CalendarDate
          ,o.FacilityID 
          ,CONCAT('override ',o.RateEffectiveDate) AS RateKey
          ,o.FeeCode
          ,o.OverrideRate
    FROM OverrideFeeSchedule o
    CROSS APPLY(SELECT * FROM RunningNumbers rn WHERE  rn.CalendarDate<=@endDate AND rn.CalendarDate>=o.RateEffectiveDate AND rn.CalendarDate<=o.RateTermDate) CoveredDays
) 
,EffectiveRates AS
(
    -- every facility gets the base rate for each day of its contract;
    -- an override for the same facility, fee code and day takes precedence
    SELECT f.*
          ,dbr.CalendarDate
          ,COALESCE(ovr.RateKey, dbr.RateKey) AS EffectiveRateKey
          ,dbr.FeeCode AS EffectiveFeeCode
          ,COALESCE(ovr.OverrideRate, dbr.Rate) AS EffectiveRate
    FROM dbo.Facility f
    CROSS JOIN DailyBaseRate dbr
    LEFT JOIN OverrideRates ovr ON ovr.FacilityID=f.FacilityID
                               AND ovr.FeeCode=dbr.FeeCode  -- otherwise an override for one code would apply to every code
                               AND ovr.CalendarDate=dbr.CalendarDate
    WHERE dbr.CalendarDate<=@endDate 
      AND dbr.CalendarDate>=f.ContractEffectiveDate 
      AND dbr.CalendarDate<=f.ContractTermDate
)
-- collapse the daily rows back into date ranges
SELECT FacilityID,FacilityName
      ,EffectiveRateKey,EffectiveFeeCode,EffectiveRate
      ,MIN(CalendarDate) AS FromDate
      ,MAX(CalendarDate) AS ToDate
FROM EffectiveRates
GROUP BY FacilityID,FacilityName,EffectiveRateKey,EffectiveFeeCode,EffectiveRate
ORDER BY FacilityID,FromDate;

Result (I added a second facility to your test data...)

+------------+--------------+---------------------+------------------+---------------+------------+------------+
| FacilityID | FacilityName | EffectiveRateKey    | EffectiveFeeCode | EffectiveRate | FromDate   | ToDate     |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 1          | Fac1         | base 2016-10-16     | 1                | 120,00        | 2017-01-01 | 2017-02-28 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 1          | Fac1         | override 2017-03-01 | 1                | 50,00         | 2017-03-01 | 2018-05-31 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 1          | Fac1         | base 2018-04-06     | 1                | 140,00        | 2018-06-01 | 2018-06-30 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 1          | Fac1         | override 2018-07-01 | 1                | 50,00         | 2018-07-01 | 2019-12-31 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 2          | Fac2         | base 2018-04-06     | 1                | 140,00        | 2019-01-01 | 2019-12-31 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+
| 2          | Fac2         | override 2019-07-01 | 1                | 99,00         | 2019-07-01 | 2019-08-15 |
+------------+--------------+---------------------+------------------+---------------+------------+------------+

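In short, the idea:

  • The first CTE turns your base fee schedule into a list of days (one row per day, with the code and rate in effect on that day)
  • The second CTE does the same with the override schedule
  • The third CTE CROSS JOINs your facilities with the daily base schedule (this can get quite big if there are many facilities) and LEFT JOINs the override rates (which adds no extra rows)
  • The set is filtered to the range actually in use
  • Finally, we group by the descriptive columns and pick the interval boundaries with MIN and MAX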

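Hint: we need EffectiveRateKey to avoid lumping together different intervals that happen to have the same rate and code. As a side effect, you can see which source each rate was taken from.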
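The key alone does not separate two runs of the *same* key that an override splits in two. In the result above, facility 2's base rate is reported from 2019-01-01 to 2019-12-31 although the override covers 2019-07-01 to 2019-08-15, because GROUP BY folds the days before and after the override into one row. A standard gaps-and-islands step handles this: within one key, `CalendarDate` minus its `ROW_NUMBER()` stays constant over consecutive days and changes after a gap. The sketch runs it on a hypothetical six-day sample shaped like the `EffectiveRates` CTE; in the answer's query, the `Islands` CTE would sit between `EffectiveRates` and the final SELECT, with `IslandID` added to the GROUP BY.

```sql
-- Hypothetical daily rows shaped like the EffectiveRates CTE:
-- one facility, one fee code, a base rate interrupted by an override.
CREATE TABLE #EffectiveRates
(
    FacilityID       bigint      NOT NULL,
    EffectiveRateKey varchar(30) NOT NULL,
    EffectiveFeeCode varchar(10) NOT NULL,
    EffectiveRate    money       NOT NULL,
    CalendarDate     date        NOT NULL
);
INSERT INTO #EffectiveRates VALUES
 (2, 'base 2018-04-06',     '1', 140, '20190629'),
 (2, 'base 2018-04-06',     '1', 140, '20190630'),
 (2, 'override 2019-07-01', '1',  99, '20190701'),
 (2, 'override 2019-07-01', '1',  99, '20190702'),
 (2, 'base 2018-04-06',     '1', 140, '20190703'),
 (2, 'base 2018-04-06',     '1', 140, '20190704');

-- Consecutive days of the same key share CalendarDate - ROW_NUMBER();
-- a gap (here the override) makes the value jump, starting a new island.
WITH Islands AS
(
    SELECT *
          ,DATEADD(DAY
                  ,-CAST(ROW_NUMBER() OVER (PARTITION BY FacilityID, EffectiveFeeCode, EffectiveRateKey
                                            ORDER BY CalendarDate) AS int)
                  ,CalendarDate) AS IslandID
    FROM #EffectiveRates
)
SELECT FacilityID, EffectiveRateKey, EffectiveFeeCode, EffectiveRate
      ,MIN(CalendarDate) AS FromDate
      ,MAX(CalendarDate) AS ToDate
INTO #Merged
FROM Islands
GROUP BY FacilityID, EffectiveRateKey, EffectiveFeeCode, EffectiveRate, IslandID;

SELECT * FROM #Merged ORDER BY FacilityID, EffectiveFeeCode, FromDate;
```

On this sample the final SELECT returns three non-overlapping rows (base 06-29 to 06-30, override 07-01 to 07-02, base 07-03 to 07-04), where grouping by the key alone would return a base row spanning 06-29 to 07-04.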

Hint 2: Since we never know in which order the engine will do its work, think about indexes. Using (indexed) temp tables instead of CTEs can help a lot...
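As an illustration of hint 2, the daily override rows could be materialised once into a temp table whose clustered index matches the LEFT JOIN, with the join then pointed at `#OverrideRates` instead of the CTE. A sketch, assuming `dbo.OverrideFeeSchedule` as defined in the question and the `RunningNumbers` calendar table the query uses; the temp table and index names are made up:

```sql
DECLARE @endDate date = '20191231';

-- Expand each override period into days once, instead of re-evaluating a CTE.
SELECT rn.CalendarDate
      ,o.FacilityID
      ,CONCAT('override ', o.RateEffectiveDate) AS RateKey
      ,o.FeeCode
      ,o.OverrideRate
INTO #OverrideRates
FROM dbo.OverrideFeeSchedule AS o
JOIN dbo.RunningNumbers AS rn
  ON rn.CalendarDate BETWEEN o.RateEffectiveDate AND o.RateTermDate
WHERE rn.CalendarDate <= @endDate;

-- Matches the join columns (FacilityID, FeeCode, CalendarDate). It can be
-- UNIQUE because override periods never overlap per facility and fee code.
CREATE UNIQUE CLUSTERED INDEX IX_OverrideRates
    ON #OverrideRates (FacilityID, FeeCode, CalendarDate);
```

The same pattern can be applied to `DailyBaseRate`; whether it pays off depends on the data volume and the resulting plan, so it is worth timing both versions.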

