Adding a new column with a set of rules in SQL Server

Problem description

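I have a somewhat complex set of rules that I need to run against a table. The problem is as follows: I have a table that stores medical records, and I need to determine the first site a person goes to after their discharge date. The discharge date is the end_date of the row whose location is 'initial' (this is always the first row of each group). The table is grouped by ID and sorted in the format shown below.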

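There are 3 rules: (1) Within an ID group, if any row's begin_date matches the first row's end_date, return that row's location as the first site (if two rows satisfy this condition, either one is correct, with the first preferred). (2) If rule 1 does not apply and the patient has any row with location 'Health', return 'Health'. (3) Otherwise, if neither rule 1 nor rule 2 applies, return 'Home'.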

Table

ID    color  begin_date    end_date     location
1     red    2017-01-01    2017-01-07   initial
1     green  2017-01-05    2017-01-07   nursing
1     blue   2017-01-07    2017-01-15   rehab
1     red    2017-01-11    2017-01-22   Health
2     red    2017-02-22    2017-02-26   initial
2     green  2017-02-26    2017-02-28   nursing
2     blue   2017-02-26    2017-02-28   rehab
3     red    2017-03-11    2017-03-22   initial
4     red    2017-04-01    2017-04-07   initial
4     green  2017-04-05    2017-04-07   nursing
4     blue   2017-04-10    2017-04-15   Health

Expected result:

ID    first_site
1     rehab
2     nursing
3     Home
4     Health
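
The three rules can be checked against the sample table with a small, dependency-free script. This is only a sketch: the helper names `first_site` and `first_sites` are made up for illustration, and it assumes the rows of each ID arrive in the order shown above, with the 'initial' row first.

```python
# Sample rows from the table above: (ID, color, begin_date, end_date, location).
ROWS = [
    (1, "red",   "2017-01-01", "2017-01-07", "initial"),
    (1, "green", "2017-01-05", "2017-01-07", "nursing"),
    (1, "blue",  "2017-01-07", "2017-01-15", "rehab"),
    (1, "red",   "2017-01-11", "2017-01-22", "Health"),
    (2, "red",   "2017-02-22", "2017-02-26", "initial"),
    (2, "green", "2017-02-26", "2017-02-28", "nursing"),
    (2, "blue",  "2017-02-26", "2017-02-28", "rehab"),
    (3, "red",   "2017-03-11", "2017-03-22", "initial"),
    (4, "red",   "2017-04-01", "2017-04-07", "initial"),
    (4, "green", "2017-04-05", "2017-04-07", "nursing"),
    (4, "blue",  "2017-04-10", "2017-04-15", "Health"),
]


def first_site(group):
    """Apply the three rules to the rows of one ID (hypothetical helper)."""
    # Assumption: the first row of the group is the 'initial' row,
    # so its end_date is the discharge date.
    discharge_date = group[0][3]
    # Rule 1: the first row whose begin_date equals the discharge date wins.
    for _id, _color, begin_date, _end_date, location in group:
        if begin_date == discharge_date:
            return location
    # Rule 2: otherwise 'Health' if the patient has any 'Health' row.
    if any(row[4] == "Health" for row in group):
        return "Health"
    # Rule 3: otherwise 'Home'.
    return "Home"


def first_sites(rows):
    """Group rows by ID (keeping their order) and apply first_site to each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return {patient_id: first_site(group) for patient_id, group in groups.items()}


result = first_sites(ROWS)
for patient_id, site in result.items():
    print(patient_id, site)
```

Because the dates are ISO-formatted strings, plain string equality is enough for the date comparison here.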

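My attempt in SQL Server 2008 is below. (Side note: I considered adding a helper column that would carry the 'initial' row's end_date on every row, so the comparison could be made within a single row. I'm not sure whether that is necessary.) I'd appreciate any guidance!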

SELECT
ID,
OVER( PARTITION ID CASE WHEN end_date[0] = begin_date THEN location
                        WHEN location = 'Health' THEN 'Health'
                        ELSE 'Home' end) AS [first_site]
FROM table

In Python, I was able to get this answer:

def conditions(x):
    #compare each group first
    val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
    #if at least one match (not return empty `Series` get first value)
    if not val.empty:
        return val.iloc[0]
    #if value is empty, check if value 'Health' exists within the group 
    elif (x['location']  == 'Health').any():
        return 'Health'
    else:
        return 'Home'

final = df.groupby('ID').apply(conditions).reset_index(name='first_site')

Tags: sql, sql-server, function, partitioning

Solution


declare @example table (
    ExampleID   int identity(1,1) not null primary key clustered
,   ID          int not null
,   Color       nvarchar(255) not null
,   BeginDate   date not null
,   EndDate     date not null
,   Loc         nvarchar(255) not null
);

insert into @example (ID, color, begindate, enddate, loc)

select 1, 'red'   , '2017-01-01', '2017-01-07', 'initial' union all
select 1, 'green'  , '2017-01-05', '2017-01-07', 'nursing' union all
select 1, 'blue'  , '2017-01-07', '2017-01-15', 'rehab' union all
select 1, 'red'  , '2017-01-11', '2017-01-22', 'Health' union all
select 2, 'red'  , '2017-02-22', '2017-02-26', 'initial' union all
select 2, 'green'  , '2017-02-26', '2017-02-28', 'nursing' union all
select 2, 'blue'  , '2017-02-26', '2017-02-28', 'rehab' union all
select 3, 'red'  , '2017-03-11', '2017-03-22', 'initial' union all
select 4, 'red'  , '2017-04-01', '2017-04-07', 'initial' union all
select 4, 'green'  , '2017-04-05', '2017-04-07', 'nursing' union all
select 4, 'blue'  , '2017-04-10', '2017-04-15', 'Health';

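    -- cte: pair every row with the rows of the same ID that begin on its end date.
    -- The row with the earliest begin date per ID (the 'initial' row) gets Ranking = 1.
    with cte as (
        select a.ID
             , a.Color
             , a.BeginDate
             , a.EndDate
             , b.Loc 
             -- row_number() guarantees exactly one row per ID has Ranking = 1
             , row_number() over(partition by a.ID order by a.begindate, b.enddate desc, b.loc) Ranking
          from @example a
     left join @example b
            on a.ID = b.ID
           and a.EndDate = b.BeginDate
                )
    -- cte2: one row per ID that has at least one 'Health' location
    , cte2 as (
        select distinct id
             , Loc
          from @example
         where loc = 'Health'
                )

    -- Rule 1 (matching begin date), then rule 2 ('Health'), then rule 3 ('Home')
    select a.ID
         , COALESCE(a.loc, b.loc, 'Home') as Loc 
      from cte a
 left join cte2 b
        on a.id = b.id
      where Ranking = 1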

Output:

ID  Loc
1   rehab
2   nursing
3   Home
4   Health
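
As a cross-check, the same three rules can also be written with correlated subqueries instead of a ranking function. The sketch below runs such a query from Python against an in-memory SQLite database, which stands in for SQL Server only so that the example is self-contained. The table name `example`, the snake_case column names, and the assumption of exactly one 'initial' row per ID are illustrative choices, not part of the answer above, and the query has not been run on SQL Server 2008, where it may need small adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE example (
        example_id INTEGER PRIMARY KEY,  -- auto-assigned, preserves insert order
        id         INTEGER NOT NULL,
        color      TEXT    NOT NULL,
        begin_date TEXT    NOT NULL,     -- ISO dates stored as text
        end_date   TEXT    NOT NULL,
        loc        TEXT    NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO example (id, color, begin_date, end_date, loc) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "red",   "2017-01-01", "2017-01-07", "initial"),
        (1, "green", "2017-01-05", "2017-01-07", "nursing"),
        (1, "blue",  "2017-01-07", "2017-01-15", "rehab"),
        (1, "red",   "2017-01-11", "2017-01-22", "Health"),
        (2, "red",   "2017-02-22", "2017-02-26", "initial"),
        (2, "green", "2017-02-26", "2017-02-28", "nursing"),
        (2, "blue",  "2017-02-26", "2017-02-28", "rehab"),
        (3, "red",   "2017-03-11", "2017-03-22", "initial"),
        (4, "red",   "2017-04-01", "2017-04-07", "initial"),
        (4, "green", "2017-04-05", "2017-04-07", "nursing"),
        (4, "blue",  "2017-04-10", "2017-04-15", "Health"),
    ],
)

QUERY = """
SELECT i.id,
       COALESCE(
           -- Rule 1: first-inserted row of the same ID that begins on the discharge date
           (SELECT m.loc
              FROM example m
             WHERE m.example_id = (SELECT MIN(m2.example_id)
                                     FROM example m2
                                    WHERE m2.id = i.id
                                      AND m2.begin_date = i.end_date)),
           -- Rule 2: 'Health' if the ID has any 'Health' row, otherwise NULL
           (SELECT MAX(h.loc)
              FROM example h
             WHERE h.id = i.id
               AND h.loc = 'Health'),
           -- Rule 3: fallback
           'Home'
       ) AS first_site
  FROM example i
 WHERE i.loc = 'initial'   -- assumption: exactly one 'initial' row per ID
 ORDER BY i.id
"""

rows = conn.execute(QUERY).fetchall()
for patient_id, site in rows:
    print(patient_id, site)
```

Driving the query from the 'initial' row makes the discharge date explicit rather than relying on that row having the earliest begin date in its group.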
