Specifying a contingency for the max(date) function in a Redshift query

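Problem description

I'm trying to determine, for calendar weeks in which a customer did not log in, when they last logged in. The starting dataset looks like this: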

User_Name   Login_Date  Week_Ending
Bobisaur    1/1/2019    1/5/2019
Bobisaur    1/3/2019    1/5/2019
Bobisaur    3/5/2019    3/9/2019
Bobisaur    3/24/2019   3/30/2019
Bobisaur    4/1/2019    4/6/2019

Now, for the week ending 1/12/2019 and so on, the customer Bobisaur has no logins. What I'd like to end up with is a dataset like the following:

User_Name   Week_Ending (for weeks with no login)   Days Since Last Login (as of week ending date)
Bobisaur    1/12/2019    7 
Bobisaur    1/19/2019    14 
Bobisaur    1/26/2019    21 
Bobisaur    2/2/2019     28 
Bobisaur    2/9/2019     35 
Bobisaur    2/16/2019    42 
Bobisaur    2/23/2019    49 
Bobisaur    3/2/2019     56 
Bobisaur    3/16/2019    11 
Bobisaur    3/23/2019    18 
Bobisaur    4/13/2019    12 

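This would be a list of weeks filtered to those in which the user did not log in, with a column showing how many days have passed since their last login as of that particular week-ending date.

I was able to get the Week_Ending dates that have no login activity, but I'm stuck on calculating "Days Since Last Login (as of week ending date)". What I tried was (Week_Ending - max(Login_Date)), filtered with max(Login_Date) <= Week_Ending.

However, this essentially removes all rows where the Week_Ending value is earlier than the max Login_Date.

Any help would be greatly appreciated. Thanks!

Tags: sql, amazon-web-services, amazon-redshift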

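Solution

The resulting query looks complicated, but it's not that bad:

  • input_raw - whatever you pasted in your question
  • input - the login data, with each login_date replaced by its respective week_ending
  • cal - a calendar table of consecutive Saturdays (your definition of week ending). In Redshift, the only way to generate rows is a SELECT. Here I generate 25 rows by cross joining the input with itself, which gives 25 (5x5) consecutive Saturdays. The CROSS JOIN can be replaced with SELECT row_number() over () from arbitrary table limit 25
  • cal_usrs - a table of all users combined with all Saturdays
  • output_raw - the core of the calculation: first JOIN cal_usrs with input, then use a window function to get the days since the last login. Since it is not possible to filter on a column that holds a window function result (and your desired output has no rows with 0), there is a final SELECT.
  • final SELECT - selects only the rows we are interested in.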
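The row_number() alternative mentioned for cal could be sketched roughly as follows. This is only an illustration and not part of the original answer: some_large_table is a hypothetical name standing in for any table with at least 25 rows, and the LIMIT is applied in a subquery before numbering so that the row numbers are exactly 1 through 25.

```sql
-- Sketch only: some_large_table is a hypothetical placeholder for any
-- table that holds at least 25 rows.
with cal as (
  select
         -- 12/1/2018 is a Saturday, so n = 1..25 yields the Saturdays
         -- from 12/8/2018 through 5/25/2019
         dateadd(week, n, '12/1/2018'::date) as week_ending
  from (
    -- limit first, then number, so n is guaranteed to run from 1 to 25
    select cast(row_number() over () as int) as n
    from (select 1 as dummy from some_large_table limit 25) as limited
  ) as numbered
)
select week_ending
from cal
order by week_ending;
```

Changing the LIMIT (or the start date) widens or shifts the calendar window without touching the rest of the query.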

Query:

with input_raw as (
  select 'Bobisaur' as username, '1/1/2019'::date as login_date
  union all
  select 'Bobisaur', '1/3/2019'::date
  union all
  select 'Bobisaur', '3/5/2019'::date
  union all
  select 'Bobisaur', '3/24/2019'::date
  union all
  select 'Bobisaur', '4/1/2019'::date
), input as (
  select
         username,
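         -- in your example weeks end on Saturday, hence Monday + 5 days
         -- note: date_trunc('week', ...) returns the Monday, so a Sunday login
         -- (e.g. 3/24/2019) is assigned to the preceding Saturday (3/23/2019)
         date_trunc('week', login_date) + interval '5 days' as week_ending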
  from input_raw
), cal as (
  -- this will create a table with consecutive Saturdays
  select
         date_trunc('week', '12/1/2018'::date) + interval '5 days' + 7 * row_number() over () as week_ending
  -- can be changed to 'from arbitrary table limit 25' or whatever time window you wish
  from input_raw a cross join input_raw b -- this will produce 25 rows (5 x 5)
), cal_usrs as (
  select * from cal cross join (select distinct username from input) as u
  -- this is very important - you want to have all weeks with all users
), output_raw as (
  select cal_usrs.username,
         cal_usrs.week_ending,
         max(input.week_ending)
             over (partition by cal_usrs.username order by cal_usrs.week_ending rows between unbounded preceding and current row ) as last_login_week,
         extract('days' from cal_usrs.week_ending - last_login_week) as days_since_last_login
  from input
         right join cal_usrs using (username, week_ending)
)
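select
  username,
  to_char(week_ending, 'MM/DD/YYYY') as week_ending,
  days_since_last_login
from output_raw
-- your example did not contain 0 rows; this also drops weeks before the
-- first login, where days_since_last_login is NULL
where days_since_last_login <> 0
-- qualified name: sort by the underlying timestamp, not the formatted string
order by output_raw.week_ending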

The result looks like this (I think you miscalculated in your example after March 16):

    username    week_ending days_since_last_login
    Bobisaur    01/12/2019  7
    Bobisaur    01/19/2019  14
    Bobisaur    01/26/2019  21
    Bobisaur    02/02/2019  28
    Bobisaur    02/09/2019  35
    Bobisaur    02/16/2019  42
    Bobisaur    02/23/2019  49
    Bobisaur    03/02/2019  56
    Bobisaur    03/16/2019  7
    Bobisaur    03/30/2019  7
    Bobisaur    04/13/2019  7
    Bobisaur    04/20/2019  14
    Bobisaur    04/27/2019  21
    Bobisaur    05/04/2019  28
    Bobisaur    05/11/2019  35
    Bobisaur    05/18/2019  42
    Bobisaur    05/25/2019  49
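
One detail worth checking against the question's data: 3/24/2019 is a Sunday, and date_trunc('week', ...) in Redshift returns the Monday of the week, so the query above places that login in the week ending 03/23/2019, whereas the question lists it under 3/30/2019. That is why the result shows 03/30/2019 rather than 03/23/2019 as a week without a login. If weeks are meant to run Sunday through Saturday, one possible adjustment (a sketch, not something from the original answer) is to shift the date forward by one day before truncating. The sunday_based and monday_based column names and the three sample dates are chosen here purely for illustration.

```sql
-- Sketch: compare the Monday-based expression used in the query with a
-- Sunday-based variant. Column names are illustrative only.
with input_raw as (
  select '3/24/2019'::date as login_date   -- a Sunday
  union all
  select '3/5/2019'::date                  -- a Tuesday
  union all
  select '4/1/2019'::date                  -- a Monday
)
select
       login_date,
       -- Monday-to-Sunday weeks: Sunday 3/24/2019 maps to Saturday 3/23/2019
       date_trunc('week', login_date) + interval '5 days' as monday_based,
       -- Sunday-to-Saturday weeks: shifting by one day first makes
       -- Sunday 3/24/2019 map to Saturday 3/30/2019
       date_trunc('week', login_date + interval '1 day') + interval '5 days' as sunday_based
from input_raw
order by login_date;
```

For the Tuesday and Monday rows both expressions give the same Saturday (3/9/2019 and 4/6/2019); only the Sunday row differs. With the Sunday-based expression in the input CTE, the week_ending values would match the Week_Ending column in the question for all five sample logins.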
