首页 > 解决方案 > Oracle SQL:根据子字符串和前一个(滞后)行计算连续站点访问

问题描述

使用 Oracle SQL,我正在尝试计算对网站的总唯一访问次数。我用来编写查询的表没有时间戳,其中包括分钟和秒,只是 DDMMYY,表中的每一行都代表客户在页面上的点击。该表每小时指定一个新的“会话”,无论这是否真的反映了客户 POV 的新访问。我必须做的是使用非连续会话作为唯一访问的代理。因此,如果访问之间有一个小时的休息时间,则之前的连续分组是一次访问。我将访问定义为客户 ID + 会话日 + 会话小时的唯一组合。如果客户+天组合中有连续的会话时间,我将其计为一个会话。HOUR 字段包含将日期与小时连接起来的字符串值。

Example of Raw Data:
TRANS_TO_DATE   CUSTOMER_ID HOUR
10/21/17        1007589445  October 21, 2017, Hour 1
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 3
10/21/17        1007589445  October 21, 2017, Hour 5
10/21/17        1007589445  October 21, 2017, Hour 6
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
11/1/17         1007589445  November 1, 2017, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 11
1/1/18          1007589445  January  1, 2018, Hour 14
1/1/18          1007589445  January  1, 2018, Hour 20
1/1/18          1007589445  January  1, 2018, Hour 22

访问次数实际上是这样的:

Customer_id Day Hour    Visit Grouping 
1007589445  October 21, 2017    1   Visit 1
1007589445  October 21, 2017    2   Visit 1
1007589445  October 21, 2017    3   Visit 1
1007589445  October 21, 2017    5   Visit 2
1007589445  October 21, 2017    6   Visit 2
1007589445  October 21, 2017    23  Visit 3
1007589445  November 1, 2017    10  Visit 1
1007589445  January 1, 2018 10  Visit 1
1007589445  January 1, 2018 11  Visit 1
1007589445  January 1, 2018 14  Visit 2
1007589445  January 1, 2018 20  Visit 3
1007589445  January 1, 2018 21  Visit 4

客户1007589445有

2017 年 10 月 21 日访问 3 次 - 2017 年 11 月 1 日访问 1 次 - 2018 年 1 月 1 日访问 4 次

总访问量:8

下面是我到目前为止需要修改以满足上述标准的 sql 代码。

select 
CUSTOMER_ID, 
TRANS_TO_DATE,
HOUR,
count (HOUR) as visits
from mstr_clickstream_vw 
where trans_to_date between start_date and end_date
and web_store_ind='US'
 group by CUSTOMER_ID, TRANS_TO_DATE,HOUR

标签: sqloraclesubstringlagclickstream

解决方案


您可以通过以下方式获得时间:

cast(trim(substr(hour, -2)) as int)

然后使用它通过使用lag()和累积条件聚合来分配会话:

select cs.*,
       sum(case when trans_to_date = prev_ttd and prev_hh = hh then 0
                when trans_to_date = prev_ttd and prev_hh = hh - 1 then 0
                when hh = 0 and prev_hh = 23 and trans_to_date = prev_ttd + interval '1' day then 0
                else 1
           end) over (partition by customer_id order by trans_to_date, hh) as grouping
from (select cs.*,
             lag(trans_to_date) over (partition by customer_id order by trans_to_date, hh) as prev_ttd,
             lag(hh) over (partition by customer_id order by trans_to_date, hh) as prev_hh
      from (select cs.*,
                   cast(trim(substr(hour, -2)) as int) as hh
            from mstr_clickstream_vw cs
           ) cs
      ) cs;

推荐阅读