sql - Oracle SQL:根据子字符串和前一个(滞后)行计算连续站点访问
问题描述
使用 Oracle SQL,我正在尝试计算对网站的总唯一访问次数。我用来编写查询的表没有时间戳,其中包括分钟和秒,只是 DDMMYY,表中的每一行都代表客户在页面上的点击。该表每小时指定一个新的“会话”,无论这是否真的反映了客户 POV 的新访问。我必须做的是使用非连续会话作为唯一访问的代理。因此,如果访问之间有一个小时的休息时间,则之前的连续分组是一次访问。我将访问定义为客户 ID + 会话日 + 会话小时的唯一组合。如果客户+天组合中有连续的会话时间,我将其计为一个会话。HOUR 字段包含将日期与小时连接起来的字符串值。
Example of Raw Data:
TRANS_TO_DATE CUSTOMER_ID HOUR
10/21/17 1007589445 October 21, 2017, Hour 1
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 3
10/21/17 1007589445 October 21, 2017, Hour 5
10/21/17 1007589445 October 21, 2017, Hour 6
10/21/17 1007589445 October 21, 2017, Hour 23
10/21/17 1007589445 October 21, 2017, Hour 23
10/21/17 1007589445 October 21, 2017, Hour 23
11/1/17 1007589445 November 1, 2017, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 11
1/1/18 1007589445 January 1, 2018, Hour 14
1/1/18 1007589445 January 1, 2018, Hour 20
1/1/18 1007589445 January 1, 2018, Hour 22
访问次数实际上是这样的:
Customer_id Day Hour Visit Grouping
1007589445 October 21, 2017 1 Visit 1
1007589445 October 21, 2017 2 Visit 1
1007589445 October 21, 2017 3 Visit 1
1007589445 October 21, 2017 5 Visit 2
1007589445 October 21, 2017 6 Visit 2
1007589445 October 21, 2017 23 Visit 3
1007589445 November 1, 2017 10 Visit 1
1007589445 January 1, 2018 10 Visit 1
1007589445 January 1, 2018 11 Visit 1
1007589445 January 1, 2018 14 Visit 2
1007589445 January 1, 2018 20 Visit 3
1007589445 January 1, 2018 21 Visit 4
客户1007589445有
2017 年 10 月 21 日访问 3 次 - 2017 年 11 月 1 日访问 1 次 - 2018 年 1 月 1 日访问 4 次
总访问量:8
下面是我到目前为止需要修改以满足上述标准的 sql 代码。
select
CUSTOMER_ID,
TRANS_TO_DATE,
HOUR,
count (HOUR) as visits
from mstr_clickstream_vw
where trans_to_date between start_date and end_date
and web_store_ind='US'
group by CUSTOMER_ID, TRANS_TO_DATE,HOUR
解决方案
您可以通过以下方式获得时间:
cast(trim(substr(hour, -2)) as int)
然后使用它通过使用lag()
和累积条件聚合来分配会话:
select cs.*,
sum(case when trans_to_date = prev_ttd and prev_hh = hh then 0
when trans_to_date = prev_ttd and prev_hh = hh - 1 then 0
when hh = 0 and prev_hh = 23 and trans_to_date = prev_ttd + interval '1' day then 0
else 1
end) over (partition by customer_id order by trans_to_date, hh) as grouping
from (select cs.*,
lag(trans_to_date) over (partition by customer_id order by trans_to_date, hh) as prev_ttd,
lag(hh) over (partition by customer_id order by trans_to_date, hh) as prev_hh
from (select cs.*,
cast(trim(substr(hour, -2)) as int) as hh
from mstr_clickstream_vw cs
) cs
) cs;
推荐阅读
- python - 无法安装 TensorFlow 版本 1.15 或更早版本
- javascript - react如何返回标签?
- api - 如何让邮递员使用“x-auth-token”而不是“授权”承载?
- sql-server - 我的使用外部应用和左连接的 SQL 语句中的语法错误在哪里?
- r - 使用ggplot绘图时如何隐藏某些因素?
- kivymd - MDList 行和回调
- google-apps-script - 谷歌表单脚本插件onEdit(e)不起作用
- python - 从大熊猫数据帧中切片
- microsoft-graph-api - Microsoft Graph:筛选请求
- angular - 将字符串从页面传递到另一个页面是未定义的