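sql - Grouping a chain of actions by 1-minute intervals in BigQuery SQL
Problem description
I need to group my data into 1-minute intervals in order to process chains of actions. My data looks like this: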
id MetroId Time ActionName refererurl
111 a 2020-09-01-09:19:00 First www.stackoverflow/a12345
111 b 2020-09-01-12:36:54 First www.stackoverflow/a12345
111 f 2020-09-01-12:36:56 First www.stackoverflow/xxxx
111 b 2020-09-01-12:36:58 Midpoint www.stackoverflow/a12345
111 f 2020-09-01-12:37:01 Midpoint www.stackoverflow/xxx
111 b 2020-09-01-12:37:03 Third www.stackoverflow/a12345
111 b 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
222 d 2020-09-01-15:17:44 First www.stackoverflow/a2222
222 d 2020-09-01-15:17:48 Midpoint www.stackoverflow/a2222
222 d 2020-09-01-15:18:05 Third www.stackoverflow/a2222
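I need to fetch the data under the following condition: if an x_id and x_url pair has a row whose action_name value is Complete, take that row. If there is no Complete, take Third, and so on.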
SELECT
  -- Keep one row per group: the action that progressed furthest.
  AS VALUE ARRAY_AGG(current_query_result
    ORDER BY CASE ActionName
      WHEN 'Complete' THEN 1
      WHEN 'Third' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT d.id, c.Time, c.ActionName, c.refererurl, c.MetroId
  FROM `bq_query_table_c` c
  INNER JOIN `bq_table_d` d ON d.id = c.CreativeId
  WHERE c.refererurl LIKE "https://www.stackoverflow/%"
    AND c.ActionName IN ('First', 'Midpoint', 'Third', 'Complete')
) current_query_result
GROUP BY
  id,
  refererurl,
  MetroId,
  -- Floor the timestamp to the start of its 1-minute bucket.
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time),
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 1 * 60) SECOND
  )
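The last GROUP BY key above floors each timestamp to the start of its minute by subtracting the Unix seconds modulo 60. The following is a minimal Python sketch of that arithmetic only, not part of the original query. The helper name `floor_to_minute` is hypothetical, and the timestamps are assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d-%H:%M:%S"  # same layout as the Time column in the sample data


def floor_to_minute(text: str, bucket_seconds: int = 60) -> str:
    """Return the start of the bucket that contains the given timestamp."""
    # Assumption: the strings are UTC timestamps.
    ts = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    # Equivalent of MOD(UNIX_SECONDS(ts), 60) in the query.
    remainder = int(ts.timestamp()) % bucket_seconds
    # Equivalent of TIMESTAMP_SUB(ts, INTERVAL remainder SECOND).
    return (ts - timedelta(seconds=remainder)).strftime(FMT)


if __name__ == "__main__":
    for sample in ("2020-09-01-12:36:58", "2020-09-01-12:37:03"):
        print(sample, "->", floor_to_minute(sample))
```

Note that 12:36:58 and 12:37:03 are only five seconds apart yet land in different buckets, so grouping by the floored minute alone splits a chain that crosses a minute boundary.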
Expected output:
id MetroId Time ActionName refererurl
111 a 2020-09-01-09:19:00 First www.stackoverflow/a12345
111 f 2020-09-01-12:37:01 Midpoint www.stackoverflow/xxx
111 b 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
222 d 2020-09-01-15:18:05 Third www.stackoverflow/a2222
Solution
The following is for BigQuery Standard SQL:
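#standardSQL
WITH temp AS (
  -- Parse the string column once so later steps can reuse the timestamp.
  SELECT *, PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time) ts
  FROM `project.dataset.bq_table`
)
SELECT * EXCEPT (ts, time_lag) FROM (
  -- Seconds until the next remaining row for the same id (NULL for the last row).
  SELECT *,
    TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, SECOND) time_lag
  FROM (
    -- Keep the furthest-progressed action per id, url and 1-minute bucket.
    SELECT
      AS VALUE ARRAY_AGG(t
        ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
        LIMIT 1
      )[OFFSET(0)]
    FROM temp t
    WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
    GROUP BY id, url,
      TIMESTAMP_SUB(ts, INTERVAL MOD(UNIX_SECONDS(ts), 60) SECOND)
  )
)
-- Drop a row when another row for the same id follows within 60 seconds.
WHERE NOT IFNULL(time_lag, 777) < 60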
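The query ranks actions with STRPOS over the string 'First,Midpoint,Third,Complete': a later stage sits further right in the string, so it gets a larger position, and ORDER BY ... DESC LIMIT 1 keeps it. The Python sketch below imitates that ranking to show why it works. The helpers `strpos` and `furthest_action` are hypothetical names introduced for illustration and are not BigQuery functions.

```python
STAGES = "First,Midpoint,Third,Complete"


def strpos(haystack: str, needle: str) -> int:
    """1-based position of needle in haystack, 0 when absent (like STRPOS)."""
    return haystack.find(needle) + 1


def furthest_action(actions):
    """Pick the action that appears latest in the STAGES string."""
    return max(actions, key=lambda action: strpos(STAGES, action))


if __name__ == "__main__":
    for name in ("First", "Midpoint", "Third", "Complete"):
        print(name, strpos(STAGES, name))
    print(furthest_action(["First", "Midpoint", "Third"]))
```

The trick relies on no stage name being a substring of an earlier part of the string. An explicit CASE expression, as in the question, avoids that dependency at the cost of more text.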
You can test the query above with the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.bq_table` AS (
  SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
  SELECT 111, '2020-09-01-12:36:54', 'First', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:36:58', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:03', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345' UNION ALL
  SELECT 222, '2020-09-01-15:17:44', 'First', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:17:48', 'Midpoint', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:18:05', 'Third', 'www.stackoverflow/a2222'
), temp AS (
  SELECT *, PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time) ts
  FROM `project.dataset.bq_table`
)
SELECT * EXCEPT (ts, time_lag) FROM (
  SELECT *,
    TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, SECOND) time_lag
  FROM (
    SELECT
      AS VALUE ARRAY_AGG(t
        ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
        LIMIT 1
      )[OFFSET(0)]
    FROM temp t
    WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
    GROUP BY id, url,
      TIMESTAMP_SUB(ts, INTERVAL MOD(UNIX_SECONDS(ts), 60) SECOND)
  )
)
WHERE NOT IFNULL(time_lag, 777) < 60
Result
Row id time action_name url
1 111 2020-09-01-09:19:00 First www.stackoverflow/a12345
2 111 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
3 222 2020-09-01-15:18:05 Third www.stackoverflow/a2222
Note: I am still not 100% sure about your use case, but the above is based on what has been discussed in the comments so far.
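For readers who want to trace the two steps without a BigQuery project, here is a rough Python sketch of the same logic on the sample rows: first keep the furthest action per id, url and minute bucket, then drop any row that is followed within 60 seconds by another row for the same id. It is an illustration under assumptions, not the answer's implementation. The function name `last_action_per_chain` is hypothetical, a list index replaces the STRPOS ranking, and timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d-%H:%M:%S"
STAGES = ["First", "Midpoint", "Third", "Complete"]  # earliest to latest stage

# (id, time, action_name, url), copied from the sample data in the answer.
ROWS = [
    (111, "2020-09-01-09:19:00", "First", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:36:54", "First", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:36:58", "Midpoint", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:37:03", "Third", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:37:09", "Complete", "www.stackoverflow/a12345"),
    (222, "2020-09-01-15:17:44", "First", "www.stackoverflow/a2222"),
    (222, "2020-09-01-15:17:48", "Midpoint", "www.stackoverflow/a2222"),
    (222, "2020-09-01-15:18:05", "Third", "www.stackoverflow/a2222"),
]


def parse(text):
    # Assumption: the strings are UTC timestamps.
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def last_action_per_chain(rows, gap_seconds=60):
    # Step 1: furthest action per (id, url, minute bucket), like the inner query.
    best = {}
    for row in rows:
        row_id, time_text, action, url = row
        if action not in STAGES:
            continue
        epoch = int(parse(time_text).timestamp())
        key = (row_id, url, epoch - epoch % 60)
        if key not in best or STAGES.index(action) > STAGES.index(best[key][2]):
            best[key] = row

    # Step 2: drop a row when the next row of the same id is closer than
    # gap_seconds, like the LEAD(...) / time_lag filter in the outer query.
    picked = sorted(best.values(), key=lambda r: (r[0], parse(r[1])))
    kept = []
    for i, row in enumerate(picked):
        nxt = picked[i + 1] if i + 1 < len(picked) else None
        if nxt is not None and nxt[0] == row[0]:
            lag = (parse(nxt[1]) - parse(row[1])).total_seconds()
            if lag < gap_seconds:
                continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in last_action_per_chain(ROWS):
        print(row)
```

On the sample rows this keeps the same three rows as the result table above. The second step is what merges a chain that straddles a minute boundary, such as the Midpoint at 12:36:58 and the Complete at 12:37:09.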