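sql - SQL join to get a timestamp difference within the same table
Problem description
I'm not sure how to write this SQL query in BigQuery. I have an events table containing a name and a timestamp. Suppose the table holds only two kinds of events: A and B. I want to query the table for every instance of event A, find the next closest B, and create a new column holding the time difference. B always occurs after A.
For example, if I have a table that looks like this: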
A1 | 1:00 pm
B5 | 2:00 pm
A3 | 3:00 pm
B9 | 5:00 pm
my result table would be:
A1 | 1 hour
A3 | 2 hours
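The query I came up with is:
SELECT
  CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
                       FROM table sub
                       WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
This works well for getting the table I want, but I'd also like to include an additional column from the subquery, so that the result looks like: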
A1 | 1 hour | B5Column
A3 | 2 hours | B9Column
I tried the following query:
SELECT
  (SELECT
     sub.SubQueryColumn
   FROM table sub
   WHERE sub.time > main.time
   ORDER BY sub.time asc
   LIMIT 1) SubColumn,
  CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
                       FROM table sub
                       WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
But it didn't work. The error I get is:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Can I get some help?
Solution
The following is for BigQuery Standard SQL:
#standardSQL
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
  SELECT event, time,
    LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
    LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
  FROM (
    SELECT *,
      COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
    FROM `project.dataset.your_table` t
  )
)
WHERE STARTS_WITH(event, 'A')
-- ORDER BY time
You can test it and play with it using the dummy data from your question, as shown below:
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
  SELECT event, time,
    LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
    LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
  FROM (
    SELECT *,
      COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
    FROM `project.dataset.your_table` t
  )
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
The result is (duration is in seconds):
Row | event | duration | b_event
1   | A1    | 3600     | B5
2   | A3    | 7200     | B9
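To see why this works, it can help to look at the intermediate grp column on its own. The sketch below reuses the same dummy data and runs only the innermost subquery; the explicit AS alias and the final ORDER BY are added here for readability and are not part of the original answer.

```sql
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, time,
  -- Running count of A events seen so far, in time order.
  -- Every A starts a new group; the rows after it inherit its number.
  COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) AS grp
FROM `project.dataset.your_table`
ORDER BY time
```

With this data, A1 and B5 end up in grp 1, and A3 and B9 in grp 2. LEAD, partitioned by grp and ordered by time, then returns for each A the row that immediately follows it inside its own group, which is its B.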
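Note: the solution above relies on the statement in your question that B will always happen after A, with each A followed by its own B before the next A arrives.
So if your sequence instead looks like this: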
WITH `project.dataset.your_table` AS (
  SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
  SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
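the result will be as follows. A1 gets null because A2 starts a new group, which leaves A1 alone in its partition with no following row for LEAD to return:
Row | event | duration | b_event
1   | A1    | null     | null
2   | A2    | 1800     | B5
3   | A3    | 7200     | B9
If you need to handle this case, try the following: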
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
  SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
  SELECT event, time, type, grp,
    FIRST_VALUE(event) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_event,
    FIRST_VALUE(time) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_time
  FROM (
    SELECT event, time, SUBSTR(event, 1, 1) type,
      COUNTIF(STARTS_WITH(event, 'B')) OVER(ORDER BY time) grp
    FROM `project.dataset.your_table` t
  )
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
This version returns:
Row | event | duration | b_event
1   | A1    | 3600     | B5
2   | A2    | 1800     | B5
3   | A3    | 7200     | B9
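One caveat about the query above: its window is ordered by grp only, so when a B and a later A share the same grp value (B5 and A3 here), which of the two FIRST_VALUE picks is not strictly guaranteed. A possible alternative, offered as a sketch and not as part of the original answer, is to look ahead in time order and skip non-B rows with IGNORE NULLS. It assumes that B events are exactly the rows whose event starts with 'B' and that timestamps are unique.

```sql
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
  SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) AS duration, b_event
FROM (
  SELECT event, time,
    -- Non-B rows become NULL, so IGNORE NULLS makes FIRST_VALUE return
    -- the first B among the rows that come strictly after the current one.
    FIRST_VALUE(IF(STARTS_WITH(event, 'B'), time, NULL) IGNORE NULLS)
      OVER(ORDER BY time ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS b_time,
    FIRST_VALUE(IF(STARTS_WITH(event, 'B'), event, NULL) IGNORE NULLS)
      OVER(ORDER BY time ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS b_event
  FROM `project.dataset.your_table`
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
```

On the dummy data this should produce the same three rows as the version above. An A with no later B gets null in both duration and b_event.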