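sql - BigQuery SQL: join tables based on geometry and the closest timestamp
Problem description
I have two tables: df1 contains Date1 (timestamp) and PolygonWKT (geometry), and df2 contains Date2 (timestamp) and PointWKT (geometry). I joined df1 and df2 on geometry, so each PointWKT falls inside its corresponding PolygonWKT. The problem is that the Date1 and Date2 columns end up mismatched, and I also need to match Date1 with Date2. I want to join the tables based on geometry and on the closest timestamp match between Date1 and Date2.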
df2
| PointWKT | Date2 |
--------------------------------------
| b | 2020-05-05 12:00:00 UTC |
| b | 2020-05-05 12:00:10 UTC |
| b | 2020-05-05 12:00:20 UTC |
| b | 2020-05-05 12:17:00 UTC |
| c | 2020-05-06 18:00:00 UTC |
df1
| PolygonWKT | Date1 |
--------------------------------------
| A | 2020-05-03 9:00:00 UTC |
| A | 2020-05-03 9:30:10 UTC |
| B | 2020-05-05 12:05:00 UTC |
| B | 2020-05-05 12:25:00 UTC |
| C | 2020-05-06 18:05:00 UTC |
The first part of the code is correct, but the second part does not return what I want:
SELECT *
FROM `xxx.yyy.df1` as df1 ,
`xxx.yyy.df2` as df2
WHERE ST_Contains (df1.PolygonWKT, df2.PointWKT)
AND df2.Date2 BETWEEN df1.Date1 AND TIMESTAMP_ADD(df1.Date1, INTERVAL 10 MINUTE)
desired df
| PointWKT | Date2 || PolygonWKT | Date1 |
----------------------------------------------------------------------------
| b | 2020-05-05 12:00:00 UTC | | B | 2020-05-05 12:05:00 UTC |
| b | 2020-05-05 12:00:10 UTC | | B | 2020-05-05 12:05:00 UTC |
| b | 2020-05-05 12:00:20 UTC | | B | 2020-05-05 12:05:00 UTC |
| b | 2020-05-05 12:17:00 UTC | | B | 2020-05-05 12:25:00 UTC |
| c | 2020-05-06 18:00:00 UTC | | C | 2020-05-06 18:05:00 UTC |
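What is the correct way to do this?
Solution
I want to join the tables based on geometry and on the closest timestamp match between Date1 and Date2.
Below is for BigQuery Standard SQL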
SELECT
ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1)
ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1 ,
`xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
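The same "closest row per point" idea can also be written with a window function instead of ARRAY_AGG. The following is only a sketch of that alternative, not part of the original answer. It assumes the same table and column names as above and partitions on ST_ASTEXT(df2.PointWKT), because GEOGRAPHY values cannot be used for grouping directly.

```sql
-- Sketch: for every (PointWKT, Date2) row, keep the containing polygon row
-- whose Date1 is closest in time (before or after Date2).
SELECT
  df2.PointWKT,
  df2.Date2,
  df1.PolygonWKT,
  df1.Date1
FROM `xxx.yyy.df2` AS df2
JOIN `xxx.yyy.df1` AS df1
  ON ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (
  -- GEOGRAPHY is not groupable, so partition on its WKT text instead
  PARTITION BY ST_ASTEXT(df2.PointWKT), df2.Date2
  ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
) = 1
```

As with the ARRAY_AGG version, rows of df2 that share the same PointWKT and Date2 collapse into a single output row, and when two Date1 values are equally close the one that is kept is arbitrary.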
Applied to sample data similar to that in your example:
WITH `xxx.yyy.df1` AS (
SELECT ST_GEOGPOINT(1,2) PolygonWKT, TIMESTAMP '2020-05-03 9:00:00 UTC' Date1 UNION ALL
SELECT ST_GEOGPOINT(1,2), '2020-05-03 9:30:10 UTC' UNION ALL
SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:05:00 UTC' UNION ALL
SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:25:00 UTC' UNION ALL
SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:05:00 UTC'
), `xxx.yyy.df2` AS (
SELECT ST_GEOGPOINT(1,3) PointWKT, TIMESTAMP '2020-05-05 12:00:00 UTC' Date2 UNION ALL
SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:10 UTC' UNION ALL
SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:20 UTC' UNION ALL
SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:17:00 UTC' UNION ALL /* this value was adjusted based on the expected result sample, as it looks like a typo */
SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:00:00 UTC'
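)
SELECT
ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1)
ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1 ,
`xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
The output is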
Row PointWKT Date2 PolygonWKT Date1
1 POINT(1 3) 2020-05-05 12:00:00 UTC POINT(1 3) 2020-05-05 12:05:00 UTC
2 POINT(1 3) 2020-05-05 12:00:10 UTC POINT(1 3) 2020-05-05 12:05:00 UTC
3 POINT(1 3) 2020-05-05 12:00:20 UTC POINT(1 3) 2020-05-05 12:05:00 UTC
4 POINT(1 3) 2020-05-05 12:17:00 UTC POINT(1 3) 2020-05-05 12:25:00 UTC
5 POINT(1 4) 2020-05-06 18:00:00 UTC POINT(1 4) 2020-05-06 18:05:00 UTC
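Note that the query always returns the closest Date1, however far away it is. If a match should only count within some tolerance, such as the 10-minute window used in the question, one option is an extra filter before the aggregation. This is a sketch under that assumption: the symmetric 600-second tolerance is illustrative and not part of the original answer.

```sql
SELECT
  ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1)
    ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
    LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1,
     `xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
  -- Assumed tolerance: ignore candidate pairs more than 600 seconds apart,
  -- in either direction
  AND ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) <= 600
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
```

With this filter, a point that has no polygon timestamp within the tolerance drops out of the result entirely. On the sample data every closest match is at most 480 seconds away, so the output stays the same.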