Joining tables in BigQuery SQL based on geometry and the nearest timestamp

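Problem description

I have two tables: df1 contains Date1 (timestamp) and PolygonWKT (geometry), and df2 contains Date2 (timestamp) and PointWKT (geometry). I joined df1 and df2 on geometry, so each PointWKT falls inside its corresponding PolygonWKT. The problem is that the Date1 and Date2 columns end up mismatched, and I also need to match Date1 with Date2. I want to join the tables based on geometry and on the closest timestamp match between Date1 and Date2.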

df2
| PointWKT | Date2                   |
--------------------------------------
| b        | 2020-05-05 12:00:00 UTC |
| b        | 2020-05-05 12:00:10 UTC |
| b        | 2020-05-05 12:00:20 UTC |
| b        | 2020-05-05 12:17:00 UTC |
| c        | 2020-05-06 18:00:00 UTC |


df1
| PolygonWKT | Date1                   |
----------------------------------------
| A          | 2020-05-03 9:00:00 UTC  |
| A          | 2020-05-03 9:30:10 UTC  |
| B          | 2020-05-05 12:05:00 UTC |
| B          | 2020-05-05 12:25:00 UTC |
| C          | 2020-05-06 18:05:00 UTC |

The first part of the query is correct, but the second part does not return what I want:

    SELECT *
    FROM `xxx.yyy.df1` AS df1,
      `xxx.yyy.df2` AS df2
    WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
      AND (
        df2.Date2 BETWEEN df1.Date1 AND TIMESTAMP_ADD(df1.Date1, INTERVAL 10 MINUTE)
      )
    
Desired df

| PointWKT | Date2                   | PolygonWKT | Date1                   |
-----------------------------------------------------------------------------
| b        | 2020-05-05 12:00:00 UTC | B          | 2020-05-05 12:05:00 UTC |
| b        | 2020-05-05 12:00:10 UTC | B          | 2020-05-05 12:05:00 UTC |
| b        | 2020-05-05 12:00:20 UTC | B          | 2020-05-05 12:05:00 UTC |
| b        | 2020-05-05 12:17:00 UTC | B          | 2020-05-05 12:25:00 UTC |
| c        | 2020-05-06 18:00:00 UTC | C          | 2020-05-06 18:05:00 UTC |

What is the correct way to do this?

Tags: sql, google-bigquery

Solution


"I want to join the tables based on geometry and on the closest timestamp match between Date1 and Date2"

The following is for BigQuery Standard SQL:

SELECT  
  ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1) 
    ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) 
    LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1 ,
  `xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT) 
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
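
For each (PointWKT, Date2) pair, the ARRAY_AGG ... ORDER BY ... LIMIT 1 pattern keeps only the candidate row with the smallest absolute time difference. A roughly equivalent formulation, offered here as an alternative sketch and not as part of the original answer, ranks the candidates with ROW_NUMBER() and filters them with QUALIFY. The partition key ST_ASTEXT(df2.PointWKT) is an assumption, chosen because GEOGRAPHY values are not groupable in BigQuery; the table names are the placeholders from the question.

```sql
-- Alternative sketch: nearest-timestamp match via a window function.
SELECT df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1
FROM `xxx.yyy.df1` AS df1,
  `xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
-- Keep only the closest df1 row for each (point, timestamp) pair in df2.
QUALIFY ROW_NUMBER() OVER (
  -- Assumption: the WKT text stands in for the GEOGRAPHY value as a partition key.
  PARTITION BY ST_ASTEXT(df2.PointWKT), df2.Date2
  ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
) = 1
```

If two Date1 values are equally close to a Date2, ROW_NUMBER() picks one of them arbitrarily; adding df1.Date1 as a second ORDER BY key would make the choice deterministic.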

If the ARRAY_AGG query is applied to sample data similar to that in your example:

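-- Points stand in for the polygons here to keep the sample data short
WITH `xxx.yyy.df1` AS (
  SELECT ST_GEOGPOINT(1,2) PolygonWKT, TIMESTAMP '2020-05-03 9:00:00 UTC' Date1 UNION ALL
  SELECT ST_GEOGPOINT(1,2), '2020-05-03 9:30:10 UTC' UNION ALL
  SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:05:00 UTC' UNION ALL
  SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:25:00 UTC' UNION ALL
  SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:05:00 UTC' 
), `xxx.yyy.df2` AS (
  SELECT ST_GEOGPOINT(1,3) PointWKT, TIMESTAMP '2020-05-05 12:00:00 UTC' Date2 UNION ALL
  SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:10 UTC' UNION ALL
  SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:20 UTC' UNION ALL
  SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:17:00 UTC' UNION ALL /* this value was adjusted to match the expected result sample, as the original looked like a typo */
  SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:00:00 UTC' 
)
-- The ARRAY_AGG query, run against the sample CTEs
SELECT  
  ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1) 
    ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) 
    LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1 ,
  `xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT) 
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))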

the output is:

Row PointWKT    Date2                   PolygonWKT  Date1    
1   POINT(1 3)  2020-05-05 12:00:00 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
2   POINT(1 3)  2020-05-05 12:00:10 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
3   POINT(1 3)  2020-05-05 12:00:20 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
4   POINT(1 3)  2020-05-05 12:17:00 UTC POINT(1 3)  2020-05-05 12:25:00 UTC  
5   POINT(1 4)  2020-05-06 18:00:00 UTC POINT(1 4)  2020-05-06 18:05:00 UTC 
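
The attempt in the question only accepted a Date2 that falls in the 10 minutes after Date1, whereas the desired result pairs each Date2 with a later Date1, so that one-sided window cannot produce it. If a maximum gap is still wanted on top of the nearest-match logic, one possible variant, shown here as a sketch, bounds the absolute difference in both directions. The 600-second limit is an assumption carried over from the 10-minute window in the question.

```sql
-- Sketch: nearest-timestamp match, restricted to pairs at most 10 minutes apart.
SELECT
  ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1)
    ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
    LIMIT 1)[OFFSET(0)].*
FROM `xxx.yyy.df1` AS df1,
  `xxx.yyy.df2` AS df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
  -- Assumed tolerance: drop candidate pairs more than 600 seconds apart, before or after.
  AND ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) <= 600
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
```

With this filter, a df2 row that has no df1 candidate within the tolerance disappears from the result instead of being matched to a distant timestamp.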
