Problem with UPDATE and INNER JOIN statements in BigQuery

Problem description

I am trying to run an UPDATE query to flag outliers that exceed a certain distance. My table1 looks like this:

   OBJECTID    ID DateAndTime           Lat         Long          Outlier
     1           1  2002-11-26T12:00:00  38.82551095 -109.9709871  NULL
     2           1  2002-11-29T13:00:00  38.541137   -109.677575   NULL
     3           2  2002-11-03T10:00:00  38.550676   -109.901774   NULL
     4           2  2002-11-04T10:00:00  38.53689    -109.683531   NULL
     5           2  2002-11-05T10:00:00  38.45689    -109.683531   NULL

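From the query below, I only want the row identified by objectid_next to have its Outlier field populated with 'P'. For example, in table1 above, the row with OBJECTID 2 would get a P in the Outlier field.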

UPDATE `table1` 
    SET Outlier = 'P' 
    FROM(
    select objectid_next, distance from (
    select *, 
      points.OBJECTID as objectid_start, 
      points.ID as ID,
      lead(points.objectid) over next as objectid_next,
      round(st_distance(st_geogpoint(Longitude, Latitude), lead(st_geogpoint(Longitude, Latitude)) over next), 2) as distance
    FROM  `table1` AS points INNER JOIN 
    `table2` AS table2 ON table2.ID = points.ID
    WHERE points.DateAndTime BETWEEN table2.StartDate AND COALESCE (table2.EndDate, CURRENT_DATE())
    AND points.ID = '1'
    AND points.DateAndTime BETWEEN '2020-06-01T00:00:00' AND '2020-12-01T00:00:00'
    window next as (partition by points.ID order by DateAndTime)
    )) 
    WHERE distance > 1000; 

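I keep getting this error: UPDATE/MERGE must match at most one source row for each target row. I assume this is because table1 has multiple rows with the same ID, and I am having a hard time working around it. I tried selecting only the OBJECTID field, but I still get the error. Any ideas on how to resolve this update/match error?

table2 only contains some extra information about the ID numbers to help narrow the search, and looks like this: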

 OBJECTID       ID StartDate            Sex
     1           1  2002-11-20T00:00:00  M
     2           2  2002-11-20T00:00:00  M
     3           3  2002-11-01T00:00:00  M

Tags: google-bigquery

Solution


You need to specify a join predicate between table1 and the subquery in the FROM clause:

UPDATE `table1` t1
SET Outlier = 'P' 
FROM (
  select objectid_next, distance 
  from (
    select *, 
      points.OBJECTID as objectid_start, 
      points.ID as ID,
      lead(points.objectid) over next as objectid_next,
      round(st_distance(st_geogpoint(Longitude, Latitude), lead(st_geogpoint(Longitude, Latitude)) over next), 2) as distance
    FROM  `table1` AS points INNER JOIN `table2` AS table2 ON table2.ID = points.ID
    WHERE points.DateAndTime BETWEEN table2.StartDate AND COALESCE (table2.EndDate, CURRENT_DATE())
      AND points.ID = '1'
      AND points.DateAndTime BETWEEN '2020-06-01T00:00:00' AND '2020-12-01T00:00:00'
    window next as (partition by points.ID order by DateAndTime)
  )
  where distance > 1000
) t2
WHERE t1.objectid = t2.objectid_next;
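
The reason the predicate matters: without a condition tying the target table to the FROM source, every target row is paired with every source row, so a target row can match several source rows and BigQuery rejects the statement. A minimal sketch of that rule follows; the tables `mydataset.target` and `mydataset.source` and their columns are hypothetical and not part of the question.

```sql
-- Hypothetical tables, for illustration only:
--   mydataset.target(id, flag)
--   mydataset.source(id, distance)

-- No predicate links t to s. If more than one source row has
-- distance > 1000, each target row matches several source rows and
-- BigQuery raises "UPDATE/MERGE must match at most one source row
-- for each target row".
UPDATE `mydataset.target` t
SET flag = 'P'
FROM `mydataset.source` s
WHERE s.distance > 1000;

-- With a join predicate, each target row matches at most one source
-- row, provided s.id is unique among the rows that pass the filter.
UPDATE `mydataset.target` t
SET flag = 'P'
FROM `mydataset.source` s
WHERE t.id = s.id
  AND s.distance > 1000;
```

In the answer's query, `t1.objectid = t2.objectid_next` plays the role of `t.id = s.id`.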

More details here.
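
If the subquery could ever return the same objectid_next more than once (for example, if table2 held several rows for one ID with overlapping date ranges, which is an assumption and not something the question shows), the joined UPDATE would still fail with the same error. One possible alternative is to drop the FROM clause and filter the target with IN, which does not care how many times a value appears. This is a sketch that reuses the column names and filters from the question's query as written (Longitude, Latitude, EndDate), so adjust them to the real schema.

```sql
-- Sketch: same outlier logic, expressed as a WHERE ... IN filter.
UPDATE `table1`
SET Outlier = 'P'
WHERE OBJECTID IN (
  SELECT objectid_next
  FROM (
    SELECT
      -- OBJECTID of the next point for the same ID, ordered by time
      LEAD(points.OBJECTID) OVER next AS objectid_next,
      -- distance in meters from this point to the next one
      ROUND(ST_DISTANCE(
        ST_GEOGPOINT(Longitude, Latitude),
        LEAD(ST_GEOGPOINT(Longitude, Latitude)) OVER next), 2) AS distance
    FROM `table1` AS points
    INNER JOIN `table2` AS table2 ON table2.ID = points.ID
    WHERE points.DateAndTime BETWEEN table2.StartDate AND COALESCE(table2.EndDate, CURRENT_DATE())
      AND points.ID = '1'
      AND points.DateAndTime BETWEEN '2020-06-01T00:00:00' AND '2020-12-01T00:00:00'
    WINDOW next AS (PARTITION BY points.ID ORDER BY points.DateAndTime)
  )
  WHERE distance > 1000
);
```

The last point in each partition has no successor, so its objectid_next and distance are NULL and the `distance > 1000` filter drops it.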

