Duplicate rows in GCP BigQuery (SQL) based on two columns

Problem description

I am trying to output all columns while deduplicating some of the rows. Nothing I have tried seems to come close.

SELECT * FROM `project.dataset.table`
??

Tags: sql, google-bigquery

Solution


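Consider using ARRAY_AGG. Group by the column that identifies a duplicate (Name in this example), order each group by LastUpdateDate descending, and keep only the first row, which is the most recent one: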

with TestData as (
  select 'Tom' as Name, '1' as Phone, timestamp('2020-01-01 00:00:00') as LastUpdateDate 
  union all
  select 'Tom' as Name, '2' as Phone, timestamp('2020-01-02 00:00:00') as LastUpdateDate
  union all
  select 'Eva' as Name, '3' as Phone, timestamp('2020-01-03 00:00:00') as LastUpdateDate
  union all
  select 'Eva' as Name, '4' as Phone, timestamp('2020-01-04 00:00:00') as LastUpdateDate
)
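-- For each Name, keep only the most recently updated row.
SELECT deduplicated.* FROM (
  -- ARRAY_AGG(t ...) collects whole rows as STRUCTs; ORDER BY ... DESC LIMIT 1
  -- keeps the newest one, and [OFFSET(0)] extracts it from the one-element array.
  SELECT ARRAY_AGG(t ORDER BY t.LastUpdateDate DESC LIMIT 1)[OFFSET(0)] AS deduplicated
  FROM TestData AS t
  GROUP BY Name
)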
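
The query above groups by Name only. As a sketch of how the same pattern could extend to a two-column key, as the title suggests, the variant below assumes that Name and Phone together identify a duplicate. That choice of key columns and the sample rows are hypothetical and are not taken from the question.

```sql
-- Hypothetical sample data: (Name, Phone) is assumed to be the duplicate key.
WITH TestData AS (
  SELECT 'Tom' AS Name, '1' AS Phone, TIMESTAMP('2020-01-01 00:00:00') AS LastUpdateDate
  UNION ALL
  SELECT 'Tom', '1', TIMESTAMP('2020-01-02 00:00:00')  -- same key as the row above, newer
  UNION ALL
  SELECT 'Tom', '2', TIMESTAMP('2020-01-03 00:00:00')
  UNION ALL
  SELECT 'Eva', '3', TIMESTAMP('2020-01-04 00:00:00')
)
SELECT deduplicated.*
FROM (
  -- One output row per (Name, Phone) pair: the most recently updated one.
  SELECT ARRAY_AGG(t ORDER BY t.LastUpdateDate DESC LIMIT 1)[OFFSET(0)] AS deduplicated
  FROM TestData AS t
  GROUP BY t.Name, t.Phone
)
```

With this sample data, the two ('Tom', '1') rows collapse into the one dated 2020-01-02, and the other two rows pass through unchanged. Against a real table, replace the TestData CTE with the table reference and list the actual key columns in GROUP BY.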
