首页 > 解决方案 > 提取其他字段也匹配的重复值

问题描述

我正在使用以下查询zip在我的数据集中查找重复值。

这确实可以显示任何重复 zip 值的国家、城市和街道,但我真的希望它只包含重复的国家、城市和街道,而不仅仅是邮政编码?

SELECT
  Country,
  City,
  Street,
  zip
FROM
  project.dataset.tablename
WHERE
  zip > 1
  AND CAST(zip AS string) IN (
  SELECT
    CAST(zip AS string)
  FROM
    project.dataset.tablename
  GROUP BY
    CAST(zip AS string)
  HAVING
    COUNT(CAST(zip AS string)) > 1 )
ORDER BY
  zip DESC

标签: sqlgoogle-bigquerybigquery-standard-sql

解决方案


我想你想要:

SELECT t.*
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY zip, country, city, street) as cnt
      FROM project.dataset.tablename t
     ) t 
WHERE cnt > 1
ORDER BY zip;

无论如何,对于这类问题,窗口函数通常会提供最佳解决方案。


推荐阅读