首页 > 解决方案 > BigQuery - 减少到字段中的唯一记录

问题描述

我有一个这样的字段表:

ID    Field 1           Field 2
1     22,34,05,44,44    01,02,02,03
2     11,01,05          02,02,01,01,22

如何在 BigQuery (strandardSQL) 中将其转换为仅显示唯一记录并从大到小排序?

这样输出将如下所示:

ID    Field 1           Field 2
1     05,22,34,44       01,02,03
2     01,05,11          01,02,22

我试过了,Split但后来我运行了数百个重复项,window功能也不允许distinct以后将这些东西组合在一起。

请帮忙弄清楚

标签: google-bigqueryuniquedistinct

解决方案


您可以拆分字符串以将它们转换为数组,然后使用重复数据删除DISTINCT和排序使用ORDER BY

SELECT
  ID,
  ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x) AS field1,
  ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x) AS field2
FROM `project-name`.dataset.table

如果您想再次将数组转换为逗号分隔的字符串,可以使用以下ARRAY_TO_STRING函数:

SELECT
  ID,
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x), ',') AS field1,
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x), ',') AS field2
FROM `project-name`.dataset.table

推荐阅读