首页 > 解决方案 > 有没有办法在 Athena 的 ARRAY(ROW()) 中取消嵌套或返回所有元素?

问题描述

我有一列包含以下格式的数据:

array(row(action varchar,actor varchar,special_notes varchar,timestamp bigint))

其中数组保证有 1 个或多个元素。不保证数组的长度相同。

我们称之为“my_array_row_column”。以下是该列的一行的样子,例如:

[{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958},
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492},
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}]

我试过使用CROSS JOIN UNNEST(my_array_row_column),但它最终只返回row()数组中的第一个。这是我尝试的最新查询:

SELECT unnested
FROM athena.movies
CROSS JOIN UNNEST(my_array_row_column) AS t(unnested)

令我沮丧的是,它只会返回

unnested
---------------------------------------------------------
{action=cast_role, actor=MorganFreeman, special_notes=null, timestamp=1611616961958}

当我希望它在结果中将所有三行(Morgan.Freeman、employee、newhire)作为单独的行返回时,如下所示:

unnested
---------------------------------------------------------
{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958}
---------------------------------------------------------
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492}
---------------------------------------------------------
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}

关于我如何做到这一点的任何想法?

标签: sqlamazon-athena

解决方案


根据此处的文档,这是正确的模式:

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT * FROM dataset

返回

+-----------------------------------------------------------------+
| users                                                           |
+-----------------------------------------------------------------+
| [{NAME=Bob, AGE=38}, {NAME=Alice, AGE=35}, {NAME=Jane, AGE=27}] |
+-----------------------------------------------------------------+

对于 UNNEST,这将是:

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT unnested
FROM dataset, UNNEST(users) t(unnested)

哪个应该返回

+---------------------+
| unnested            |
+---------------------+
| {NAME=Bob, AGE=38}  |
| {NAME=Alice, AGE=35}| 
| {NAME=Jane, AGE=27} |
+---------------------+

在你的情况下

WITH dataset AS (
  SELECT my_array_row_column AS movieDetail
  from athena.movies 
)

SELECT unnested
FROM dataset, UNNEST(movieDetail) AS t(unnested)

或类似的东西。我认为您的交叉联接是不必要的,因为您没有在与 athena.movi​​es 表的键相同的粒度上引入任何字段,因此没有什么可乘的。


推荐阅读