sql - 有没有办法在 Athena 的 ARRAY(ROW()) 中取消嵌套或返回所有元素?
问题描述
我有一列包含以下格式的数据:
array(row(action varchar,actor varchar,special_notes varchar,timestamp bigint))
其中数组保证有 1 个或多个元素。不保证数组的长度相同。
我们称之为“my_array_row_column”。以下是该列的一行的样子,例如:
[{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958},
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492},
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}]
我试过使用CROSS JOIN UNNEST(my_array_row_column)
,但它最终只返回row()
数组中的第一个。这是我尝试的最新查询:
SELECT unnested
FROM athena.movies
CROSS JOIN UNNEST(my_array_row_column) AS t(unnested)
令我沮丧的是,它只会返回
unnested
---------------------------------------------------------
{action=cast_role, actor=MorganFreeman, special_notes=null, timestamp=1611616961958}
当我希望它在结果中将所有三行(Morgan.Freeman、employee、newhire)作为单独的行返回时,如下所示:
unnested
---------------------------------------------------------
{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958}
---------------------------------------------------------
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492}
---------------------------------------------------------
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}
关于我如何做到这一点的任何想法?
解决方案
根据此处的文档,这是正确的模式:
WITH dataset AS (
SELECT ARRAY[
CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
] AS users
)
SELECT * FROM dataset
返回
+-----------------------------------------------------------------+
| users |
+-----------------------------------------------------------------+
| [{NAME=Bob, AGE=38}, {NAME=Alice, AGE=35}, {NAME=Jane, AGE=27}] |
+-----------------------------------------------------------------+
对于 UNNEST,这将是:
WITH dataset AS (
SELECT ARRAY[
CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
] AS users
)
SELECT unnested
FROM dataset, UNNEST(users) t(unnested)
哪个应该返回
+---------------------+
| unnested |
+---------------------+
| {NAME=Bob, AGE=38} |
| {NAME=Alice, AGE=35}|
| {NAME=Jane, AGE=27} |
+---------------------+
在你的情况下
WITH dataset AS (
SELECT my_array_row_column AS movieDetail
from athena.movies
)
SELECT unnested
FROM dataset, UNNEST(movieDetail) AS t(unnested)
或类似的东西。我认为您的交叉联接是不必要的,因为您没有在与 athena.movies 表的键相同的粒度上引入任何字段,因此没有什么可乘的。