Athena: get the minimum value in each group and the corresponding values of the other columns

Problem description

Input table

user id action  date           collection

aaa  1   view   2020-09-01     {some JSON data_1}
aaa  1   view   2020-09-02     {some JSON data_2}
aaa  1   view   2020-09-03     {some JSON data_3}
bbb  2   view   2020-09-08     {some JSON data_22}
bbb  2   view   2020-09-09     {some JSON data_23}
ccc  2   view   2020-09-01     {some JSON data_99}
ddd  3   view   2020-09-01     {some JSON data_88}

Output table

user id action  date           collection

aaa  1   view   2020-09-01     {some JSON data_1}
bbb  2   view   2020-09-08     {some JSON data_22}
ccc  2   view   2020-09-01     {some JSON data_99}
ddd  3   view   2020-09-01     {some JSON data_88}

Looking at the input and output tables above, I want something like this:

group by (user, id, action), then take min(date) and the corresponding collection value

Can anyone suggest an approach?

Tags: sql, datetime, greatest-n-per-group, presto, amazon-athena

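Solution

One option is to filter with a correlated subquery that finds the earliest date per group (if several rows share that date, all of them are returned):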

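select t.*
from mytable t
where t.date = (
    -- earliest date within the same (user, id, action) group
    select min(t1.date)
    from mytable t1
    where t1.user = t.user
      and t1.id = t.id
      and t1.action = t.action
)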

Another solution is to use a window function to rank the records in each group by date, then filter the result set on that rank:

select *
from (
    -- rn = 1 marks the earliest row within each (user, id, action) group
    select t.*, row_number() over(partition by user, id, action order by date) rn
    from mytable t
) t
where rn = 1
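
Since the question is tagged presto and amazon-athena, a third option may be worth trying. It is not part of the original answer: this is a minimal sketch that assumes the engine's min_by(x, y) aggregate, which returns the value of x from the row where y is smallest. It maps directly onto the "group by, then min(date) and the corresponding collection" wording of the question. The table name mytable is carried over from the queries above, and the double quotes around "user" and "date" are only a precaution in case those names clash with keywords in your setup.

```sql
-- Sketch: one output row per (user, id, action) group.
-- min_by(collection, "date") picks the collection value from the row
-- that has the smallest "date" within the group.
select
    "user",
    id,
    action,
    min("date")                as "date",
    min_by(collection, "date") as collection
from mytable
group by "user", id, action
```

Unlike the two queries above, this needs neither a self-join nor a derived table, but each extra column you want from the earliest row needs its own min_by call. If several rows in a group share the earliest date, which of their collection values is returned should be treated as unspecified.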
