Is there a way in SQL to aggregate a column across rows, possibly repeating rows based on another field's value, in Redshift?

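Question

So I have a table, let's call it shipping_items, that lists the individual items included in each shipment, along with their quantities, keyed by shipment_id.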

+-------------+-------------+----------+
| shipment_id |  item_name  | quantity |
+-------------+-------------+----------+
|           1 | cleanser    |        1 |
|           1 | moisturizer |        2 |
|           2 | cleanser    |        2 |
|           2 | body wash   |        1 |
|           3 | cleanser    |        1 |
|           3 | moisturizer |        2 |
|           4 | cleanser    |        1 |
|           4 | moisturizer |        1 |
+-------------+-------------+----------+

What I want is to return a table that looks like this:

+------------------------------------+----------+
|               items                | num_ship |
+------------------------------------+----------+
| cleanser, moisturizer, moisturizer |        2 |
| body wash, cleanser, cleanser      |        1 |
| cleanser, moisturizer              |        1 |
+------------------------------------+----------+

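Is there a way to do this in SQL? I was thinking of listagg, but the tricky part is repeating the item names according to the quantity field. What I want the new table to show is that 2 shipments contain 2 moisturizers and 1 cleanser, and 1 shipment contains 2 cleansers and 1 body wash.

** Edit ** Solved, thanks to @Gordon Linoff

The new result table will look like this: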

+------------------------------------+----------+
|               items                | num_ship |
+------------------------------------+----------+
| cleanser: 1, moisturizer: 2        |        2 |
| body wash: 1, cleanser: 2          |        1 |
| cleanser: 1, moisturizer: 1        |        1 |
+------------------------------------+----------+

Tags: sql, amazon-redshift

Answer

You can use listagg():

select listagg(item_name, ', ') within group (order by item_name) as items,
       quantity
from t
group by quantity
order by quantity desc;

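Edit:

I think you need two levels of aggregation: first build the item list for each shipment, then count how many shipments share each list: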

select items, count(*)
from (select shipment_id,
             listagg(distinct item_name, ', ') within group (order by item_name) as items
      from t
      group by shipment_id
     ) s
group by items
order by count(*) desc;

Because of the distinct, this does not include duplicates in the item list, so quantities are ignored.
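If the repeated names are wanted, as in the first expected table in the question, one option is to expand each row into quantity copies before aggregating. The following is only a minimal sketch, not part of the original answer: the numbers helper is a hypothetical inline list that assumes no quantity exceeds 3, and t stands for the shipping_items table.

```sql
-- Hypothetical helper: one row per integer 1..3 (assumes quantity <= 3).
with numbers as (
    select 1 as n union all
    select 2 union all
    select 3
),
-- Repeat each item row once per unit of quantity.
expanded as (
    select t.shipment_id, t.item_name
    from t
    join numbers on numbers.n <= t.quantity
),
-- First level: one item list per shipment, duplicates kept.
per_shipment as (
    select shipment_id,
           listagg(item_name, ', ') within group (order by item_name) as items
    from expanded
    group by shipment_id
)
-- Second level: count the shipments that share the same list.
select items, count(*) as num_ship
from per_shipment
group by items
order by num_ship desc;
```

For larger quantities the helper would need more rows, for example from an existing numbers table if one is available.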

Edit II:

For an exact match, including quantities:

select items, count(*)
from (select shipment_id,
             listagg(distinct item_name || ':' || quantity, ', ') within group (order by item_name) as items
      from t
      group by shipment_id
     ) s
group by items
order by count(*) desc;
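The query above relies on quantity being converted to text implicitly, and its ':' separator has no trailing space. A variant that casts explicitly and uses ': ' to match the edited table in the question might look like the sketch below. It assumes each (shipment_id, item_name) pair appears only once in t, so distinct is dropped, and the num_ship alias is added for readability.

```sql
select items, count(*) as num_ship
from (select shipment_id,
             -- Explicit cast avoids depending on implicit integer-to-text conversion.
             listagg(item_name || ': ' || cast(quantity as varchar), ', ')
                 within group (order by item_name) as items
      from t
      group by shipment_id
     ) s
group by items
order by num_ship desc;
```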
