Hive SQL: count items and, if the count equals 1, report which item was used

Problem description

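I am querying users who use desktop or mobile devices. I want to know whether each user used only one of the two device categories or both within a day and, if they used only one category, which one it was.

I put together the query below, but I am going in circles: "dev2" depends on "dev_count", so I cannot use "dev2" in the GROUP BY clause before "dev_count" has been computed.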

SELECT day, user,
COUNT(DISTINCT CASE 
  WHEN dev = "SmartPhone" OR dev = "Tablet" 
    THEN "Mobile"
  ELSE "Desktop" 
END) AS dev_count,
CASE 
  WHEN dev_count = 1
    THEN
      CASE 
        WHEN dev = "SmartPhone" OR dev = "Tablet" 
          THEN "Mobile"
        ELSE "Desktop" 
        END
  ELSE ""
END AS dev2
FROM table1
WHERE
    ...
GROUP BY day, user, dev2
ORDER BY day;

Is there a way around this?

Tags: sql, group-by, hive, case, hiveql

Solution


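You can use the collect_set() aggregate function to get an array of the distinct device types used. You can then use size() to check whether the array contains more than one element, take array[0] if it contains a single element, and use array_contains() to check whether the array contains a particular value.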

select day, user, 
       size(device_type_used_array) as dev_count,
       case when size(device_type_used_array)=1 
                then device_type_used_array[0] 
           else NULL --more than one, put what you want instead of NULL
       end as one_category_used
from
(
SELECT day, user, 
collect_set(CASE WHEN dev = "SmartPhone" OR
                      dev = "Tablet" 
                THEN "Mobile"
                ELSE "Desktop" 
           END) AS device_type_used_array
FROM table1
WHERE
    ...
GROUP BY day, user
)s
order by day;

device_type_used_array contains the distinct device types used per user per day. For an array that keeps duplicates, use collect_list() instead; you can use both if necessary.
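
The query above does not use array_contains(), so here is a minimal sketch of how it could be applied, assuming the same table1 columns (day, user, dev) as in the question. The column names used_mobile, used_desktop and device_types are hypothetical, chosen only for illustration, and the question's elided WHERE filter is left out rather than guessed.

```sql
-- Sketch only: flags which device categories each user touched per day.
SELECT day, user,
       array_contains(device_types, 'Mobile')  AS used_mobile,   -- true if any SmartPhone/Tablet row exists
       array_contains(device_types, 'Desktop') AS used_desktop,  -- true if any other device row exists
       size(device_types)                      AS dev_count      -- 1 = one category, 2 = both
FROM
(
  SELECT day, user,
         collect_set(CASE WHEN dev IN ('SmartPhone', 'Tablet')
                          THEN 'Mobile'
                          ELSE 'Desktop'
                     END) AS device_types
  FROM table1
  -- the question's WHERE filter (elided in the original) would go here
  GROUP BY day, user
) s
ORDER BY day;
```

The boolean flags can be more convenient than array[0] when downstream logic needs to filter on a specific category, for example keeping only rows where used_mobile is true and dev_count = 1.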
