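sql - When querying a Hive table with Presto, how do I return a row for a column value that has no data?
Problem description
I have a Hive table (mytable) that contains the following data: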
-----------------------------------------------------------
| date | device | hits | type |
-----------------------------------------------------------
| 2018-08-15 | device1 | 162684 | messages-total-hits |
| 2018-08-15 | device2 | 70689941 | messages-total-hits |
| 2018-08-15 | device3 | 58979363 | messages-total-hits |
| 2018-08-15 | device4 | 125021 | messages-total-hits |
| 2018-08-15 | device5 | 78750 | messages-total-hits |
| 2018-08-15 | device6 | 2595244 | messages-total-hits |
| 2018-08-16 | device1 | 73140 | activity-total-hits |
| 2018-08-16 | device4 | 19 | activity-total-hits |
| 2018-08-16 | device5 | 75572 | activity-total-hits |
| 2018-08-16 | device6 | 2024704 | activity-total-hits |
-----------------------------------------------------------
I need to get the total hits per device per day over a given period, and I use the following query to do so:
SELECT
    date_column, b.device, coalesce(sum(b.hits), 0) AS total
FROM
    (SELECT
         CAST(date_column AS DATE) date_column
     FROM
         (VALUES
             (SEQUENCE(FROM_ISO8601_DATE('2018-08-14'),
                       FROM_ISO8601_DATE('2018-08-18'),
                       INTERVAL '1' DAY)
             )
         ) AS t1(date_array)
     CROSS JOIN
         UNNEST(date_array) AS t2(date_column)
    ) AS a
LEFT JOIN
    (SELECT date, device, hits
     FROM
         mytable
     WHERE
         date BETWEEN date('2018-08-14') AND date('2018-08-18')
    ) AS b
    ON a.date_column = b.date
LEFT JOIN
    (SELECT DISTINCT(device) FROM mytable) AS c
    ON b.device = c.device
WHERE
    date_column BETWEEN date('2018-08-14') AND date('2018-08-18')
GROUP BY
    date_column,
    c.device,
    b.device
ORDER BY
    date_column,
    device
;
This query produces the following result:
------------------------------------
| date_column | device | total |
------------------------------------
| 2018-08-14 | null | 0 |
| 2018-08-15 | device1 | 162684 |
| 2018-08-15 | device2 | 70689941 |
| 2018-08-15 | device3 | 58979363 |
| 2018-08-15 | device4 | 125021 |
| 2018-08-15 | device5 | 78750 |
| 2018-08-15 | device6 | 2595244 |
| 2018-08-16 | device1 | 73140 |
| 2018-08-16 | device4 | 19 |
| 2018-08-16 | device5 | 75572 |
| 2018-08-16 | device6 | 2024704 |
| 2018-08-17 | null | 0 |
------------------------------------
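The problem is that when a particular device has no data on a particular date, I still need to show the device name with a total of 0. I don't understand why my query does not produce the result I want, which is shown below: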
------------------------------------
| date_column | device | total |
------------------------------------
| 2018-08-14 | device1 | 0 |
| 2018-08-14 | device2 | 0 |
| 2018-08-14 | device3 | 0 |
| 2018-08-14 | device4 | 0 |
| 2018-08-14 | device5 | 0 |
| 2018-08-14 | device6 | 0 |
| 2018-08-15 | device1 | 162684 |
| 2018-08-15 | device2 | 70689941 |
| 2018-08-15 | device3 | 58979363 |
| 2018-08-15 | device4 | 125021 |
| 2018-08-15 | device5 | 78750 |
| 2018-08-15 | device6 | 2595244 |
| 2018-08-16 | device1 | 73140 |
| 2018-08-16 | device2 | 0 |
| 2018-08-16 | device3 | 0 |
| 2018-08-16 | device4 | 19 |
| 2018-08-16 | device5 | 75572 |
| 2018-08-16 | device6 | 2024704 |
| 2018-08-17 | device1 | 0 |
| 2018-08-17 | device2 | 0 |
| 2018-08-17 | device3 | 0 |
| 2018-08-17 | device4 | 0 |
| 2018-08-17 | device5 | 0 |
| 2018-08-17 | device6 | 0 |
------------------------------------
Can anyone explain why my query does not produce the device name with a total of 0 when no data exists for that device on a given date?
Solution
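You have to CROSS JOIN the distinct devices with the dates, and then LEFT JOIN the original table onto that combination. In the original query, c is joined on b.device = c.device, so on a date where b has no rows, b.device is NULL, nothing in c matches, and the device name is lost. The following query should return the expected result.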
SELECT
    a.date_column, d.device, coalesce(sum(b.hits), 0) AS total
FROM
    -- a: one row per day in the requested period
    (SELECT
         CAST(date_column AS DATE) date_column
     FROM
         (VALUES
             (SEQUENCE(FROM_ISO8601_DATE('2018-08-14'),
                       FROM_ISO8601_DATE('2018-08-18'),
                       INTERVAL '1' DAY)
             )
         ) AS t1(date_array)
     CROSS JOIN UNNEST(date_array) AS t2(date_column)
    ) AS a
-- d: every known device, paired with every day
CROSS JOIN (SELECT DISTINCT device FROM mytable) AS d
-- b: the actual hits, matched on both date and device
LEFT JOIN
    (SELECT date, device, hits
     FROM mytable
     WHERE date BETWEEN date('2018-08-14') AND date('2018-08-18')
    ) AS b ON a.date_column = b.date AND b.device = d.device
GROUP BY a.date_column, d.device
ORDER BY a.date_column, d.device
;
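To see the pattern in isolation, here is a minimal sketch that does not touch mytable at all. The inline table sample_hits, its column dt, and the three rows in it are made up for illustration; the query assumes a Presto version that accepts UNNEST directly in the FROM clause and column aliases on a WITH query.

```sql
-- Hypothetical toy data standing in for mytable (not from the original question).
WITH sample_hits (dt, device, hits) AS (
    VALUES
        (DATE '2018-08-15', 'device1', 10),
        (DATE '2018-08-15', 'device2', 20),
        (DATE '2018-08-16', 'device1', 5)
),
-- One row per day in the period of interest.
days AS (
    SELECT dt
    FROM UNNEST(SEQUENCE(DATE '2018-08-15', DATE '2018-08-16', INTERVAL '1' DAY)) AS t (dt)
),
-- One row per known device.
devices AS (
    SELECT DISTINCT device FROM sample_hits
)
SELECT
    days.dt,
    devices.device,
    COALESCE(SUM(s.hits), 0) AS total
FROM days
CROSS JOIN devices                 -- build the full (day, device) grid first
LEFT JOIN sample_hits AS s         -- then attach hits where they exist
    ON s.dt = days.dt
   AND s.device = devices.device
GROUP BY days.dt, devices.device
ORDER BY days.dt, devices.device;
```

With this toy data the grid has four (day, device) pairs, and the pair (2018-08-16, device2) has no matching row in sample_hits, so it comes back with a total of 0 instead of disappearing. The device name survives because it is selected from devices, the side of the join that always has a row, rather than from the hits table.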