首页 > 解决方案 > 在 hive/sql 中使用延迟函数时列中的默认值

问题描述

我有一张像下面这样的表格Hive

我想计算相同列的时间差并seconds获取列中id的值time_diff

Table

+-----+---------+------------------------+
| id  |  event  |            eventdate   |
+-----+---------+------------------------+
| 1   | sent    | 2017-11-23 03:49:50.0  |
| 1   | sent    | 2017-11-23 03:49:59.0  |
| 2   | sent    | 2017-11-23 04:49:59.0  |
| 1   | click   | 2017-11-24 03:49:50.0  |
+-----+---------+------------------------+

我做了如下

SELECT *, coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)),0) time_diff FROM Table;

Result

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 9         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

我得到了我想要的,但有一个小例外。结果 where idis1eventissenttime_diff列中有两个值09。在我们应用滞后函数后,我希望所有sent事件都包含0在列中。time_diff

Expected result

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 0         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

我怎样才能得到预期的结果?

标签: sqlhivehiveql

解决方案


您可以使用case表达式:

SELECT *,
       (case when event = 'sent' then 0
             else coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)), 0)
        end) as time_diff 
FROM Table;

推荐阅读