Hive: summarizing table data by date range

Problem description

I have a table with the following schema and data:

ID HITS MISS DDATE
1 10    3    20180101
1 33    21   20180122
1 84    11   20180901
1 11    2    20180405
1 54    23   20190203
1 33    43   20190102
4 54    22   20170305
4 56    88   20180115
5 87    22   20180809
5 66    48   20180617
5 91    53   20170606

Data types:

ID    INT
HITS  INT 
MISS  INT
DDATE STRING

The requirement is to calculate the totals of HITS and MISS per year, i.e. 2017, 2018, 2019, ...

I wrote the following query:

SELECT ID,
SUM(HITS) AS HITS,SUM(MISS) AS MISS,
CASE 
    WHEN DDATE BETWEEN '201701' AND '201712' THEN '2017' ELSE 
    'NOTHING' END AS TTL_YR17_DATA
CASE 
    WHEN DDATE BETWEEN '201801' AND '201812' THEN '2018' ELSE 
    'NOTHING' END AS TTL_YR18_DATA
CASE 
    WHEN DDATE BETWEEN '201901' AND '201912' THEN '2019' ELSE 
    'NOTHING' END AS TTL_YR19_DATA
FROM 
     HST_TABLE
WHERE 
     DDATE BETWEEN '201801' AND '201812'
GROUP BY 
     ID,DDATE; 

However, the query does not return the expected result.

Actual O/P:

1 10   3    2018
1 33   21   2018
1 84   11   2018
1 11   2    2018
1 54   23   2019
1 33   43   2019
4 54   22   2017
4 56   88   2018
5 87   22   2018
5 66   48   2018
5 91   53   2017

Expected O/P:

1   138 37  2018
4   56  88  2018
5   153 70  2018
1   87  66  2019
5   91  53  2017

Another related question:

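Is there a way to avoid passing the DDATE range in the query? It should be supplied by the user rather than hard-coded.

Any help or suggestions for achieving the two requirements above would be much appreciated.

Tags: sql, date, datetime, hive, hql

Solution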


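This is easy to do with Hive's substring function: take the first four characters of DDATE as the year and group by that value together with ID, as follows: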

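-- Total HITS and MISS per year and ID.
-- DDATE is a yyyyMMdd string, so its first four characters are the year.
select
    substring(ddate, 1, 4) as the_year,  -- string positions in Hive are 1-based
    id,
    sum(hits) as hits_num,
    sum(miss) as miss_num
from
    hst_table
group by
    substring(ddate, 1, 4),
    id
order by
    the_year,
    id;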
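
On the second question (not hard-coding the DDATE range), one possible approach is Hive's variable substitution, where the caller supplies the bounds when the script is run. The sketch below is only an illustration: the variable names start_date and end_date and the script name yearly_totals.hql are placeholders, not part of the original question. Note also that the query in the question grouped by DDATE itself, so every distinct date formed its own group and nothing was summed; grouping by the year prefix avoids that.

```sql
-- Hypothetical invocation (script and variable names are placeholders):
--   hive --hivevar start_date=20180101 --hivevar end_date=20181231 -f yearly_totals.hql

select
    substring(ddate, 1, 4) as the_year,
    id,
    sum(hits) as hits_num,
    sum(miss) as miss_num
from
    hst_table
where
    -- Use full 8-character bounds. DDATE is compared as a string, so an
    -- upper bound such as '201812' would exclude every date in December 2018
    -- ('20181201' sorts after '201812').
    ddate between '${hivevar:start_date}' and '${hivevar:end_date}'
group by
    substring(ddate, 1, 4),
    id
order by
    the_year,
    id;
```

If no filter is needed at all, the WHERE clause can simply be dropped, and every year present in the table is summarized.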

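If the intent behind the CASE expressions in the question was one column per year rather than one row per year, conditional aggregation is a common way to express that. This is a sketch under that assumption; the output column names (hits_2017 and so on) are made up for illustration, and the list of years still has to be written out by hand.

```sql
-- One row per ID, with separate HITS/MISS totals for each listed year.
-- The CASE goes inside SUM, so rows from other years contribute 0.
select
    id,
    sum(case when substring(ddate, 1, 4) = '2017' then hits else 0 end) as hits_2017,
    sum(case when substring(ddate, 1, 4) = '2017' then miss else 0 end) as miss_2017,
    sum(case when substring(ddate, 1, 4) = '2018' then hits else 0 end) as hits_2018,
    sum(case when substring(ddate, 1, 4) = '2018' then miss else 0 end) as miss_2018,
    sum(case when substring(ddate, 1, 4) = '2019' then hits else 0 end) as hits_2019,
    sum(case when substring(ddate, 1, 4) = '2019' then miss else 0 end) as miss_2019
from
    hst_table
group by
    id
order by
    id;
```

The row-per-year form in the answer above scales better when the set of years is not known in advance, since it needs no change as new years arrive.
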