How do I convert the following dictionary-format column into a different format in Hive or Presto?

Problem description

I have a Hive table that looks like this:

event_name  attendees_per_countries
a           {'US':5}
b           {'US':4,'UK':3,'CA':2}
c           {'UK':2,'CA':1}

I want to get a new table that looks like this:

country  number_of_people
US       9
UK       5
CA       3

How can I write this query in Hive or Presto?

Tags: sql, hive, presto

Solution


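Which query to use depends on the type of the attendees_per_countries column.

If attendees_per_countries is a string, convert it to a map first and then explode it: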

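WITH sample_data AS (
    -- Strip the braces, then split on ',' and ':' to build a map
    SELECT
        event_name,
        str_to_map(
            regexp_replace(attendees_per_countries, '[{}]', ''),
            ',',
            ':'
        ) AS attendees_per_countries
    FROM
        raw_data
)
SELECT
    -- Keys still carry quotes and spaces (e.g. " 'UK'"), so strip them
    regexp_replace(cm.key, "[' ]", "") AS country,
    -- Values are strings such as " 3"; trim and cast them so the sum is an integer
    SUM(CAST(trim(cm.value) AS INT)) AS no_of_people
FROM sample_data
LATERAL VIEW explode(attendees_per_countries) cm
GROUP BY regexp_replace(cm.key, "[' ]", "")
ORDER BY no_of_people DESC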

However, if attendees_per_countries is already a map, you can explode it directly:

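-- raw_data is the source table; no str_to_map step is needed for a map column
SELECT
    -- Harmless if the keys are already clean; removes stray quotes or spaces otherwise
    regexp_replace(cm.key, "[' ]", "") AS country,
    SUM(cm.value) AS no_of_people
FROM raw_data
LATERAL VIEW explode(attendees_per_countries) cm
GROUP BY regexp_replace(cm.key, "[' ]", "")
ORDER BY no_of_people DESC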
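
Presto has no LATERAL VIEW, so there a map column can be flattened with CROSS JOIN UNNEST instead. The following is only a sketch under assumptions: the inline VALUES rows stand in for the real table, and the aliases t, country and attendees are illustrative names that do not come from the question.

```sql
-- Presto sketch: made-up sample rows with a MAP(VARCHAR, INTEGER) column
WITH raw_data (event_name, attendees_per_countries) AS (
    VALUES
        ('a', MAP(ARRAY['US'], ARRAY[5])),
        ('b', MAP(ARRAY['US', 'UK', 'CA'], ARRAY[4, 3, 2])),
        ('c', MAP(ARRAY['UK', 'CA'], ARRAY[2, 1]))
)
SELECT
    t.country,
    SUM(t.attendees) AS no_of_people
FROM raw_data
-- UNNEST on a map produces one row per entry, with a key column and a value column
CROSS JOIN UNNEST(attendees_per_countries) AS t (country, attendees)
GROUP BY t.country
ORDER BY no_of_people DESC
```

Against a real table, the WITH clause would be dropped and raw_data replaced by the table name.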

A complete reproducible example follows:

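WITH raw_data AS (
    -- Sample rows matching the table in the question
    SELECT 'a' AS event_name, "{'US':5}" AS attendees_per_countries
    UNION ALL
    SELECT 'b', "{'US':4, 'UK': 3, 'CA': 2}"
    UNION ALL
    SELECT 'c', "{'UK':2, 'CA': 1}"
),
sample_data AS (
    -- Strip the braces, then split on ',' and ':' to build a map
    SELECT
        event_name,
        str_to_map(
            regexp_replace(attendees_per_countries, '[{}]', ''),
            ',',
            ':'
        ) AS attendees_per_countries
    FROM
        raw_data
)
SELECT
    -- Keys still carry quotes and spaces (e.g. " 'UK'"), so strip them
    regexp_replace(cm.key, "[' ]", "") AS country,
    -- Values are strings such as " 3"; trim and cast them so the sum is an integer
    SUM(CAST(trim(cm.value) AS INT)) AS no_of_people
FROM sample_data
LATERAL VIEW explode(attendees_per_countries) cm
GROUP BY regexp_replace(cm.key, "[' ]", "")
ORDER BY no_of_people DESC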
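
The question also asks about Presto, which has neither str_to_map nor LATERAL VIEW. One possible equivalent, assuming the strings always look like the samples (single-quoted keys, integer values, no quote characters inside country names), is to swap the single quotes for double quotes so the text becomes valid JSON, parse it with json_parse, cast the result to a map, and unnest it. The VALUES rows and the aliases t, country and attendees are illustrative, so treat this as a sketch and not a tested drop-in query.

```sql
-- Presto sketch: sample rows holding the dictionary as a string
WITH raw_data (event_name, attendees_per_countries) AS (
    VALUES
        ('a', '{''US'':5}'),
        ('b', '{''US'':4, ''UK'': 3, ''CA'': 2}'),
        ('c', '{''UK'':2, ''CA'': 1}')
)
SELECT
    t.country,
    SUM(t.attendees) AS no_of_people
FROM raw_data
CROSS JOIN UNNEST(
    CAST(
        -- Replace ' with " so that {'US':5} becomes the JSON object {"US":5}
        json_parse(replace(attendees_per_countries, '''', '"'))
        AS MAP(VARCHAR, INTEGER)
    )
) AS t (country, attendees)
GROUP BY t.country
ORDER BY no_of_people DESC
```

Because the JSON parser ignores whitespace around keys and values, no extra trimming or quote stripping is needed here, unlike in the Hive version.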

Let me know if this works for you.

