首页 > 解决方案 > 计算 hive 表中的多列值

问题描述

以下是按类型划分的家庭成员(八人)的数据集.....

h1      h2      h3      h4      h5      h6      h7      h8
U       U       P       U       Y       null    Y       U
U       H       U       U       Y       Y       P       P
U       U       U       H       U       nuLL    Y       null
null    null    H       H       U       null    null    null
P       U       U       U       Y       null    Z       P
Y       P       null    H       Y       P       U       H
U       null    U       null    P       U       Z       Y
null    null    null    null    null    null    null    null

在上述数据集中统计总数H=户主,P=户主,U=成人,Y=户主,null=不匹配。我使用了此代码,并且此代码按类型为我提供了正确的家庭成员计数,但如果为 null,我没有得到正确的计数。谁能告诉我为什么会这样?请解决它。下面我提供我的代码

select sum( Head_cnt) as H,
       sum( parent_cnt) as P,
       sum( adult_cnt) as U,
       sum(spouce_cnt) as Y,
       sum( nomatch_cnt) as Nomatch 
from(
select length(regexp_replace(row_concatenated, '[^U]', '')) as adult_cnt,
length(regexp_replace(row_concatenated, '[^H]', '')) as head_cnt,
length(regexp_replace(row_concatenated, '[^P]', '')) as parent_cnt,
length(regexp_replace(row_concatenated, '[^Y]', '')) as spouce_cnt, 
length(regexp_replace(row_concatenated, '[null]', '')) as nomatch_cnt
from(select concat_ws(',',h1,h2,h3,h4,h5,h6,h7,h8) as row_concatenated 
       from table_name)s
)s; 

请给我解决方案......代码中的空值。我得到了除空值之外的所有值的正确计数。记住这不是一个 NULL 值。这里 null 表示不匹配。

标签: hivehiveql

解决方案


将空值转换为某个字符,例如“N”表示不匹配:

select sum( Head_cnt) as H,
       sum( parent_cnt) as P,
       sum( adult_cnt) as U,
       sum(spouce_cnt) as Y,
       sum( nomatch_cnt) as Nomatch 
from(
select length(regexp_replace(row_concatenated, '[^U]', '')) as adult_cnt,
length(regexp_replace(row_concatenated, '[^H]', '')) as head_cnt,
length(regexp_replace(row_concatenated, '[^P]', '')) as parent_cnt,
length(regexp_replace(row_concatenated, '[^Y]', '')) as spouce_cnt, 
length(regexp_replace(row_concatenated, '[^N]', '')) as nomatch_cnt
from
    (
     select concat_ws(',',nvl(h1,'N'),nvl(h2,'N'),nvl(h3,'N'),nvl(h4,'N'),nvl(h5,'N'),nvl(h6,'N'),nvl(h7,'N'),nvl(h8,'N')) as row_concatenated 
       from table_name)s
)s; 

推荐阅读