In Hive, use COALESCE to replace NULL values with values from the same column

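Problem description

I want to replace the NULL values in a particular column with values from the same column to get the desired result.

I tried the following: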

select  
    d_day,
    COALESCE(val, LAST_VALUE(val, TRUE) 
    OVER( ORDER BY d_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) 
    as val from data_table

Sample data

Tags: hive

Solution

One approach is to use two window functions. Here is an example:

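-- Sample rows: ts is the ordering column, val contains the NULLs to fill.
with tmp_table as (
  select 1 as ts, 3 as val 
  union all
  select 2 as ts, NULL as val
  union all 
  select 3 as ts, NULL as val
  union all
  select 4 as ts, 4 as val
  union all
  select 5 as ts, NULL as val
  union all
  select 6 as ts, 5 as val
  union all 
  select 7 as ts, 6 as val
)
-- Step 1: sum val from the current row to the last row. SUM skips NULLs,
-- so a NULL row gets the same sum as the next non-NULL row after it.
, rank_table as ( 
select *, SUM(val) OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
  from tmp_table
)
-- Step 2: within each rnk group, MAX picks up the group's non-NULL value.
select *, max(val) over (partition by rnk) as val_filled
  from rank_table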
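
To see why the two steps work, it can help to look at the intermediate rnk values for the sample rows. The sketch below reuses the sample data from the example; the alias val_filled, the CAST on the NULL literals, and the final ORDER BY are additions for readability, and the values in the comments were worked out by hand rather than taken from the original answer.

```sql
-- Same sample rows as above, written compactly.
with tmp_table as (
  select 1 as ts, 3 as val
  union all select 2, cast(NULL as int)
  union all select 3, cast(NULL as int)
  union all select 4, 4
  union all select 5, cast(NULL as int)
  union all select 6, 5
  union all select 7, 6
)
, rank_table as (
  select ts, val,
         SUM(val) OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
    from tmp_table
)
select ts, val, rnk,
       max(val) over (partition by rnk) as val_filled
  from rank_table
 order by ts;
-- ts  val   rnk  val_filled
-- 1   3     18   3
-- 2   NULL  15   4
-- 3   NULL  15   4
-- 4   4     15   4
-- 5   NULL  11   5
-- 6   5     11   5
-- 7   6     6    6
```

With this window frame, each NULL takes the next non-NULL value in ts order. A NULL in the last row has nothing after it to sum, so its rnk is NULL and it stays NULL.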

So in your case:

-- Same two steps, applied to your table and ordered by d_day.
with rank_table as ( 
select *, SUM(val) OVER (ORDER BY d_day ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
  from your_table
)
select *, max(val) over (partition by rnk) as val_filled
  from rank_table
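
Two caveats about the query above are worth a sketch. First, grouping by a running SUM only separates the groups cleanly when every non-NULL val is positive; a 0 or a negative value can give two different groups the same sum. Second, the frame CURRENT ROW AND UNBOUNDED FOLLOWING fills each NULL with the next non-NULL value, while the LAST_VALUE attempt in the question looks backwards. The variant below is an assumption about what may be wanted, not part of the original answer: it counts non-NULL values instead of summing them and uses a preceding frame, so each NULL takes the previous non-NULL value. The names grp_table, grp, and val_filled are illustrative.

```sql
-- Variant sketch: forward fill (previous non-NULL value) using COUNT.
-- COUNT(val) counts only non-NULL rows, so the counter increases by one at
-- every non-NULL row and stays flat across the NULL rows that follow it.
with grp_table as (
  select d_day,
         val,
         COUNT(val) OVER (ORDER BY d_day
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as grp
    from your_table
)
-- Each grp group holds exactly one non-NULL val (or none, for leading NULLs),
-- so MAX simply returns that value for every row in the group.
select d_day,
       max(val) over (partition by grp) as val_filled
  from grp_table;
```

For the sample values 3, NULL, NULL, 4, NULL, 5, 6 this grouping yields grp values 1, 1, 1, 2, 2, 3, 4. NULLs that appear before the first non-NULL value fall into group 0 and remain NULL.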

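Keep in mind that the first window, ORDER BY d_day with no PARTITION BY, makes the job run on a single reducer, because every row has to be ordered together. If your data is very large, it may take some time to complete.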
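
If the data has a natural grouping key and NULLs should only be filled from rows in the same group, adding that key to both windows may let Hive spread the work over several reducers. This is a sketch under that assumption: group_key is a hypothetical column that does not appear in the question, and adding it changes the result, because values are no longer carried across group boundaries.

```sql
-- Sketch: partitioned version of the same two-step fill.
-- group_key is a hypothetical column; replace it with a real key if one exists.
with rank_table as (
  select *,
         SUM(val) OVER (PARTITION BY group_key
                        ORDER BY d_day
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
    from your_table
)
-- rnk values are only meaningful inside one group_key, so partition by both.
select *,
       max(val) over (partition by group_key, rnk) as val_filled
  from rank_table;
```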

