sql - Hive join between two tables
Problem description
This is the problem: I have this staging table:
key0  key1  timestamp                partition_key
5     5     2020-03-03 14:42:21.548  1
5     4     2020-03-03 14:40:11.871  1
4     3     2020-03-03 14:43:47.602  2
And this target table:
key0  key1  timestamp                partition_key
5     4     2020-03-03 13:43:16.695  1
5     5     2020-03-03 13:45:24.793  1
5     2     2020-03-03 13:47:30.668  1
5     1     2020-03-03 13:48:30.669  1
4     3     2020-03-03 13:53:47.602  2
43    3     2020-03-03 14:00:14.016  2
I want to get this output:
key0  key1  timestamp                partition_key
5     5     2020-03-03 14:42:21.548  1
5     4     2020-03-03 14:40:11.871  1
5     2     2020-03-03 13:47:30.668  1
5     1     2020-03-03 13:48:30.669  1
4     3     2020-03-03 14:43:47.602  2
43    3     2020-03-03 14:00:14.016  2
For each combination of key0, key1, and partition_key, I want the record with the most recent timestamp. In addition, I want to keep the records that already exist in the target table but do not exist in the staging table.
I tried first with this query:
select
    t1.key0,
    t1.key1,
    t1.timestamp,
    t2.partition_key
from staging_table t2
left outer join target_table t1 on
    t1.key0=t2.key0 AND
    t1.key1=t2.key1 AND
    t1.timestamp=t2.timestamp;
Solution
This looks like a prioritization query: take everything from staging, then add the rows from the target that have no match in staging. I would recommend union all:
select s.*
from staging s
union all
select t.*
from target t left join
     staging s
     on t.key0 = s.key0 and t.key1 = s.key1
where s.key0 is null;
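To check the union all query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Hive here purely as an assumption for convenience; the table names staging and target follow the answer, and the column types are guesses.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (key0 INTEGER, key1 INTEGER, timestamp TEXT, partition_key INTEGER);
CREATE TABLE target  (key0 INTEGER, key1 INTEGER, timestamp TEXT, partition_key INTEGER);
INSERT INTO staging VALUES
  (5, 5, '2020-03-03 14:42:21.548', 1),
  (5, 4, '2020-03-03 14:40:11.871', 1),
  (4, 3, '2020-03-03 14:43:47.602', 2);
INSERT INTO target VALUES
  (5, 4, '2020-03-03 13:43:16.695', 1),
  (5, 5, '2020-03-03 13:45:24.793', 1),
  (5, 2, '2020-03-03 13:47:30.668', 1),
  (5, 1, '2020-03-03 13:48:30.669', 1),
  (4, 3, '2020-03-03 13:53:47.602', 2),
  (43, 3, '2020-03-03 14:00:14.016', 2);
""")

# All staging rows, plus target rows whose (key0, key1) has no staging match.
query = """
select s.*
from staging s
union all
select t.*
from target t left join
     staging s
     on t.key0 = s.key0 and t.key1 = s.key1
where s.key0 is null
"""

union_rows = sorted(conn.execute(query).fetchall())
for row in union_rows:
    print(row)
```

The three staging rows replace their older counterparts in the target, and the three target-only rows are carried over, giving the six rows of the desired output.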
This does assume that staging has the most recent rows -- which is true in your sample data. If not, I would phrase this as:
select key0, key1, timestamp, partition_key
from (select st.*,
             row_number() over (partition by key0, key1 order by timestamp desc) as seqnum
      from ((select s.* from staging s
            ) union all
            (select t.* from target t
            )
           ) st
     ) st
where seqnum = 1;
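The row_number() variant can be tried the same way. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in for Hive (an assumption; window functions need SQLite 3.25 or later). To show the case where staging is not the newest source, it adds one hypothetical target row for (4, 3) that is newer than the staging row; that row is not part of the question's data. SQLite does not accept parentheses around the branches of a union, so they are dropped here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (key0 INTEGER, key1 INTEGER, timestamp TEXT, partition_key INTEGER);
CREATE TABLE target  (key0 INTEGER, key1 INTEGER, timestamp TEXT, partition_key INTEGER);
INSERT INTO staging VALUES
  (5, 5, '2020-03-03 14:42:21.548', 1),
  (5, 4, '2020-03-03 14:40:11.871', 1),
  (4, 3, '2020-03-03 14:43:47.602', 2);
INSERT INTO target VALUES
  (5, 4, '2020-03-03 13:43:16.695', 1),
  (5, 5, '2020-03-03 13:45:24.793', 1),
  (5, 2, '2020-03-03 13:47:30.668', 1),
  (5, 1, '2020-03-03 13:48:30.669', 1),
  (4, 3, '2020-03-03 13:53:47.602', 2),
  (43, 3, '2020-03-03 14:00:14.016', 2),
  -- Hypothetical extra row: a target record newer than its staging match.
  (4, 3, '2020-03-03 15:00:00.000', 2);
""")

# Stack both tables, rank rows per (key0, key1) by newest timestamp, keep rank 1.
query = """
select key0, key1, timestamp, partition_key
from (select st.*,
             row_number() over (partition by key0, key1 order by timestamp desc) as seqnum
      from (select s.* from staging s
            union all
            select t.* from target t
           ) st
     ) ranked
where seqnum = 1
"""

latest_rows = sorted(conn.execute(query).fetchall())
for row in latest_rows:
    print(row)
```

With the hypothetical row in place, the (4, 3) key resolves to the 15:00:00.000 target record rather than the staging one, which the plain union all query would not do. The timestamps are stored as text here, so the ordering relies on their fixed, zero-padded format.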