Alternative to a Hive query that uses a NOT IN clause

Problem description

I have the following set of Hive tables:

create table image_additions (
    customer_id STRING,
    image_key STRING,
    image_size STRING
);
create table image_removals (
    customer_id STRING,
    image_key STRING,
    image_size STRING
);
create table images_stored (
    customer_id STRING,
    image_key STRING,
    image_size STRING
);

I want to run an insert from a query like the following:

insert into images_stored
select ia.customer_id, 
       ia.image_key, 
       ia.image_size 
from image_additions ia 
where ia.image_key not in 
       (select ir.image_key from image_removals ir);

This produces a Cartesian product, and Hive will not let me run it. How can I achieve the same result with an alternative query?

Tags: sql, hive, hiveql, notin

Solution

Use a LEFT JOIN with a WHERE ... IS NULL filter:

insert into images_stored
select ia.customer_id, 
       ia.image_key, 
       ia.image_size 
from image_additions ia 
     left join image_removals ir on ia.image_key=ir.image_key 
where ir.image_key is null;

Use NOT EXISTS:

insert into images_stored
select ia.customer_id, 
       ia.image_key, 
       ia.image_size 
from image_additions ia  
where not exists (select 1 from image_removals ir where ia.image_key=ir.image_key);
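
To make the behaviour of these rewrites concrete, here is a minimal sketch with a few sample rows. The rows are hypothetical, invented only for illustration, and the INSERT ... VALUES syntax assumes Hive 0.14 or later.

```sql
-- Hypothetical sample rows, not taken from the original question.
insert into image_additions values
    ('c1', 'k1', '100'),
    ('c1', 'k2', '200'),
    ('c2', 'k3', '300');

insert into image_removals values
    ('c1', 'k2', '200');

-- Anti-join: keep only additions that have no matching removal.
select ia.customer_id,
       ia.image_key,
       ia.image_size
from image_additions ia
     left join image_removals ir on ia.image_key = ir.image_key
where ir.image_key is null;
-- The rows for k1 and k3 remain; k2 is dropped because it appears in image_removals.
```

One difference worth checking against your data: under standard SQL semantics, NOT IN returns no rows at all if the subquery yields a NULL image_key, whereas both rewrites simply ignore such rows. Conversely, an addition whose own image_key is NULL is dropped by NOT IN but kept by the LEFT JOIN and NOT EXISTS forms.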
