Need help understanding why multiple left joins are not returning the expected results in Snowflake

Problem description

I am having trouble with multiple left joins; they are not doing what I expect!

select 
    sent.id,
    sent.ts,
    sent.email,
    delivered.ts,
    type.label,
    min(opens.ts) as first_open,
    count(opens.id) as open_count,
    min(clicks.ts) as first_click,
    count(clicks.id) as click_count
from sent
inner join type on type.id = sent.type_id
left outer join delivered on (delivered.id = sent.id)
left outer join opens on (opens.id = sent.id)
left outer join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by 
    sent.id,
    sent.ts,
    sent.email,
    delivered.ts,
    type.label,
    opens.id,
    clicks.id
;

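A message is sent and then delivered; that relationship is 1 to 1, but the delivery may not exist.

The message can then be opened (multiple times) and clicked (multiple times), and all of these are tied together by sent.id.

If I only have the opens join, it works fine; likewise if I only have the clicks join.

When I add the clicks join, first_click and click_count show the same values as the opens columns.

I get: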

1,2020-01-01 00:00:00,a@b.com,2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 01:00:00,4

When it should be:

1,2020-01-01 00:00:00,a@b.com,2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 02:00:00,1

I have tried running without the query cache (ALTER SESSION SET USE_CACHED_RESULT = false;).

Tags: snowflake-cloud-data-platform

Solution


So, trying to bridge the gap between your problem description and the results you mention:

Starting with known data:

create or replace table sent (id text, ts timestamp_ntz, email text, type_id number);
create or replace table type (id number, label text);
create or replace table delivered(id text, ts timestamp_ntz);
create or replace table opens(id text, ts timestamp_ntz);
create or replace table clicks(id text, ts timestamp_ntz);

insert into sent values ('a1b1c1d1e1', '2020-01-01 01:00', 'a@b.com', 1);
insert into delivered values ('a1b1c1d1e1', '2020-01-01 02:00');
insert into type values (1, 'test');
insert into opens values ('a1b1c1d1e1', '2020-01-01 03:00'),('a1b1c1d1e1', '2020-01-01 04:00'),('a1b1c1d1e1', '2020-01-01 05:00'),('a1b1c1d1e1', '2020-01-01 06:00');
insert into clicks values ('a1b1c1d1e1', '2020-01-01 07:00');

select 
    sent.id
    ,sent.ts
    ,sent.email
    ,delivered.ts
    ,type.label
    ,min(opens.ts) as first_open
    ,count(opens.id) as open_count
    ,min(clicks.ts) as first_click
    ,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by 1,2,3,4, 5;

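I swapped the column names in the GROUP BY for their positions, because I like it that way, but you don't need opens.id or clicks.id there, because those are not selected as non-aggregate columns.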

 ID TS  EMAIL   TS  LABEL   FIRST_OPEN  OPEN_COUNT  FIRST_CLICK CLICK_COUNT
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 03:00:00.000 4   2020-01-01 07:00:00.000 4

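I'm not sure what join behavior you are changing, but printing all the rows and seeing what is happening may help you understand why you are not getting the results you expect.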

select 
    sent.id
    ,sent.ts
    ,sent.email
    ,delivered.ts
    ,type.label
    ,opens.ts as open_ts
    ,clicks.ts as click_ts
    --,min(opens.ts) as first_open
    --,count(opens.id) as open_count
    --,min(clicks.ts) as first_click
    --,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
--group by 1,2,3,4, 5;

This gives me:

 ID TS  EMAIL   TS  LABEL   OPEN_TS CLICK_TS
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 03:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 04:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 05:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 06:00:00.000 2020-01-01 07:00:00.000

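This is what I would expect from either a LEFT or a plain INNER join: every matching row in opens is paired with every matching row in clicks, so the single click appears four times and count(clicks.id) returns 4. Feel free to update the question with the SQL that gives you the broken results, along with output in the form listed above, for a better explanation.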
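One way to avoid the multiplied rows, offered here as a sketch rather than as part of the original answer, is to aggregate opens and clicks per id before joining, so that each side contributes at most one row per sent.id. The CTE names open_stats and click_stats and the alias delivered_ts are made up for illustration; the tables and columns are the ones created above.

```sql
-- Sketch: pre-aggregate each one-to-many table so the joins stay 1 to 1.
-- open_stats, click_stats and delivered_ts are illustrative names only.
with open_stats as (
    select id, min(ts) as first_open, count(*) as open_count
    from opens
    group by id
),
click_stats as (
    select id, min(ts) as first_click, count(*) as click_count
    from clicks
    group by id
)
select
    sent.id
    ,sent.ts
    ,sent.email
    ,delivered.ts as delivered_ts
    ,type.label
    ,open_stats.first_open
    -- coalesce turns "no matching rows" into 0 instead of NULL
    ,coalesce(open_stats.open_count, 0) as open_count
    ,click_stats.first_click
    ,coalesce(click_stats.click_count, 0) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join open_stats on (open_stats.id = sent.id)
left join click_stats on (click_stats.id = sent.id)
where sent.id = 'a1b1c1d1e1';
```

With the sample data above, this should return a single row with an open_count of 4 and a click_count of 1. No outer GROUP BY is needed, because every joined table now has at most one row per id. Using count(distinct ...) in the original query would only help if opens and clicks each had their own unique event key, which the sample tables here do not.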

