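How can I make a left join take effect only when the number of records in the first table is greater than or equal to the number of records in the second table?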

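Problem description

I have two tables that I want to left join, but I want the join to match only when the number of records in the first table that share a given join-column value is greater than or equal to the number of records in the second table with that same value.

What I tried:

First, I count the number of records in each group.

Then I filter on the condition (a.cnt >= b.cnt).

Here is the code: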

insert into work.discount_tmp 
select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, 
Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, 
dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, 
DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, 
dgh_inclusion_to, 20191003 
from (
  (
    select *, 
      row_number() over (partition by GroupID order by Discount_ID) as seqnum,
      COUNT(*) over (partition by GroupID order by GroupID) as cnt 
    from work.disc_band 
    where tbl_dt = 20191003 
    order by Discount_ID
  ) a
  left join (
    select *, 
      row_number() over (
        partition by GroupID 
        order by cd_class_type, try(cast(coalesce(classid,'0') as bigint))
      ) as seqnum,
      count(*) over (partition by GroupID order by GroupID) as cnt 
    from work.alltable1 
    where tbl_dt = 20191003 
  ) b on a.GroupID = b.GroupID and a.cnt >= b.cnt and a.seqnum = b.seqnum
); 

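However, my attempt does not work, because the join is performed first and the condition is applied afterwards (so the cnt value from the second table does not stay the same once the join is done).

Any idea how to make this work?

Tags: sql, presto

Solution

You want to write the query so that the FROM clause references a neutral "seed table". You can then count the rows of both tables and compare the counts with each other. Something like this: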

insert into work.discount_tmp 
select SubsID, MSISDN, EppAcc, User_Name, Bill_Cycle, Tariff, Pack, Discount_Qual_ID, 
Discount_ID, Qualification_Type, Discount_Desc, Sequence, a.GroupID, RuleID, 
dib_band_id, dib_band_end, dib_charge_ref, DIB_DIS0, dib_disc_type, dib_limit_disc, 
DIB_MAX_, cd_class_type, ClassID, Class, dgh_group_id, dgh_inclusion_from, 
dgh_inclusion_to, 20191003 
FROM (SELECT 1 AS DummyCol) AS dt -- "seed" table; not sure the Presto equivalent
CROSS JOIN ( -- Get all qualifying rows from "disc_band", with a per-group row count
  SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount
  FROM work.disc_band
  WHERE tbl_dt = 20191003
) a
LEFT JOIN ( -- Get all qualifying rows from "alltable1", with a per-group row count
  SELECT *, count(*) OVER(PARTITION BY groupid) AS RowCount
  FROM work.alltable1
  WHERE tbl_dt = 20191003
) b ON a.groupid = b.groupid AND a.RowCount >= b.RowCount -- match only when a has at least as many rows as b

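I'm not sure how the seqnum logic is supposed to work, so you can add it back as needed.

I have not tested this, and you may need to adjust the syntax to make it work with Presto. Give it a try and let me know.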
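
As a rough, self-contained sketch of how the seqnum pairing might be added back alongside the count check: the inline VALUES rows, the two-column table shapes, and ordering b by ClassID alone are assumptions made purely for illustration, not the real layout of work.disc_band or work.alltable1. The seed table is left out here only to keep the sketch short.

```sql
-- Hypothetical sample data standing in for work.disc_band and work.alltable1.
WITH disc_band (GroupID, Discount_ID) AS (
  VALUES (1, 10), (1, 11), (2, 20)
),
alltable1 (GroupID, ClassID) AS (
  VALUES (1, 'x'), (2, 'y'), (2, 'z')
),
-- Per-group row number and row count, computed before the join.
a AS (
  SELECT GroupID, Discount_ID,
         row_number() OVER (PARTITION BY GroupID ORDER BY Discount_ID) AS seqnum,
         count(*) OVER (PARTITION BY GroupID) AS cnt
  FROM disc_band
),
b AS (
  SELECT GroupID, ClassID,
         row_number() OVER (PARTITION BY GroupID ORDER BY ClassID) AS seqnum,
         count(*) OVER (PARTITION BY GroupID) AS cnt
  FROM alltable1
)
SELECT a.GroupID, a.Discount_ID, b.ClassID
FROM a
LEFT JOIN b
  ON a.GroupID = b.GroupID
 AND a.cnt >= b.cnt        -- only match groups where a has at least as many rows as b
 AND a.seqnum = b.seqnum   -- pair rows positionally within the group
ORDER BY a.GroupID, a.Discount_ID;
```

With this sample data, group 1 has two rows in a and one in b, so its first row pairs with 'x' and its second row gets NULL. Group 2 has one row in a but two in b, so the count check fails and its single row is kept with NULL for ClassID.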

