首页 > 解决方案 > BigQuery SQL : Left Join with aggregate conditions

问题描述

I have two Tables like : Table1:

Time1                          Item1
2021-01-28 18:05:34.647 UTC     ysys-gift-card
2021-01-28 18:06:29.514 UTC     ysys-gift-card
2021-01-28 18:07:36.959 UTC     ysys-gift-card
2021-01-29 04:20:10.916 UTC     sweetheart-flounce-top
2021-01-31 10:20:36.048 UTC     long-sleeve-scallop-mini-dress
2021-02-04 01:04:50.926 UTC     floral-smocked-off-shoulder-top
2021-02-03 07:08:51.386 UTC     a-woven-striped-accents
2021-01-29 22:19:14.359 UTC     mock-neck-lace-bodysuit
2021-01-28 01:41:30.994 UTC     floral-mini-dress
2021-02-03 06:03:45.649 UTC     texture-stud-earring

Table2 like:

recordID    Time2                             Item2
3121    2021-02-07 12:21:18.808335 UTC  the-zachary-short-circuit
4658    2021-02-07 12:24:23.323713 UTC  pullover-mask-hoodie-1
3420    2021-02-07 12:32:57.886101 UTC  ladies-6-pack-low-cut-socks-1
1190    2021-02-07 12:37:51.026572 UTC  envelope-stitch-crossbody-bag
4151    2021-02-07 12:42:25.424731 UTC  striped-cable-knit-sweater
1651    2021-02-07 12:45:09.691259 UTC  classic-pullover-hoodie
4719    2021-02-07 12:53:25.069348 UTC  rash-vest-color-splash
1314    2021-02-07 12:57:09.828512 UTC  sonia-top-in-mustard-yelllow
1186    2021-02-07 12:59:17.894899 UTC  thermal-heather-sleep-shirt-black-ptlfa020rt-018
3936    2021-02-07 12:59:37.462007 UTC  k2013085blouse
3089    2021-02-07 13:06:16.404185 UTC  product-394
1706    2021-02-07 13:09:07.584634 UTC  ryan-portfolio-dress-shoe-brown-st14028brt-201
1614    2021-02-07 13:10:02.506685 UTC  large-rectangular-metal-sunglasses
4618    2021-02-07 13:23:23.639555 UTC  diamond-quilted-jacket-2
4869    2021-02-07 13:25:50.409423 UTC  princess-panel-polo-dress-2

each table several million rows... So I need to make the left join with conditions: Item1=Item2 and time from a second table closest to the time from a first table Up to 2 minutes and if not then null

In other words, I need to find a record_ID of the same item and the closest time, but no further than 2 minutes ...

So my try like:

Select t1.time1, t1.item1
from table 1 t1 left join table2 t2 ON t1.item=t2.item
and
t1.time= ??

Here I need help to understand how to write an aggregate condition... Output like :

Time 1                         Item 1                        Record_ID
2021-01-28 18:06:29.514 UTC ysys-gift-card                      3398
2021-01-28 18:07:36.959 UTC ysys-gift-card                      null
2021-01-29 04:20:10.916 UTC sweetheart-flounce-top              2275
2021-01-31 10:20:36.048 UTC long-sleeve-scallop-mini-dress      2307
2021-02-04 01:04:50.926 UTC floral-smocked-off-shoulder-top     null
2021-02-03 07:08:51.386 UTC a-woven-striped-accents             1689
2021-01-29 22:19:14.359 UTC mock-neck-lace-bodysuit             4501
2021-01-28 01:41:30.994 UTC floral-mini-dress                   null
2021-02-03 06:03:45.649 UTC texture-stud-earring                4660
2021-02-04 01:14:08.169 UTC tie-dye-printed-top-and-pants-set   4617
2021-01-28 01:41:30.462 UTC solid-bodycon-mini-dress            null
2021-01-27 11:32:56.490 UTC techno-crepe-stretch                2983
2021-02-04 06:20:18.840 UTC sweater-top-and-mini-skirt-set      1948
2021-02-01 12:36:00.932 UTC floral-race-back-flare-skirt        2932
2021-01-31 06:16:57.916 UTC workout-long-pants                  null
2021-01-29 22:08:06.729 UTC pre-damaged-color-block-tee         2329

Thanks

标签: sqlgoogle-bigquery

解决方案


下面应该可以工作(BigQuery 标准 SQL)!

select any_value(t1).*, 
    array_agg(recordID order by abs(timestamp_diff(Time1, Time2, second)) limit 1) recordID
from `project.dataset.table1` t1
join `project.dataset.table2` t2
on Item1 = Item2
group by format('%t', t1)
union all 
select *, null
from `project.dataset.table1`
where not Item1 in (
  select distinct Item2
  from `project.dataset.table2`
)

推荐阅读