sql - BigQuery SQL : Left Join with aggregate conditions
问题描述
I have two Tables like : Table1:
Time1 Item1
2021-01-28 18:05:34.647 UTC ysys-gift-card
2021-01-28 18:06:29.514 UTC ysys-gift-card
2021-01-28 18:07:36.959 UTC ysys-gift-card
2021-01-29 04:20:10.916 UTC sweetheart-flounce-top
2021-01-31 10:20:36.048 UTC long-sleeve-scallop-mini-dress
2021-02-04 01:04:50.926 UTC floral-smocked-off-shoulder-top
2021-02-03 07:08:51.386 UTC a-woven-striped-accents
2021-01-29 22:19:14.359 UTC mock-neck-lace-bodysuit
2021-01-28 01:41:30.994 UTC floral-mini-dress
2021-02-03 06:03:45.649 UTC texture-stud-earring
Table2 like:
recordID Time2 Item2
3121 2021-02-07 12:21:18.808335 UTC the-zachary-short-circuit
4658 2021-02-07 12:24:23.323713 UTC pullover-mask-hoodie-1
3420 2021-02-07 12:32:57.886101 UTC ladies-6-pack-low-cut-socks-1
1190 2021-02-07 12:37:51.026572 UTC envelope-stitch-crossbody-bag
4151 2021-02-07 12:42:25.424731 UTC striped-cable-knit-sweater
1651 2021-02-07 12:45:09.691259 UTC classic-pullover-hoodie
4719 2021-02-07 12:53:25.069348 UTC rash-vest-color-splash
1314 2021-02-07 12:57:09.828512 UTC sonia-top-in-mustard-yelllow
1186 2021-02-07 12:59:17.894899 UTC thermal-heather-sleep-shirt-black-ptlfa020rt-018
3936 2021-02-07 12:59:37.462007 UTC k2013085blouse
3089 2021-02-07 13:06:16.404185 UTC product-394
1706 2021-02-07 13:09:07.584634 UTC ryan-portfolio-dress-shoe-brown-st14028brt-201
1614 2021-02-07 13:10:02.506685 UTC large-rectangular-metal-sunglasses
4618 2021-02-07 13:23:23.639555 UTC diamond-quilted-jacket-2
4869 2021-02-07 13:25:50.409423 UTC princess-panel-polo-dress-2
each table several million rows... So I need to make the left join with conditions: Item1=Item2 and time from a second table closest to the time from a first table Up to 2 minutes and if not then null
In other words, I need to find a record_ID of the same item and the closest time, but no further than 2 minutes ...
So my try like:
Select t1.time1, t1.item1
from table 1 t1 left join table2 t2 ON t1.item=t2.item
and
t1.time= ??
Here I need help to understand how to write an aggregate condition... Output like :
Time 1 Item 1 Record_ID
2021-01-28 18:06:29.514 UTC ysys-gift-card 3398
2021-01-28 18:07:36.959 UTC ysys-gift-card null
2021-01-29 04:20:10.916 UTC sweetheart-flounce-top 2275
2021-01-31 10:20:36.048 UTC long-sleeve-scallop-mini-dress 2307
2021-02-04 01:04:50.926 UTC floral-smocked-off-shoulder-top null
2021-02-03 07:08:51.386 UTC a-woven-striped-accents 1689
2021-01-29 22:19:14.359 UTC mock-neck-lace-bodysuit 4501
2021-01-28 01:41:30.994 UTC floral-mini-dress null
2021-02-03 06:03:45.649 UTC texture-stud-earring 4660
2021-02-04 01:14:08.169 UTC tie-dye-printed-top-and-pants-set 4617
2021-01-28 01:41:30.462 UTC solid-bodycon-mini-dress null
2021-01-27 11:32:56.490 UTC techno-crepe-stretch 2983
2021-02-04 06:20:18.840 UTC sweater-top-and-mini-skirt-set 1948
2021-02-01 12:36:00.932 UTC floral-race-back-flare-skirt 2932
2021-01-31 06:16:57.916 UTC workout-long-pants null
2021-01-29 22:08:06.729 UTC pre-damaged-color-block-tee 2329
Thanks
解决方案
下面应该可以工作(BigQuery 标准 SQL)!
select any_value(t1).*,
array_agg(recordID order by abs(timestamp_diff(Time1, Time2, second)) limit 1) recordID
from `project.dataset.table1` t1
join `project.dataset.table2` t2
on Item1 = Item2
group by format('%t', t1)
union all
select *, null
from `project.dataset.table1`
where not Item1 in (
select distinct Item2
from `project.dataset.table2`
)
推荐阅读
- ios - React native ios release build 不起作用,它说“Module AppRegistry 不是注册的可调用模块”
- jar - 如何加载一个外部 jar,该 jar 具有在 Wildfly 中部署的战争中实现接口的类
- java - Elasticsearch Performance Extreme 在非常长的字符串字段下
- ibm-cloud-private - 无法登录 CAM UI
- redirect - 在外部文件中重写规则 nginx
- python - 我将如何在 mongoDB 中存储相关的表格数据?
- python - Unix 上的 Python :: os.EX_NOTFOUND 的值
- scala - 使用 Scala 以优化的方式将 Spark 数据帧的 Not null 值替换为“1”
- google-maps - 谷歌距离矩阵 API。如何访问替代路线?
- javascript - 在进度条上激活下一步