sql - 如何优化多个左连接 SQL SELECT 查询?
问题描述
情况:
我们有一个数据库“base1”~600万行数据,显示了实际客户购买和购买日期+本次购买的参数。
CREATE TABLE base1 (
User_id NOT NULL PRIMARY KEY ,
PurchaseDate date,
Parameter1 int,
Parameter2 int,
...
ParameterK int );
还有另一个数据库“base2”~ 9000 万行数据,它实际上显示了同样的事情,但不是购买日期,而是使用每周部分(例如:每个客户 4 年的所有周 - 如果有N周没有购买,客户仍然显示)。
CREATE TABLE base2 (
Users_id NOT NULL PRIMARY KEY ,
Week_start date ,
Week_end date,
Parameter1 int,
Parameter2 int,
...
ParameterN int );
执行以下查询的任务:
-- a = base1 , b , wb%% = base2
--create index idx_uid_purch_date on base1(Users_ID,Purchasedate);
SELECT a.Users_id
-- Checking whether the client will make a purchase in next week and the purchase will be bought on condition
,iif(b.Users_id is not null,1,0) as User_will_buy_next_week
,iif(b.Users_id is not null and b.Parameter1 = 1,1,0) as User_will_buy_on_Condition1
-- about 12 similar iif-conditions
,iif(b.Users_id is not null and (b.Parameter1 = 1 and b.Parameter12 = 1),1,0)
as User_will_buy_on_Condition13
-- checking on the fact of purchase in the past month, 2 months ago, 2.5 months, etc.
,iif(wb1m.Users_id is null,0,1) as was_buy_1_month_ago
,iif(wb2m.Users_id is null,0,1) as was_buy_2_month_ago
,iif(wb25m.Users_id is null,0,1) as was_buy_25_month_ago
,iif(wb3m.Users_id is null,0,1) as was_buy_3_month_ago
,iif(wb6m.Users_id is null,0,1) as was_buy_6_month_ago
,iif(wb1y.Users_id is null,0,1) as was_buy_1_year_ago
,a.[Week_start]
,a.[Week_end]
into base3
FROM base2 a
-- Join for User_will_buy
left join base1 b
on a.Users_id =b.Users_id and
cast(b.[PurchaseDate] as date)>=DATEADD(dd,7,cast(a.[Week_end] as date))
and cast(b.[PurchaseDate] as date)<=DATEADD(dd,14,cast(a.[Week_end] as date))
-- Joins for was_buy
left join base1 wb1m
on a.Users_id =wb1m.Users_id
and cast(wb1m.[PurchaseDate] as date)>=DATEADD(dd,-30-4,cast(a.[Week_end] as date))
and cast(wb1m.[PurchaseDate] as date)<=DATEADD(dd,-30+4,cast(a.[Week_end] as date))
/* 4 more similar joins where different values are added in
DATEADD (dd, %%, cast (a. [Week_end] as date))
to check on the fact of purchase for a certain period */
left outer join base1 wb1y
on a.Users_id =wb1y.Users_id and
cast(wb1y.[PurchaseDate] as date)>=DATEADD(dd,-365-4,cast(a.[Week_end] as date))
and cast(wb1y.[PurchaseDate] as date)<=DATEADD(dd,-365+5,cast(a.[Week_end] as date))
由于大量的连接和相当大的数据库——这个脚本运行了大约24 小时,这非常长。
正如执行计划所示,主要时间花在“Merge Join”上,从base1和base2查看表的行,并将数据插入另一个base3表。
问题:是否可以优化此查询使其运行得更快?
也许使用一个 Join 代替或其他东西。
请帮忙,我不够聪明:(
谢谢大家的回答!
UPD:也许使用不同类型的连接(合并、循环或散列)可能对我有帮助,但不能真正检查这个理论。也许有人可以告诉我这是对还是错;)
解决方案
我假设该base1
表存储有关当前一周购买的信息。
如果是这样,在连接的查询条件中,您可以忽略[PurchaseDate]
参数,将其替换为当前日期常量。在这种情况下,您的DATEADD
函数将应用于当前日期,并且在连接条件下将是常量:
left join base1 b
on a.Users_id =b.Users_id and
DATEADD(day,-7,GETDATE())>=a.[Week_end]
and DATEADD(day,-14,GETDATE())<=a.[Week_end]
要使上述查询正确运行,您应该限制b.[PurchaseDate]
在当天。
然后,您可以针对昨天的购买运行另一个查询,并且DATEADD
连接条件中的所有常量由-1
依此类推,最多 7 个查询,或base1
表格涵盖的任何时间跨度。
您还可以实现按天对值进行分组[PurchaseDate]
,重新计算常量并在一个查询中完成所有这些,但我还没有准备好自己花时间创建它。:)
推荐阅读
- javascript - 如何在 Tampermonkey 中制作脚本以自动单击
- angular - 打印时图表未填满 A4 格式的所有页面。(高图)
- go - 如何将 float64 数字截断为特定精度?
- c++ - 架构不固定时如何使用平面缓冲区?
- javascript - 如何在chartjs上强制动画?
- c++ - 括号(预期声明)
- python - 如何使用 Flask-admin 按 id 分隔视图模型?
- c# - 错误:您与此站点的连接不是私有的 Asp.Net。MVC5
- ssl - Spring Data Couchbase 通过 SSL 连接
- listview - 如何在 Flutter 中添加 ListView.builder 的填充顶部?