首页 > 解决方案 > 如何优化多个左连接 SQL SELECT 查询?

问题描述

情况:

我们有一个数据库“base1”~600万行数据,显示了实际客户购买和购买日期+本次购买的参数。

CREATE TABLE base1 (
User_id NOT NULL PRIMARY KEY ,
PurchaseDate date,
Parameter1 int,
Parameter2 int,
...
ParameterK int );

还有另一个数据库“base2”~ 9000 万行数据,它实际上显示了同样的事情,但不是购买日期,而是使用每周部分(例如:每个客户 4 年的所有周 - 如果有N周没有购买,客户仍然显示)。

CREATE TABLE base2 (
Users_id NOT NULL PRIMARY KEY ,
Week_start date ,
Week_end date,
Parameter1 int,
Parameter2 int,
...
ParameterN int );

执行以下查询的任务:

-- a = base1 , b , wb%% = base2
--create index idx_uid_purch_date on base1(Users_ID,Purchasedate);
SELECT a.Users_id
-- Checking whether the client will make a purchase in next week and the purchase will be bought on condition
,iif(b.Users_id is not null,1,0) as User_will_buy_next_week
,iif(b.Users_id is not null and b.Parameter1 = 1,1,0) as User_will_buy_on_Condition1
--   about 12 similar iif-conditions
,iif(b.Users_id is not null and (b.Parameter1 = 1 and b.Parameter12 = 1),1,0) 
as User_will_buy_on_Condition13

-- checking on the fact of purchase in the past month, 2 months ago, 2.5 months, etc.
,iif(wb1m.Users_id is null,0,1) as was_buy_1_month_ago
,iif(wb2m.Users_id is null,0,1) as was_buy_2_month_ago
,iif(wb25m.Users_id is null,0,1) as was_buy_25_month_ago
,iif(wb3m.Users_id is null,0,1) as was_buy_3_month_ago
,iif(wb6m.Users_id is null,0,1) as was_buy_6_month_ago
,iif(wb1y.Users_id is null,0,1) as was_buy_1_year_ago

 ,a.[Week_start]
 ,a.[Week_end]

 into base3
 FROM base2 a 

 -- Join for User_will_buy
 left join base1 b
 on a.Users_id =b.Users_id and 
 cast(b.[PurchaseDate] as date)>=DATEADD(dd,7,cast(a.[Week_end] as date)) 
 and cast(b.[PurchaseDate] as date)<=DATEADD(dd,14,cast(a.[Week_end] as date))

 -- Joins for was_buy
 left join base1  wb1m
 on a.Users_id =wb1m.Users_id 
 and cast(wb1m.[PurchaseDate] as date)>=DATEADD(dd,-30-4,cast(a.[Week_end] as date)) 
 and cast(wb1m.[PurchaseDate] as date)<=DATEADD(dd,-30+4,cast(a.[Week_end] as date))

/* 4 more similar joins where different values are added in 
DATEADD (dd, %%, cast (a. [Week_end] as date))
to check on the fact of purchase for a certain period */

 left outer join base1 wb1y
 on a.Users_id =wb1y.Users_id and 
 cast(wb1y.[PurchaseDate] as date)>=DATEADD(dd,-365-4,cast(a.[Week_end] as date)) 
 and cast(wb1y.[PurchaseDate] as date)<=DATEADD(dd,-365+5,cast(a.[Week_end] as date))

由于大量的连接和相当大的数据库——这个脚本运行了大约24 小时,这非常长。

正如执行计划所示,主要时间花在“Merge Join”上,从base1和base2查看表的行,并将数据插入另一个base3表。

问题:是否可以优化此查询使其运行得更快?

也许使用一个 Join 代替或其他东西。

请帮忙,我不够聪明:(

谢谢大家的回答!

UPD:也许使用不同类型的连接(合并、循环或散列)可能对我有帮助,但不能真正检查这个理论。也许有人可以告诉我这是对还是错;)

标签: sqlsql-serverjoinselectquery-optimization

解决方案


我假设该base1表存储有关当前一周购买的信息。

如果是这样,在连接的查询条件中,您可以忽略[PurchaseDate]参数,将其替换为当前日期常量。在这种情况下,您的DATEADD函数将应用于当前日期,并且在连接条件下将是常量:

left join base1 b
on a.Users_id =b.Users_id and 
DATEADD(day,-7,GETDATE())>=a.[Week_end] 
and DATEADD(day,-14,GETDATE())<=a.[Week_end]

要使上述查询正确运行,您应该限制b.[PurchaseDate]在当天。

然后,您可以针对昨天的购买运行另一个查询,并且DATEADD连接条件中的所有常量由-1

依此类推,最多 7 个查询,或base1表格涵盖的任何时间跨度。

您还可以实现按天对值进行分组[PurchaseDate],重新计算常量并在一个查询中完成所有这些,但我还没有准备好自己花时间创建它。:)


推荐阅读