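amazon-redshift - Can Redshift take advantage of the distkey if a function is applied to the columns in the join condition?
Question
Suppose I join two tables on their distkey, but I apply a function to the columns in the join condition: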
select count(1)
from toto_calls
inner join toto_carriers
on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)
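Will Redshift take advantage of the distkey and be able to collocate the join?
Answer
No. For Redshift to collocate the join and avoid broadcasting rows, the join must be on the exact columns that are the distkey, not on the result of some operation applied to those columns. I checked this with the query plans: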
explain (select count(1) from toto_calls inner join toto_carriers on to_carrier = carrier);
gives:
XN Aggregate  (cost=18514.87..18514.87 rows=1 width=0)
  ->  XN Hash Join DS_DIST_NONE  (cost=0.12..17562.68 rows=380875 width=0)
        Hash Cond: (("outer".carrier)::text = ("inner".to_carrier)::text)
        ->  XN Seq Scan on toto_carriers  (cost=0.00..4231.94 rows=423194 width=10)
        ->  XN Hash  (cost=0.10..0.10 rows=9 width=10)
              ->  XN Seq Scan on toto_calls  (cost=0.00..0.10 rows=9 width=10)
                    Filter: (to_carrier IS NOT NULL)
DS_DIST_NONE, so the join is collocated and nothing is broadcast, yay! Whereas:
explain (
select count(1) from toto_calls
inner join toto_carriers on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)
);
gives:
XN Aggregate  (cost=2018250.37..2018250.37 rows=1 width=0)
  ->  XN Hash Join DS_BCAST_INNER  (cost=0.12..2018197.47 rows=21160 width=0)
        Hash Cond: ("substring"(("outer".carrier)::text, 1, 3) = "substring"(("inner".to_carrier)::text, 1, 3))
        ->  XN Seq Scan on toto_carriers  (cost=0.00..4231.94 rows=423194 width=10)
        ->  XN Hash  (cost=0.10..0.10 rows=10 width=10)
              ->  XN Seq Scan on toto_calls  (cost=0.00..0.10 rows=10 width=10)
DS_BCAST_INNER, so the inner table is broadcast to every node, boo...
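
If the join really has to be on the 3-character prefix, one possible workaround is to materialize the prefix as a real column and distribute both tables on it, so the join condition is again on the exact distkey columns. The following is only a sketch that I have not run against these tables: the table names toto_calls_by_prefix and toto_carriers_by_prefix and the columns to_carrier_prefix and carrier_prefix are hypothetical, and only the join columns are copied because the rest of the schema is not shown in the question.

```sql
-- Hypothetical copies of the two tables, each distributed on a
-- precomputed 3-character prefix instead of the full carrier value.
CREATE TABLE toto_carriers_by_prefix
DISTKEY (carrier_prefix)
AS
SELECT carrier,
       SUBSTRING(carrier, 1, 3) AS carrier_prefix
FROM toto_carriers;

CREATE TABLE toto_calls_by_prefix
DISTKEY (to_carrier_prefix)
AS
SELECT to_carrier,
       SUBSTRING(to_carrier, 1, 3) AS to_carrier_prefix
FROM toto_calls;

-- The join is now on the bare distkey columns, with no function applied.
-- Check the plan rather than assume: the hope is DS_DIST_NONE.
EXPLAIN
SELECT COUNT(1)
FROM toto_calls_by_prefix
INNER JOIN toto_carriers_by_prefix
ON to_carrier_prefix = carrier_prefix;
```

Whether this is worth it depends on the data: a 3-character prefix probably has far fewer distinct values than the full column, so distributing on it may skew rows across slices. Run the EXPLAIN to confirm the plan actually changes before relying on it.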