Can Redshift take advantage of the distkey if a function is applied to the columns in the join condition?

Question

Suppose I join two tables on their distkey, but apply a function to the columns in the join condition:

select count(1)
from toto_calls
inner join toto_carriers
    on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)

Will Redshift take advantage of the distkey and be able to collocate the join?

Tags: amazon-redshift

Answer


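No. For Redshift to collocate the join and avoid broadcasting rows, the join must be on the exact column that is the distkey, not on the result of some operation on that column. I checked this with the query plans: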

explain (select count(1) from toto_calls inner join toto_carriers on to_carrier = carrier);

gives:

XN Aggregate  (cost=18514.87..18514.87 rows=1 width=0)
  ->  XN Hash Join DS_DIST_NONE  (cost=0.12..17562.68 rows=380875 width=0)
        Hash Cond: (("outer".carrier)::text = ("inner".to_carrier)::text)
        ->  XN Seq Scan on toto_carriers  (cost=0.00..4231.94 rows=423194 width=10)
        ->  XN Hash  (cost=0.10..0.10 rows=9 width=10)
              ->  XN Seq Scan on toto_calls  (cost=0.00..0.10 rows=9 width=10)
                    Filter: (to_carrier IS NOT NULL)

DS_DIST_NONE, so no rows are broadcast and the join is collocated, yay! Whereas:

explain (
select count(1) from toto_calls
inner join toto_carriers on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)
);

gives:

XN Aggregate  (cost=2018250.37..2018250.37 rows=1 width=0)
  ->  XN Hash Join DS_BCAST_INNER  (cost=0.12..2018197.47 rows=21160 width=0)
        Hash Cond: ("substring"(("outer".carrier)::text, 1, 3) = "substring"(("inner".to_carrier)::text, 1, 3))
        ->  XN Seq Scan on toto_carriers  (cost=0.00..4231.94 rows=423194 width=10)
        ->  XN Hash  (cost=0.10..0.10 rows=10 width=10)
              ->  XN Seq Scan on toto_calls  (cost=0.00..0.10 rows=10 width=10)

DS_BCAST_INNER, so the inner table (toto_calls here) is broadcast to every node, boo...
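
If the join really has to be on the 3-character prefix, one possible workaround that follows from the above is to materialize the prefix as a real column and distribute both tables on it. The sketch below is only an illustration under assumptions: the table names toto_calls_by_prefix and toto_carriers_by_prefix and the column carrier_prefix are hypothetical, and whether the planner then reports DS_DIST_NONE for the join step is worth confirming with explain on your own data rather than taking for granted.

```sql
-- Hypothetical workaround sketch: the *_by_prefix tables and the
-- carrier_prefix column are illustrative names, not part of the original schema.

-- Materialize the 3-character prefix as a real column and distribute on it.
create table toto_calls_by_prefix
distkey (carrier_prefix)
as
select *, substring(to_carrier, 1, 3) as carrier_prefix
from toto_calls;

create table toto_carriers_by_prefix
distkey (carrier_prefix)
as
select *, substring(carrier, 1, 3) as carrier_prefix
from toto_carriers;

-- Join on the plain distkey column (no function in the join condition),
-- then inspect the distribution label on the Hash Join line of the plan.
explain
select count(1)
from toto_calls_by_prefix
inner join toto_carriers_by_prefix
    on toto_calls_by_prefix.carrier_prefix = toto_carriers_by_prefix.carrier_prefix;
```

The trade-off is an extra copy of each table that has to be kept in sync with its source, and a prefix with few distinct values may distribute rows unevenly across slices.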

