Long-running query - suggestions to improve Redshift performance

Problem description

SELECT
    A.load,
    A.sender,
    A.latlong,
    COUNT(distinct B.load) as load_count,
    COUNT(distinct B.sender) as sender_count
FROM TABLE_A A
JOIN TABLE_B B ON 
    A.sender <> B.sender AND
    (
        A.latlong = B.latlong 
        or
        ( 
            lower(A.address_line1) = lower(B.address_line1)
            and lower(A.city) = lower(B.city)
            and lower(A.state) = lower(B.state)
            and lower(A.country) = lower(B.country)
        )
    )
GROUP BY A.load, A.sender, A.latlong ;

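I am trying to run the example query above, and it takes far longer than expected (about 2 hours). I tried splitting the query and combining the parts with UNION, but the result sets do not match.

Can you suggest options for improving the performance of this query, or an alternative way to achieve this in AWS?

The data set is about 1.5 million records.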

Tags: sql, amazon-web-services, amazon-redshift

Solution


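I suggest removing the lower() calls from the join and cleaning the data to lowercase beforehand, so that the join can compare the columns directly: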

select
 A.load, A.sender, A.latlong,
 count(distinct B.load) as load_count,
 count(distinct B.sender) as sender_count
 from 
 TABLE_A A
 join 
 TABLE_B B
 on 
 A.sender <> B.sender and
 (
 A.latlong = B.latlong 
 or
 ( 
  A.address_line1 = B.address_line1
  and A.city = B.city
  and A.state = B.state
  and A.country = B.country
 ))
 group by 
 A.load, A.sender, A.latlong ;
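
The cleanup step could look something like the sketch below. It assumes the address columns can be overwritten in place and that lowercasing is the only normalization needed. The table and column names are taken from the query; everything else is an assumption about your schema.

```sql
-- One-time cleanup (assumption: overwriting the original casing is acceptable).
-- Store the address columns in lowercase so the join no longer needs lower().
UPDATE TABLE_A
SET address_line1 = LOWER(address_line1),
    city          = LOWER(city),
    state         = LOWER(state),
    country       = LOWER(country);

UPDATE TABLE_B
SET address_line1 = LOWER(address_line1),
    city          = LOWER(city),
    state         = LOWER(state),
    country       = LOWER(country);

-- Optional housekeeping after large updates in Redshift:
-- reclaim space from the rewritten rows and refresh planner statistics.
-- VACUUM TABLE_A;  ANALYZE TABLE_A;
-- VACUUM TABLE_B;  ANALYZE TABLE_B;
```

If new rows keep arriving, the same LOWER() calls would have to be applied at load time as well, otherwise the join without lower() will miss matches that differ only in case.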
