首页 > 解决方案 > 需要帮助来优化这个 Sql 查询

问题描述

我需要一些帮助来优化这个 SQL 查询。这工作得很好。我只是想减少这个查询的运行时间

select distinct 
o.usrp_order_number,t.* 
from ms_bvoip_order_extension oe
 inner join ms_order o on oe.ms_order_id = o.ms_order_id
 inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
 inner join ms_job j on j.entity_id = so.ms_sub_order_id
  left join mstask t ON t.wf_job_id = j.wf_job_id
  where
  o.order_type = 900
  and o.entered_date between date_sub(current_date(),53) and
 date_sub(current_date(),3)
  and j.entity_type = 5 and t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is not null
  order by
  o.usrp_order_number

标签: performancehivequery-optimizationhiveql

解决方案


在 Hive 中连接之后执行 WHERE 条件(尽管 CBO 和 PPD 可能会改变这种行为),更好地研究两个查询的 EXPLAIN 输出。您可以像这样移动条件:o.order_type = 900到 join ON 子句以减少连接时的行数。在 Hive 中的 join ON 子句中只允许涉及两个表列的非 equi 条件。表 t 也是左连接的,但where: 中的条件t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is null and t.ORIGINAL_START_DATE is not null将左连接转换为内连接。检查你需要 INNER 还是 LEFT JOIN

select distinct 
o.usrp_order_number,t.* 
from ms_bvoip_order_extension oe
 inner join ms_order o 
    on oe.ms_order_id = o.ms_order_id
       and o.order_type = 900
       and and o.entered_date between date_sub(current_date(),53) and date_sub(current_date(),3)                 
 inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
 inner join ms_job j on j.entity_id = so.ms_sub_order_id 
                    and j.entity_type = 5
 left join mstask t on t.wf_job_id = j.wf_job_id 
                    and t.name RLIKE 'Error|Correct|Create AOTS Ticket' 
                    and t.wf_job_id is null
                    and t.ORIGINAL_START_DATE is not null 
order by o.usrp_order_number

另请阅读有关配置设置的答案:https ://stackoverflow.com/a/48487306/2700344


推荐阅读