Need guidance rewriting this query

Question

We run this query to generate calendar-week data, and it hits the same view twice. Because the join has no ON clause, it may create a Cartesian product.

Is there a better way to rewrite this query?

SELECT cal_date,
       regexp_replace(cal_date, '-', '') AS PC_cal_date,
       year_num*100+week_num AS year_week_num,
       CASE
           WHEN year_num*100+pd_num IN (min_year_pd_num, max_year_pd_num) THEN 'A'
           ELSE 'B'
       END AS yr_pd_ind,
       year_num*100+pd_num AS yr_pd_num,
       dense_rank() OVER (ORDER BY year_num*100+week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY year_num*100+pd_num DESC) AS pd_index
FROM mstr_v.local_cal_date t1,

  (SELECT max(year_num*100+pd_num) max_year_pd_num,
          min(year_num*100+pd_num) min_year_pd_num
   FROM mstr_v.local_cal_date
   WHERE cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                      date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)))) ) t2
WHERE cal_date BETWEEN date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7)) 
AND date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+1))
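To see which dates the repeated date_sub expressions actually select, here is a small Python sketch that mirrors them with datetime. It assumes that Hive's date_format(d, 'u') returns the ISO day of week (1 = Monday ... 7 = Sunday), the same convention as Python's date.isoweekday(); hive_week_bounds is a hypothetical helper name and 2024-01-10 is just an example "today".

```python
from datetime import date, timedelta


def hive_week_bounds(today: date, weeks: int = 105):
    """Mirror the four date_sub(CURRENT_DATE, u + ...) expressions.

    Assumption: date_format(d, 'u') == d.isoweekday() (1 = Mon ... 7 = Sun).
    """
    u = today.isoweekday()
    lo_in = today - timedelta(days=u + weeks * 7 + 1)  # first IN date in t2
    start = today - timedelta(days=u + weeks * 7)      # BETWEEN lower bound
    end = today - timedelta(days=u + 1)                # BETWEEN upper bound
    hi_in = today - timedelta(days=u)                  # second IN date in t2
    return lo_in, start, end, hi_in


lo_in, start, end, hi_in = hive_week_bounds(date(2024, 1, 10))
print(lo_in, start, end, hi_in)  # 2022-01-01 2022-01-02 2024-01-06 2024-01-07
```

For a Wednesday such as 2024-01-10, the BETWEEN range runs from Sunday 2022-01-02 to Saturday 2024-01-06, i.e. 735 days or 105 full weeks, and the two IN dates are the days immediately before and after that range.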

Tags: sql, performance, hive, query-optimization, hiveql

Answer


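If you move the WHERE condition that is currently inside t2 into a CASE expression and compute the min and max with OVER(), the columns that t2 produces can be calculated in the same subquery, without a second scan of the view. (The comma join in the original is not a harmful Cartesian product: t2 is an aggregate without GROUP BY, so it returns exactly one row; the real cost is the extra scan.) The window aggregates must be computed in an inner subquery, before the outer BETWEEN filter, because the two IN dates lie one day outside the BETWEEN range on either side and would otherwise be filtered away: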

SELECT cal_date,
       regexp_replace(cal_date, '-', '') AS PC_cal_date,
       year_num*100+week_num AS year_week_num,
       CASE
           WHEN year_num*100+pd_num IN (min_year_pd_num, max_year_pd_num) THEN 'A'
           ELSE 'B'
       END AS yr_pd_ind,
       year_num*100+pd_num AS yr_pd_num,
       dense_rank() OVER (ORDER BY year_num*100+week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY year_num*100+pd_num DESC) AS pd_index
FROM (SELECT t1.*,
             max(CASE WHEN cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                                        date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int))))
                      THEN year_num*100+pd_num END) OVER () AS max_year_pd_num,
             min(CASE WHEN cal_date IN (date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7+1)),
                                        date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int))))
                      THEN year_num*100+pd_num END) OVER () AS min_year_pd_num
      FROM mstr_v.local_cal_date t1
     ) t1
WHERE cal_date BETWEEN date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+105*7)) 
AND date(date_sub(CURRENT_DATE, cast(date_format(CURRENT_DATE, 'u') AS int)+1))
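One way to check that the rewrite returns the same rows is to run both shapes side by side on a toy calendar. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25+ is needed for window functions) instead of Hive. The table local_cal_date, its made-up week/period/year numbering, and the fixed boundary dates (chosen for "today" = 2024-01-10) are assumptions standing in for mstr_v.local_cal_date and the CURRENT_DATE arithmetic. It also checks that the comma join against the one-row aggregate does not multiply rows, and tries a third, hypothetical variant that filters the inner subquery to the span between the two IN dates so the window does not run over the whole view.

```python
import sqlite3
from datetime import date, timedelta

# Fixed boundary dates for "today" = 2024-01-10 (assumed), replacing the
# CURRENT_DATE / date_sub arithmetic of the Hive query.
PARAMS = {"lo_in": "2022-01-01", "start": "2022-01-02",
          "end": "2024-01-06", "hi_in": "2024-01-07"}

# Question's shape: comma join with a one-row aggregate subquery (two scans).
ORIGINAL = """
SELECT t1.cal_date,
       CASE WHEN (t1.year_num*100 + t1.pd_num) IN (t2.min_ypd, t2.max_ypd)
            THEN 'A' ELSE 'B' END AS yr_pd_ind,
       dense_rank() OVER (ORDER BY t1.year_num*100 + t1.week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY t1.year_num*100 + t1.pd_num DESC) AS pd_index
FROM local_cal_date t1,
     (SELECT max(year_num*100 + pd_num) AS max_ypd,
             min(year_num*100 + pd_num) AS min_ypd
      FROM local_cal_date
      WHERE cal_date IN (:lo_in, :hi_in)) t2
WHERE t1.cal_date BETWEEN :start AND :end
ORDER BY t1.cal_date
"""

# Answer's shape: conditional min/max as window aggregates over one scan.
# {inner_filter} is empty for the answer as written, or a WHERE clause for
# the hypothetical filtered variant.
REWRITE = """
SELECT cal_date,
       CASE WHEN (year_num*100 + pd_num) IN (min_ypd, max_ypd)
            THEN 'A' ELSE 'B' END AS yr_pd_ind,
       dense_rank() OVER (ORDER BY year_num*100 + week_num DESC) AS wk_index,
       dense_rank() OVER (ORDER BY year_num*100 + pd_num DESC) AS pd_index
FROM (SELECT t.*,
             max(CASE WHEN cal_date IN (:lo_in, :hi_in)
                      THEN year_num*100 + pd_num END) OVER () AS max_ypd,
             min(CASE WHEN cal_date IN (:lo_in, :hi_in)
                      THEN year_num*100 + pd_num END) OVER () AS min_ypd
      FROM local_cal_date t
      {inner_filter}) t1
WHERE cal_date BETWEEN :start AND :end
ORDER BY cal_date
"""


def build_calendar(conn, first_day, days):
    """Fill a toy calendar: 7-day weeks, 4-week periods, 52-week years."""
    conn.execute("CREATE TABLE local_cal_date "
                 "(cal_date TEXT, year_num INT, pd_num INT, week_num INT)")
    rows = []
    for i in range(days):
        week = i // 7
        week_num = week % 52 + 1
        rows.append(((first_day + timedelta(days=i)).isoformat(),
                     2020 + week // 52,          # made-up year number
                     (week_num - 1) // 4 + 1,    # made-up period number
                     week_num))
    conn.executemany("INSERT INTO local_cal_date VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
build_calendar(conn, date(2021, 6, 1), 1200)

original = conn.execute(ORIGINAL, PARAMS).fetchall()
rewritten = conn.execute(REWRITE.format(inner_filter=""), PARAMS).fetchall()
filtered = conn.execute(
    REWRITE.format(inner_filter="WHERE cal_date BETWEEN :lo_in AND :hi_in"),
    PARAMS).fetchall()

print(len(original), original == rewritten == filtered)
```

The filtered variant is safe only because both IN dates sit directly next to the BETWEEN range, so the span from the first IN date to the second covers every row either part of the query needs.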
