首页 > 解决方案 > 在以后的查询中使用 CTE 的最小/最大值,而不是在 Postgres 中使用子查询

问题描述

在查询的后面部分,我有一个关于从 CTE 中提取结果的补救性问题。对于示例代码,以下是相关的精简表:

CREATE TABLE print_job (
   created_dts  timestamp  not null default now(),
   status       text       not null
);

CREATE TABLE calendar_day (
   date_actual date not null
);

在当前设置中,数据中的日期存在间隔print_job,我们希望得到无间隔的结果。例如,表中从第一个日期到最后一个日期有 87 天,其中只有 77 天有数据。我们已经有一个 calendar_day 维度表可以加入以获取 87 天范围的 87 行。使用子查询或 CTE很容易找出数据中的日期minmax日期,但我不知道如何使用 CTE 中的这些值。我在下面有一个完整的查询,但这里是带有评论的相关片段:

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
     select actual_date
     
       from calendar_day

   where actual_date >= date_range.start_date  
     and actual_date <= date_range.end_date
),

-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)   
)

我经常遇到这个,最后想我会问。我已经四处寻找答案,但我不清楚如何很好地总结它。虽然在这种情况下子查询没有任何问题,但在其他情况下,CTE 更好/更具可读性。

如果有帮助,我在下面列出了完整的查询。

-- Get some counts and give them names.
WITH 
daily_status AS (
    select created_dts::date as created_date,
           count(*)                                        AS daily_total,
           count(*) FILTER (where status = 'Error')        AS status_error,
           count(*) FILTER (where status = 'Processing')   AS status_processing,
           count(*) FILTER (where status = 'Aborted')      AS status_aborted,
           count(*) FILTER (where status = 'Done')         AS status_done

     from print_job

     group by created_dts::date
),

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- There are gaps in the data, and we want a row for dates with no results.  
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date. 

-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
--      select actual_date
--      
--        from calendar_day
-- 
--    where actual_date >= date_range.start_date  
--      and actual_date <= date_range.end_date
-- ),

complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)
     
)

-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
       coalesce(daily_total,0)        AS total, 
       coalesce(status_error,0)       AS errors,
       coalesce(status_processing,0)  AS processing,
       coalesce(status_aborted,0)     AS aborted,
       coalesce(status_done,0)        AS done
       
   from complete_date_series_using_subquery
left join daily_status
       on daily_status.created_date  = 
          complete_date_series_using_subquery.date_actual

    order by date_actual

标签: postgresqlsubquerycommon-table-expression

解决方案


我说这是一个补救问题....我记得我以前在哪里看到过这个:

https://tapoueh.org/manual-post/2014/02/postgresql-histogram/

在我的示例中,我需要在表格列表中列出 CTE。回想起来,这很明显,我意识到我自然而然地不想这样做,因为我习惯性地避免这样做CROSS JOIN。下面的片段显示了所需的细微更改:

WITH
date_range AS ( 

select min(created_dts)::date as start_date,
       max(created_dts)::date as end_date
       
  from print_job
),        

complete_date_series AS ( 

select date_actual 
  from calendar_day, date_range
 where date_actual >= date_range.start_date   
   and date_actual <= date_range.end_date
),

推荐阅读