postgresql - 在以后的查询中使用 CTE 的最小/最大值,而不是在 Postgres 中使用子查询
问题描述
在查询的后面部分,我有一个关于从 CTE 中提取结果的补救性问题。对于示例代码,以下是相关的精简表:
CREATE TABLE print_job (
created_dts timestamp not null default now(),
status text not null
);
CREATE TABLE calendar_day (
date_actual date not null
);
在当前设置中,数据中的日期存在间隔print_job
,我们希望得到无间隔的结果。例如,表中从第一个日期到最后一个日期有 87 天,其中只有 77 天有数据。我们已经有一个 calendar_day 维度表可以加入以获取 87 天范围的 87 行。使用子查询或 CTE很容易找出数据中的日期min
和max
日期,但我不知道如何使用 CTE 中的这些值。我在下面有一个完整的查询,但这里是带有评论的相关片段:
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
select actual_date
from calendar_day
where actual_date >= date_range.start_date
and actual_date <= date_range.end_date
),
-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
我经常遇到这个,最后想我会问。我已经四处寻找答案,但我不清楚如何很好地总结它。虽然在这种情况下子查询没有任何问题,但在其他情况下,CTE 更好/更具可读性。
如果有帮助,我在下面列出了完整的查询。
-- Get some counts and give them names.
WITH
daily_status AS (
select created_dts::date as created_date,
count(*) AS daily_total,
count(*) FILTER (where status = 'Error') AS status_error,
count(*) FILTER (where status = 'Processing') AS status_processing,
count(*) FILTER (where status = 'Aborted') AS status_aborted,
count(*) FILTER (where status = 'Done') AS status_done
from print_job
group by created_dts::date
),
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- There are gaps in the data, and we want a row for dates with no results.
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date.
-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
-- select actual_date
--
-- from calendar_day
--
-- where actual_date >= date_range.start_date
-- and actual_date <= date_range.end_date
-- ),
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
coalesce(daily_total,0) AS total,
coalesce(status_error,0) AS errors,
coalesce(status_processing,0) AS processing,
coalesce(status_aborted,0) AS aborted,
coalesce(status_done,0) AS done
from complete_date_series_using_subquery
left join daily_status
on daily_status.created_date =
complete_date_series_using_subquery.date_actual
order by date_actual
解决方案
我说这是一个补救问题....我记得我以前在哪里看到过这个:
https://tapoueh.org/manual-post/2014/02/postgresql-histogram/
在我的示例中,我需要在表格列表中列出 CTE。回想起来,这很明显,我意识到我自然而然地不想这样做,因为我习惯性地避免这样做CROSS JOIN
。下面的片段显示了所需的细微更改:
WITH
date_range AS (
select min(created_dts)::date as start_date,
max(created_dts)::date as end_date
from print_job
),
complete_date_series AS (
select date_actual
from calendar_day, date_range
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
),
推荐阅读
- python-3.x - AT-Commands (Python) SMS 长度问题
- python - 如何在 matplotlib 的直方图中绘制数组
- python - 打印出用逗号分隔的值
- react-native - 反应原生 admob 集成时间得到这个错误
- python - 如何在 multiprocessing.dummy.Pool 中单独命名线程?
- java - 从 java 运行 ansible-playbook
- ios - 解包时发现 Nil 可选
- laravel - 在 twilio 中发送短信时处理错误
- javascript - 如何使这个禁用的输入充当启用的输入并在单击按钮时滚动?
- jupyter-notebook - 将 Jupyter 笔记本添加到 IBM Watson Studio 项目