Simplifying an aggregate query that uses subqueries in joins

Problem description

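We have one table that stores the total presence time for a given (period, volunteer) pair, and another that logs actions: we know which action was taken, but we do not track the time spent on a given action. This is not a great data structure, but I cannot change it.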

CREATE TABLE volunteer_presence (id integer, volunteer_id integer, minutes integer);
CREATE TABLE logged_actions (id  integer, presence_id integer, section varchar);

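What we want to do is estimate the time spent on a given action by assuming that the time is split evenly across all actions of a given presence. That is not great, but we are happy to at least have it. For example, suppose we have the following data: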

INSERT INTO volunteer_presence (id, volunteer_id, minutes) values (1, 333, 400);
INSERT INTO volunteer_presence (id, volunteer_id, minutes) values (2, 444, 90);
INSERT INTO volunteer_presence (id, volunteer_id, minutes) values (3, 555, 80);

INSERT INTO logged_actions (id, presence_id, section) values (10, 1, 'Cats');
INSERT INTO logged_actions (id, presence_id, section) values (11, 1, 'Dogs');
INSERT INTO logged_actions (id, presence_id, section) values (13, 1, 'Cats');
INSERT INTO logged_actions (id, presence_id, section) values (14, 1, 'Cats');
INSERT INTO logged_actions (id, presence_id, section) values (15, 1, 'Front');
INSERT INTO logged_actions (id, presence_id, section) values (16, 2, 'Dogs');

We expect the following result:

section  presence_id  estimated_minutes
Cats     1            240.0
Dogs     1            80.0
Front    1            80.0
Dogs     2            90.0

For example, Cats accounts for 3 of the 5 actions in presence 1, so (3 / 5) * 400 is 240.
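To see where the 3 and the 5 come from, here is a quick check against the sample data above. It is only a sketch for inspecting the counts, not part of the final query; the alias section_actions is made up for illustration.

```sql
-- Count the logged actions per section for presence 1 only.
SELECT section,
       COUNT(*) AS section_actions
FROM logged_actions
WHERE presence_id = 1
GROUP BY section
ORDER BY section;
```

With the sample rows this gives 3 for Cats and 1 each for Dogs and Front, 5 actions in total, so Cats gets 3/5 of the 400 minutes.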

The solution I came up with feels wrong because of the two subqueries in the join clauses. It feels like it will not scale well.

SELECT  action_count_table.section,
        action_count_table.presence_id,
        ((action_count_table.total_action_count / presence_total_actions.action_count) * time_table.minutes) AS estimated_minutes
FROM (
  SELECT event.section AS section,
         volunteer_presence.id AS presence_id,
         SUM(event.action_count) AS total_action_count

  FROM (
    SELECT logged_actions.presence_id,
           logged_actions.section,
           COUNT(logged_actions.id) AS action_count
    FROM logged_actions
    GROUP BY logged_actions.presence_id, logged_actions.section
    ORDER BY logged_actions.presence_id, logged_actions.section
    ) AS event
  INNER JOIN volunteer_presence
  ON volunteer_presence.id = event.presence_id
  GROUP BY event.section, volunteer_presence.id
  ) AS action_count_table
INNER JOIN (
  SELECT DISTINCT logged_actions.presence_id AS presence_id,
                  volunteer_presence.minutes AS minutes
  FROM logged_actions
  INNER JOIN volunteer_presence
  ON volunteer_presence.id = logged_actions.presence_id
  ) AS time_table
ON action_count_table.presence_id = time_table.presence_id
INNER JOIN (
  SELECT logged_actions.presence_id,
         COUNT(logged_actions.id) AS action_count
  FROM logged_actions
  GROUP BY logged_actions.presence_id
  ORDER BY logged_actions.presence_id
  ) AS presence_total_actions
ON presence_total_actions.presence_id = time_table.presence_id

Is there a better solution, one that scales better or is at least simpler?

Tags: postgresql, join, subquery

Solution


Step-by-step demo: db<>fiddle

SELECT
    la.presence_id,
    la.section,
    vp.minutes * COUNT(*) / la.total AS estimated_minutes    -- 4
FROM (
    SELECT 
        *,
        COUNT(*) OVER (PARTITION BY presence_id) AS total    -- 1
    FROM logged_actions
) la
JOIN volunteer_presence vp ON vp.id = la.presence_id         -- 2
GROUP BY la.presence_id, la.section, la.total, vp.minutes    -- 3
ORDER BY la.presence_id, la.section
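
  1. The COUNT() window function, partitioned by presence_id, adds the total number of actions for that presence_id to every record.
  2. Join volunteer_presence.
  3. Group by presence_id and section (total and minutes are listed in the GROUP BY as well so that they can be used in the SELECT list).
  4. Multiply minutes by the count per section per presence_id, then divide by the previously calculated total per presence_id.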
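
One caveat with the query above: in PostgreSQL, dividing one integer value by another truncates the result, so the estimate comes out as a whole number. That happens to be exact for the sample data, but would lose the fraction for, say, 100 minutes split over 3 actions. A possible variant, offered as a sketch rather than a tested replacement, drops the subquery by applying a window function on top of the aggregate. It relies on SUM over a bigint count returning numeric in PostgreSQL, which keeps the division fractional; the alias estimated_minutes is taken from the expected result in the question.

```sql
-- Sketch: one pass over logged_actions, no derived table.
-- COUNT(*) is the number of actions per (presence_id, section) group;
-- SUM(COUNT(*)) OVER (...) adds those counts up per presence_id after grouping.
SELECT
    la.presence_id,
    la.section,
    vp.minutes * COUNT(*)
        / SUM(COUNT(*)) OVER (PARTITION BY la.presence_id) AS estimated_minutes
FROM logged_actions la
JOIN volunteer_presence vp ON vp.id = la.presence_id
GROUP BY la.presence_id, la.section, vp.minutes
ORDER BY la.presence_id, la.section;
```

If a fixed number of decimals is wanted, the expression can be wrapped in ROUND(..., 1). Whether this performs better than the subquery version depends on the data and indexes, so it is worth comparing both with EXPLAIN ANALYZE.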
