Postgres window functions and extracting/averaging the duration between timestamps

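Question

I've been reading SO for years, but this is my first post. Hopefully someone can help me with this.

I'm new to window functions, but from what I understand they seem to be what I'm looking for. I have 3 tables: users, tasks, and task_users. One or more users can be assigned to a task (via task_users). What I'd like to see is a table showing the following: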

User ID
User full name
How many tasks have been issued to that user (occurrences)
Average duration of all tasks issued to that user (average_duration)

The method I currently use to extract the duration of a single task is:

(EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at)) / 3600 AS duration

Here are the columns of interest in each table:

users
id
last_name
first_name

tasks
id
started_at (timestamp w/o tz)
closed_at (timestamp w/o tz)

task_users
task_id (references tasks.id)
user_id (references users.id)

Using the SQL below, I can produce a table showing each user, their ID, and the number of tasks issued to that user:

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC

This query shows:

----------------------------------
id    u_name           occurrences
----------------------------------
1  |  Mike Smith     | 10
2  |  Dave Johnson   | 5
3  |  George Wilson  | 3
etc...

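What I'd like to produce is the same table as above, but with the average duration (the number of hours taken to complete each task) of all tasks issued to each user. Something like the following: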

------------------------------------------------------
id    u_name           occurrences    average_duration
------------------------------------------------------
1  |  Mike Smith     | 10           | 32.7
2  |  Dave Johnson   | 5            | 15.2
3  |  George Wilson  | 3            | 10.0
etc...

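I tried the following subquery and window function, but it splits each user across multiple rows (each user appears in as many rows as their number of occurrences).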

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    AVG(tsk.duration) OVER(PARTITION BY users.id) AS average_duration
FROM 
    (SELECT id, (EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at)) / 3600 AS duration FROM tasks) tsk
INNER JOIN task_users ON tsk.id = task_users.task_id
INNER JOIN users ON users.id = task_users.user_id

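I'm new to window functions and no SQL expert, but it seems to me that a window function is the best solution?

If anyone can point me in the right direction or make a suggestion, I'd be grateful.

Thanks!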

Tags: sql, postgresql, window-functions

Answer


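A window function returns a value for every row; it does not collapse rows the way GROUP BY does. In your scenario a user has multiple tasks, so the join produces multiple rows per user, and the windowed average is simply repeated on each of them.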
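To make the difference concrete, here is a minimal sketch that runs without any tables. The inline rows (user 1 with two tasks, user 2 with one) and the `hours` column are made up for illustration and are not taken from your schema.

```sql
-- Window function: one output row per input row (3 rows here),
-- with the per-user average repeated on each of that user's rows.
SELECT user_id,
       AVG(hours) OVER (PARTITION BY user_id) AS average_duration
FROM (VALUES (1, 10.0), (1, 20.0), (2, 5.0)) AS d (user_id, hours);

-- Aggregate with GROUP BY: rows are collapsed to one per user (2 rows here).
SELECT user_id,
       COUNT(*)   AS occurrences,
       AVG(hours) AS average_duration
FROM (VALUES (1, 10.0), (1, 20.0), (2, 5.0)) AS d (user_id, hours)
GROUP BY user_id;
```

Since you want exactly one row per user, the second shape (a plain aggregate) is the one that matches your desired output.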

You can try the following, which adapts your original approach of grouping the data by user:

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    SUM(
        (EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at)) / 3600
    ) / COUNT(*) AS average_duration
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC

Or:

SELECT 
    users.id AS u_id,
    concat(users.last_name, ', ', users.first_name) AS u_name, 
    COUNT(*) AS occurrences,
    AVG(
        (EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at)) / 3600
    ) AS average_duration
FROM tasks
INNER JOIN task_users ON task_users.task_id = tasks.id
INNER JOIN users ON users.id = task_users.user_id
WHERE tasks.closed_at IS NOT NULL 
GROUP BY u_id
ORDER BY occurrences DESC
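One thing worth checking in any version of the duration expression is operator precedence. In SQL, division binds tighter than subtraction, so the subtraction has to be parenthesized before dividing by 3600. The sketch below compares the variants on a single made-up pair of timestamps that are 2.5 hours apart; the literal values and the alias `t` are assumptions for illustration only.

```sql
SELECT
    -- Unparenthesized: only the second EXTRACT is divided by 3600,
    -- so the result is not a duration in hours.
    EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at) / 3600
        AS unparenthesized,
    -- Parenthesized: difference in seconds first, then converted to hours.
    (EXTRACT(EPOCH FROM closed_at) - EXTRACT(EPOCH FROM started_at)) / 3600
        AS hours,
    -- Same idea: subtract the timestamps (yielding an interval), then extract.
    EXTRACT(EPOCH FROM (closed_at - started_at)) / 3600
        AS hours_from_interval
FROM (
    VALUES (TIMESTAMP '2024-01-01 08:00:00', TIMESTAMP '2024-01-01 10:30:00')
) AS t (started_at, closed_at);
```

If you want output like 32.7 rather than a long decimal, you could wrap the average in ROUND, for example `ROUND(AVG(...)::numeric, 1)`. The cast is there because the two-argument form of ROUND only accepts numeric.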

Let me know if this works for you.

