首页 > 解决方案 > Postgres:对于每一行,在条件下评估所有连续的行

问题描述

我有这张桌子:

id | datetime            | row_number 

1    2018-04-09 06:27:00   1
1    2018-04-09 14:15:00   2 
1    2018-04-09 15:25:00   3
1    2018-04-09 15:35:00   4
1    2018-04-09 15:51:00   5
1    2018-04-09 17:05:00   6
1    2018-04-10 06:42:00   7 
1    2018-04-10 16:39:00   8 
1    2018-04-10 18:58:00   9
1    2018-04-10 19:41:00   10
1    2018-04-14 17:05:00   11
1    2018-04-14 17:48:00   12 
1    2018-04-14 18:57:00   13

我会为每一行计算时间 <= '01:30:00' 的连续行,并从不满足条件的第一行开始连续评估。

我试图更好地解释这个问题。使用 windows 函数 lag():

 SELECT id, datetime, 
        CASE WHEN datetime - lag (datetime,1)  OVER(PARTITION BY id ORDER BY datetime)   
        < '01:30:00' THEN 1 ELSE 0 END AS count
        FROM table

结果是:

id | datetime            | count 

1    2018-04-09 06:27:00   0
1    2018-04-09 14:15:00   0 
1    2018-04-09 15:25:00   1
1    2018-04-09 15:35:00   1
1    2018-04-09 15:51:00   1
1    2018-04-09 17:05:00   1
1    2018-04-10 06:42:00   0 
1    2018-04-10 16:39:00   0 
1    2018-04-10 18:58:00   0
1    2018-04-10 19:41:00   1
1    2018-04-14 17:05:00   0
1    2018-04-14 17:48:00   1 
1    2018-04-14 18:57:00   1

但这对我来说不行,因为我想排除 row_number 5,因为 row_number 5 和 row_number 2 之间的间隔 > '01:30:00'。并从第 5 行开始新的评估。第 13 行也一样。

正确的输出可能是:

id | datetime            | count 

1    2018-04-09 06:27:00   0
1    2018-04-09 14:15:00   0 
1    2018-04-09 15:25:00   1
1    2018-04-09 15:35:00   1
1    2018-04-09 15:51:00   0
1    2018-04-09 17:05:00   1
1    2018-04-10 06:42:00   0 
1    2018-04-10 16:39:00   0 
1    2018-04-10 18:58:00   0
1    2018-04-10 19:41:00   1
1    2018-04-14 17:05:00   0
1    2018-04-14 17:48:00   1 
1    2018-04-14 18:57:00   0

所以正确的计数是5。

标签: postgresqlwindow-functions

解决方案


我会为此使用递归查询:

WITH RECURSIVE tmp AS (
    SELECT
        id,
        datetime,
        row_number,
        0 AS counting,
        datetime AS last_start
    FROM mytable
    WHERE row_number = 1
    UNION ALL
    SELECT
        t1.id,
        t1.datetime,
        t1.row_number,
        CASE
            WHEN lateral_1.counting THEN 1
            ELSE 0
        END AS counting,
        CASE
            WHEN lateral_1.counting THEN tmp.last_start
            ELSE t1.datetime
        END AS last_start
    FROM
        mytable AS t1
    INNER JOIN
        tmp ON (t1.id = tmp.id AND t1.row_number - 1 = tmp.row_number),
    LATERAL (SELECT (t1.datetime - tmp.last_start) < '1h 30m'::interval AS counting) AS lateral_1
)
SELECT id, datetime, counting
FROM tmp
ORDER BY id, datetime;

推荐阅读