SQL: how to get previous rows' data and use it in a later calculation

Problem description

This is the data I have:

+-----+-------------+----------+-----------+
| id  |    date     |  status  |  health   |
+-----+-------------+----------+-----------+
| 270 | 4/5/20 3:00 | STARTING | UNHEALTHY |
| 270 | 4/5/20 3:03 | ACTIVE   | UNHEALTHY |
| 270 | 4/5/20 3:05 | ACTIVE   | UNHEALTHY |
| 270 | 4/5/20 3:08 | ACTIVE   | HEALTHY   |
| 270 | 4/5/20 3:14 | ACTIVE   | UNHEALTHY |
+-----+-------------+----------+-----------+

I want to calculate the downtime caused by STARTING (in this example, from 4/5/20 3:03 to 4/5/20 3:08).

This is the result I want:

+-----+-------------+----------+-----------+-------------------------------------+
| id  |    date     |  status  |  health   | STARTING_CAUSED_DOWNTIME_IN_SECONDS |
+-----+-------------+----------+-----------+-------------------------------------+
| 270 | 4/5/20 3:00 | STARTING | UNHEALTHY |                                     |
| 270 | 4/5/20 3:03 | ACTIVE   | UNHEALTHY |                                     |
| 270 | 4/5/20 3:05 | ACTIVE   | UNHEALTHY |                                     |
| 270 | 4/5/20 3:08 | ACTIVE   | HEALTHY   |                                 480 |
| 270 | 4/5/20 3:14 | ACTIVE   | UNHEALTHY |                                     |
+-----+-------------+----------+-----------+-------------------------------------+

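Note that I am not counting the last row, ACTIVE / UNHEALTHY (3:14), because that downtime is not caused by STARTING (STARTING-caused downtime means being unhealthy right after starting).

What I tried: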

SELECT
    id, date, status, prev_status, health, prev_health,
    LAST_START_TIME, HEALTHY_AFTER_STARTING_TIME
FROM
    (
        SELECT
            id, date, status, health,
            LAG(status) OVER (PARTITION BY id ORDER BY date) AS prev_status,
            LAG(health) OVER (PARTITION BY id ORDER BY date) AS prev_health,
            -- first ACTIVE row right after a STARTING row
            (CASE WHEN status = 'ACTIVE' AND prev_status = 'STARTING' THEN date END) AS LAST_START_TIME,
            -- ACTIVE row whose health just changed from UNHEALTHY to something else
            (CASE WHEN status = 'ACTIVE' AND prev_status = 'ACTIVE' AND prev_health = 'UNHEALTHY' AND health <> 'UNHEALTHY' THEN date END) AS HEALTHY_AFTER_STARTING_TIME
        FROM
            DB_TABLE_04
    ) t1
ORDER BY date

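With this query I am trying to find two timestamps: when the STARTING-caused downtime begins and when it ends.

I found two problems.

  1. Once I have the second timestamp, the end of the downtime (HEALTHY_AFTER_STARTING_TIME), I can't figure out how to compute the downtime itself.

I think it needs something like: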

(extract(epoch from (starting_downtime_end - most_recent_starting_downtime_start)))

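But how do I implement that?

  2. This is the bigger problem. With this approach I end up counting every ACTIVE / UNHEALTHY period, even those not caused by STARTING.

For example, it would produce something like this: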

+-----+-------------+----------+-------------+-----------+-------------+-----------------+-----------------------------+
| id  |    date     |  status  | prev_status |  health   | prev_health | LAST_START_TIME | HEALTHY_AFTER_STARTING_TIME |
+-----+-------------+----------+-------------+-----------+-------------+-----------------+-----------------------------+
| 270 | 4/5/20 3:00 | STARTING |             | UNHEALTHY |             |                 |                             |
| 270 | 4/5/20 3:03 | ACTIVE   | STARTING    | UNHEALTHY | UNHEALTHY   | 4/5/20 3:00     |                             |
| 270 | 4/5/20 3:05 | ACTIVE   | ACTIVE      | UNHEALTHY | UNHEALTHY   |                 |                             |
| 270 | 4/5/20 3:08 | ACTIVE   | ACTIVE      | HEALTHY   | UNHEALTHY   |                 | 4/5/20 3:08                 |
| 270 | 4/5/20 3:14 | ACTIVE   | ACTIVE      | UNHEALTHY | HEALTHY     |                 |                             |
| 270 | 4/5/20 3:18 | ACTIVE   | ACTIVE      | HEALTHY   | UNHEALTHY   |                 | 4/5/20 3:18                 |
+-----+-------------+----------+-------------+-----------+-------------+-----------------+-----------------------------+

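I don't want the 4/5/20 3:18 row to be counted, but it would be.

How can I get the result I want from this data?

Also, can I use window functions for this problem?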

Tags: sql, database, amazon-redshift

Solution

Hmm . . . this seems to get the date you want:

select t.*,
       (case when health = 'HEALTHY'
             then max(case when status = 'STARTING' then date end) over
                      (partition by id
                       order by date
                       rows between unbounded preceding and current row
                      )
        end) as starting_date
from DB_TABLE_04 t;

And the difference in seconds:

datediff(second,
         (case when health = 'HEALTHY'
               then max(case when status = 'STARTING' then date end) over
                        (partition by id
                         order by date
                         rows between unbounded preceding and current row
                        )
          end),
         date
        ) as seconds_diff
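
The fragment above is only a select-list expression. As a rough sketch, it could be dropped into a complete query like the one below. It assumes the DB_TABLE_04 table and the id, date, status and health columns from the question. The output column name is borrowed from the desired result and is otherwise arbitrary.

```sql
-- Sketch only: assumes DB_TABLE_04(id, date, status, health) as in the question.
select t.*,
       datediff(second,
                -- latest STARTING timestamp at or before this row,
                -- kept only on HEALTHY rows (null elsewhere)
                (case when health = 'HEALTHY'
                      then max(case when status = 'STARTING' then date end) over
                               (partition by id
                                order by date
                                rows between unbounded preceding and current row
                               )
                 end),
                date
               ) as starting_caused_downtime_in_seconds
from DB_TABLE_04 t
order by id, date;
```

For the five sample rows this should give 480 on the 3:08 row and null everywhere else. Note that every HEALTHY row that follows a STARTING row gets a value here, not only the first one.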

Also, the window frame clause (rows between) is unnecessary in SQL Server, but Redshift requires it when an aggregate window function has an order by.
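
The second problem in the question (a later recovery such as the 3:18 row should not count) could be handled by keeping only the first HEALTHY row after each STARTING row. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes the same DB_TABLE_04 table and columns, and the names marked and last_start are made up for illustration.

```sql
-- Sketch only: assumes DB_TABLE_04(id, date, status, health) as in the question.
-- "marked" and "last_start" are hypothetical names used for illustration.
with marked as (
    select t.*,
           -- latest STARTING timestamp at or before this row
           max(case when status = 'STARTING' then date end) over
               (partition by id
                order by date
                rows between unbounded preceding and current row
               ) as last_start
    from DB_TABLE_04 t
)
select id, date, status, health,
       -- report a value only on the earliest HEALTHY row that shares
       -- the same last_start, i.e. the first recovery after that start
       case when health = 'HEALTHY'
             and date = min(case when health = 'HEALTHY' then date end) over
                            (partition by id, last_start)
            then datediff(second, last_start, date)
       end as starting_caused_downtime_in_seconds
from marked
order by id, date;
```

With the six rows of the second example table, the 3:08 row would get 480 and the 3:18 row would stay null, because 3:08 is already the earliest HEALTHY row for the 3:00 start. Rows that precede any STARTING row have a null last_start and therefore also produce null.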

