Remove a run of timestamps where a condition is TRUE

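Problem description

Background:

I have a dataset, df, and want to record each Connect datetime and remove actions according to this rule:

If a Connect action is followed by an action that is either a repeat or occurs within 60 seconds (<= 60) of the previous one, keep iterating until reaching an Ended value that does not show this behavior, and record that time.

The output pattern should always be Connect followed by Ended.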

We start with:

Connect            4/6/2020 1:11:41 PM

Then look to the next line:

Ended              4/6/2020 1:14:20 PM

Now look to the line that follows:

Attempt            4/6/2020 1:15:20 PM


These two timestamps are no more than 60 seconds apart, so we keep going
until we come across an Ended value where these conditions do not apply.
So the Ended value of

Ended              4/6/2020 2:05:18 PM    gets recorded.




Action             Time

Connect            4/6/2020 1:11:41 PM
Ended              4/6/2020 1:14:20 PM
Attempt            4/6/2020 1:15:20 PM
Connect            4/6/2020 1:15:21 PM
Ended              4/6/2020 2:05:18 PM
Connect            3/31/2020 11:00:08 AM
Ended              3/31/2020 11:14:54 AM
Ended              3/31/2020 4:17:43 PM

Essentially, I want to remove this part of the dataset:

Ended              4/6/2020 1:14:20 PM
Attempt            4/6/2020 1:15:20 PM
Connect            4/6/2020 1:15:21 PM
Ended              3/31/2020 4:17:43 PM

Desired output:

Action              Time

Connect             4/6/2020 1:11:41 PM        
Ended               4/6/2020 2:05:18 PM
Connect             3/31/2020 11:00:08 AM
Ended               3/31/2020 11:14:54 AM

The output pattern should always be Connect followed by Ended.

Input:

df <- structure(list(Action = structure(c(2L, 3L, 1L, 2L, 3L, 2L, 3L, 
3L), .Label = c("Attempt", "Connect", "Ended"), class = "factor"), 
 Time = structure(c(4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L), .Label = c("3/31/2020 11:00:08 AM", 
 "3/31/2020 11:14:54 AM", "3/31/2020 4:17:43 PM", "4/6/2020 1:11:41 PM", 
 "4/6/2020 1:14:20 PM", "4/6/2020 1:15:20 PM", "4/6/2020 1:15:21 PM", 
 "4/6/2020 2:05:18 PM"), class = "factor")), class = "data.frame", row.names = c(NA, 
-8L))
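
The 60-second condition can only be tested once the Time values are real date-times rather than factor labels. Here is a minimal base R sketch of that step, using two timestamps from the table above. It assumes the format is month/day/year with a 12-hour clock, and it uses UTC only for illustration, since the question does not state a time zone. The names time_chr, times and gap_to_next are illustrative.

```r
# Two of the timestamps from the question, as character strings
time_chr <- c("4/6/2020 1:14:20 PM", "4/6/2020 1:15:20 PM")

# Assumption: month/day/year with a 12-hour clock; UTC chosen only for illustration
times <- as.POSIXct(time_chr, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Gap in seconds from the first timestamp to the second
gap_to_next <- as.numeric(difftime(times[2], times[1], units = "secs"))

# The condition described in the question
within_limit <- gap_to_next <= 60
print(within_limit)
```

If the Time column is a factor, as in the dput output, it would need as.character() before parsing.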

This is what I have tried:

I am thinking I should use a loop, but I am not exactly sure how to construct it. Any help is appreciated.

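  library(lubridate)
  # Parse the Time column, then measure the gap (seconds) between two rows
  time <- mdy_hms(as.character(df$Time))
  value <- as.numeric(difftime(time[3], time[2], units = "secs"))
  if (value <= 60) {
    print("")   # gap of 60 seconds or less: keep iterating
  } else {
    print(value)   # placeholder: this is where the Ended time would be recorded
  }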

Tags: r, loops, dplyr, lubridate

Solution
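
One possible way to build the loop, offered as a sketch rather than a definitive answer. It reads the rule as a small state machine: the first Connect opens a session, an Ended closes it only when the next row is more than 60 seconds away, and every other row is dropped. This includes any Connect, Attempt or Ended that falls inside an open session, and any Ended that arrives when no session is open. The helper keep_sessions and its max_gap argument are hypothetical names. The absolute gap is an assumption, made because the sample rows are not in chronological order, and UTC is used only for illustration.

```r
# Data from the question, rebuilt as plain character columns
df <- data.frame(
  Action = c("Connect", "Ended", "Attempt", "Connect", "Ended",
             "Connect", "Ended", "Ended"),
  Time = c("4/6/2020 1:11:41 PM", "4/6/2020 1:14:20 PM",
           "4/6/2020 1:15:20 PM", "4/6/2020 1:15:21 PM",
           "4/6/2020 2:05:18 PM", "3/31/2020 11:00:08 AM",
           "3/31/2020 11:14:54 AM", "3/31/2020 4:17:43 PM"),
  stringsAsFactors = FALSE
)

# Assumption: month/day/year with a 12-hour clock; UTC chosen only for illustration
parsed <- as.POSIXct(df$Time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Hypothetical helper: returns a logical vector marking the rows to keep
keep_sessions <- function(action, time, max_gap = 60) {
  n <- length(action)
  keep <- logical(n)
  open <- FALSE  # TRUE while a Connect is waiting for its closing Ended
  for (i in seq_len(n)) {
    if (action[i] == "Connect" && !open) {
      keep[i] <- TRUE  # first Connect of a session
      open <- TRUE
    } else if (action[i] == "Ended" && open) {
      # Absolute gap in seconds to the next row; Inf for the last row
      gap <- if (i < n) {
        abs(as.numeric(difftime(time[i + 1], time[i], units = "secs")))
      } else {
        Inf
      }
      if (gap > max_gap) {
        keep[i] <- TRUE  # this Ended closes the session
        open <- FALSE
      }
    }
    # Any other row is left as FALSE and therefore dropped
  }
  keep
}

result <- df[keep_sessions(df$Action, parsed), ]
print(result, row.names = FALSE)
```

On the sample data this keeps rows 1, 5, 6 and 7, which matches the desired output. If the real data is sorted by time, or the repeat rule is meant differently, the gap test and the handling of stray Ended rows would need adjusting.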

