Creating a function to determine the time between events based on a conditional variable

Problem description

I am trying to sort my dataset and add a new column that gives the sequential time between events, based on two separate columns.

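The code below gets me part of the way there, but I am having trouble troubleshooting it. Has anyone run into this before, or can anyone spot the problem in my code?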

The sample dataset and the code I am trying to use can be found below:

library(dplyr)       # provides %>%
library(data.table)  # provides data.table(), :=, .N, .GRP

UNITNUMBER <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,4)
ORDERID <- c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541)
BREAKDOWN <- c(0,1,0,1,1,1,1,0,0,0,0,1,1,0)
RO_OPENED <- as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2','2016-12-22','2017-3-8','2016-4-25','2016-2-3','2017-6-7','2016-7-5','2016-4-9','2017-10-27','2017-4-20','2017-5-10'))

test = data.frame(UNITNUMBER,ORDERID,BREAKDOWN,RO_OPENED)

test <-  test %>% data.table(key = c("UNITNUMBER","RO_OPENED"))


test <-  test[, c("UNITNUMBER", "RO_OPENED",
                             "TDIFF", "UNIQUEGROUP") :=
                           list(UNITNUMBER, RO_OPENED,
                                seq(.N), .GRP),
                         by = list(ORDERID)][, numSeq := seq(min(RO_OPENED), max(RO_OPENED)),
                                             by = list(UNIQUEGROUP)][, runningTotal := ifelse(RO_OPENED == numSeq,
                                                                                        seq(.N), 1L), 
                                                               by = list(UNITNUMBER, UNIQUEGROUP)]

The error I receive is:

Error in seq.Date(min(RO_OPENED), max(RO_OPENED)) : 
  exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
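The message suggests that `seq()` was dispatched to `seq.Date()` with only a start and an end date, so it had no step size. The following is a minimal sketch of the two calls that are relevant here, assuming daily steps are what is wanted; the example dates are taken from the sample data. Note that a day-by-day sequence usually has a different length than the group it is computed in, so for gaps between events `diff()` on sorted dates may be the better fit.

```r
# seq() on Date objects needs a step ('by') or a length in addition to from/to
start <- as.Date("2016-09-15")
end   <- as.Date("2016-09-18")
days  <- seq(start, end, by = "day")  # one element per calendar day

# For the gap between consecutive events, diff() on sorted dates is enough
events   <- as.Date(c("2016-11-28", "2017-04-02"))
gap_days <- as.numeric(diff(events))  # difference in days as a plain number
```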

I expect the result to have 2 new columns: a UNIQUEGROUP identifier and the time difference between BREAKDOWNS for each UNITNUMBER and ORDERID, as shown below:

UNIT OrderID BD    Date      TDIFF
1    5565    0    9/15/2016    NA
1    5555    0    11/18/2016   NA
1    5558    1    11/28/2016   0
1    5278    1    4/2/2017     125
2    5283    1    12/22/2016   0
2    3287    1    3/8/2017     76
3    4678    0    2/3/2016     NA
3    3004    1    4/25/2016    0
3    2345    0    6/7/2017     NA
4    1784    0    4/9/2016     NA
4    2189    0    7/5/2016     NA
4    4623    1    4/20/2017    0
4    4541    0    5/10/2017    NA
4    5743    1    10/27/2017   190

Tags: r, dplyr, data.table

Solution


This should do the job:

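library(dplyr)
test <- test %>% 
  arrange(UNITNUMBER, RO_OPENED) %>%       # sort by date within each unit
  group_by(UNITNUMBER, BREAKDOWN) %>%      # compare breakdowns only with breakdowns
  mutate(TDIFF = as.numeric(difftime(RO_OPENED, lag(RO_OPENED), units = "days")),
         TDIFF = coalesce(TDIFF, 0),       # first event in a group gets 0
         TDIFF = ifelse(BREAKDOWN == 0, NA, TDIFF)) %>%  # non-breakdowns get NA
  ungroup()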
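Since the question uses data.table, the same idea can also be sketched there. This is an illustrative equivalent rather than part of the original answer: it uses only the rows for unit 1 from the sample data, and the object name `dt` is made up for the example.

```r
library(data.table)

# Rows for UNITNUMBER 1 from the sample data (dt is a hypothetical name)
dt <- data.table(
  UNITNUMBER = c(1, 1, 1, 1),
  ORDERID    = c(5555, 5558, 5565, 5278),
  BREAKDOWN  = c(0, 1, 0, 1),
  RO_OPENED  = as.Date(c("2016-11-18", "2016-11-28", "2016-09-15", "2017-04-02"))
)

# Sort by unit and date so shift() looks at the previous event in time
setorder(dt, UNITNUMBER, RO_OPENED)

# Days since the previous row with the same unit and breakdown flag
dt[, TDIFF := as.numeric(RO_OPENED - shift(RO_OPENED)),
   by = .(UNITNUMBER, BREAKDOWN)]

# First breakdown per unit has no predecessor: set it to 0
dt[BREAKDOWN == 1 & is.na(TDIFF), TDIFF := 0]

# Non-breakdown rows do not get a time difference
dt[BREAKDOWN == 0, TDIFF := NA_real_]
```

Grouping by both UNITNUMBER and BREAKDOWN is what restricts the comparison to breakdown rows; the two follow-up assignments only tidy up the first row of each group and the non-breakdown rows.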
