R: Loop through rows until a condition is met, then start again at the next row

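Problem description

I have a table of orders per customer, each with a timestamp. I want to know which orders occurred within a time window of x hours after a given order. Once that window has ended, a new window of x hours should start at the next order. A new column should always state which order was the first one in the window.

The example below shows it best.

I have tried some `for` loops with `next`, but could not get it to work at all.

The data looks like this: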

x <- data.frame("Customer" =c(123,123,123,123,123,123,123,567), "Order_nr" = c(1,2,3,4,5,6,7,1), "Order_datetime" = c('2018-11-24 00:00:25','2018-11-24 15:58:23','2018-11-24 19:10:29','2018-11-24 21:29:04','2018-11-24 22:03:59','2018-11-24 22:26:59','2018-11-24 22:36:13','2018-11-24 12:00:55'))
x
| Customer | Order_nr | Order_datetime|
| ------------- |:-------------:| -----:|
| 123      | 1 | 2018-11-24 00:00:25 |
| 123      | 2 | 2018-11-24 15:58:23 |
| 123      | 3 | 2018-11-24 19:10:29 |
| 123      | 4 | 2018-11-24 21:29:04 |
| 123      | 5 | 2018-11-24 22:03:59 |
| 123      | 6 | 2018-11-24 22:26:59 |
| 123      | 7 | 2018-11-24 22:36:13 |
| 567      | 1 | 2018-11-24 12:00:55 |

If I want to know the orders within a 1h window, I would like the result in the column `1h bundle first order`; for a 3h window, it should be in the column `3h bundle first order`.

| Customer | Order_nr | Order_datetime | 1h bundle first order | 3h bundle first order |
| ------------- |:-------------:| -----:| -----:| -----:|
| 123      | 1A | 2018-11-24 00:00:25 |1A |1A|
| 123      | 2A | 2018-11-24 15:58:23 |2A |2A|
| 123      | 3A | 2018-11-24 19:10:29 |3A |3A|
| 123      | 4A | 2018-11-24 21:29:04 |4A |3A|
| 123      | 5A | 2018-11-24 22:03:59 |4A |3A|
| 123      | 6A | 2018-11-24 22:26:59 |4A |4A|
| 123      | 7A | 2018-11-24 22:36:13 |5A |4A|
| 567      | 1B | 2018-11-24 12:00:55 |1B |1B|

So I need to know, for example, that orders 4A, 5A and 6A occurred within 1 hour of order 4A, which is what `1h bundle first order` shows.
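
The 1-hour example can be checked by hand with `difftime`. This is only a minimal sketch: the time zone (`UTC`) is an assumption, since the question does not state one, and the variable names are made up for illustration.

```r
# Timestamps of orders 4 to 7 for customer 123, copied from the data above.
# tz = "UTC" is an assumption; the question does not give a time zone.
times <- as.POSIXct(c("2018-11-24 21:29:04", "2018-11-24 22:03:59",
                      "2018-11-24 22:26:59", "2018-11-24 22:36:13"),
                    tz = "UTC")

# Minutes elapsed since order 4, the order that opens the window
mins_since_4 <- as.numeric(difftime(times, times[1], units = "mins"))
names(mins_since_4) <- 4:7
print(round(mins_since_4, 1))

# TRUE where the order falls inside the 1-hour window opened by order 4
within_1h <- mins_since_4 <= 60
print(within_1h)
```

Orders 5 and 6 come about 35 and 58 minutes after order 4, so they fall inside its window. Order 7 comes about 67 minutes after order 4, so under a literal reading of the rule it would open a new window of its own.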

Tags: r, loops

Solution


so <- data.frame("Customer" =c(123,123,123,123,123,123,123,567), 
                 "Order_nr" = c(1,2,3,4,5,6,7,1), 
                 "Order_datetime" = c('2018-11-24 00:00:25','2018-11-24 15:58:23',
                                      '2018-11-24 19:10:29','2018-11-24 21:29:04',
                                      '2018-11-24 22:03:59','2018-11-24 22:26:59',
                                      '2018-11-24 22:36:13','2018-11-24 12:00:55'))


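learn <- function(date_time, df, hr.within, i){

  # Absolute distance, in hours, between the reference time and every order
  subject <- abs(difftime(date_time, df$Order_datetime, units = "hours"))

  # First window: every order within hr.within hours of the reference time.
  # Later windows: only orders between hr.within - 1 and hr.within hours away.
  if (i == 1) {
    thatrow <- which(subject <= hr.within)
  } else {
    thatrow <- intersect(which(subject <= hr.within),
                         which(subject >= hr.within - 1))
  }

  # No order falls into this window
  if (length(thatrow) == 0) return(NULL)

  R2 <- df[thatrow, c("Customer", "Order_nr", "Order_datetime")]
  # Label the rows with the window size and name the new column after it
  R2$x <- paste0(hr.within, "A")
  colnames(R2)[4] <- paste0(hr.within, "A bundle first order")
  R2
}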


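library(data.table)  # provides rbindlist()

learn.wrapper <- function(date_time, df, hr.within=seq(1,100,1)){
  learn.out <- list()
  for(i in seq_along(hr.within)){
    # Pass the df argument on instead of relying on the global object `so`
    learn.out[[i]] <- learn(date_time, df, hr.within[i], i)
  }
  return(rbindlist(learn.out, fill=TRUE))
}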

learnery <- learn.wrapper('2018-11-24 19:00:00', df=so) #first argument is the time you want to ref. with
learnery

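This assumes that everything happens within 100 hours. You can change that by setting `hr.within=seq(1,100,1)` to an appropriate range of windows and re-running the function definitions. You can then row-bind the results yourself after looking at the output.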
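
For comparison, the rule described in the question (open a window at an order, assign every later order inside that window to it, then open a new window at the first order that falls outside) can also be sketched directly in base R. This is not the method used above: `bundle_first_order` is a hypothetical helper, `tz = "UTC"` is an assumption, and it returns the plain `Order_nr` rather than labels such as `4A`.

```r
# Hypothetical helper: for each order, return the Order_nr of the order that
# opened the time window this order falls into.
bundle_first_order <- function(df, hours) {
  # tz = "UTC" is an assumption; the source data carries no time zone
  times <- as.POSIXct(as.character(df$Order_datetime), tz = "UTC")
  first <- df$Order_nr                  # placeholder, overwritten below
  for (rows in split(seq_len(nrow(df)), df$Customer)) {
    rows <- rows[order(times[rows])]    # handle each customer chronologically
    start <- rows[1]                    # row that opens the current window
    for (r in rows) {
      # Open a new window once an order lies more than `hours` after the opener
      if (difftime(times[r], times[start], units = "hours") > hours) start <- r
      first[r] <- df$Order_nr[start]
    }
  }
  first
}

orders <- data.frame(
  Customer = c(123, 123, 123, 123, 123, 123, 123, 567),
  Order_nr = c(1, 2, 3, 4, 5, 6, 7, 1),
  Order_datetime = c("2018-11-24 00:00:25", "2018-11-24 15:58:23",
                     "2018-11-24 19:10:29", "2018-11-24 21:29:04",
                     "2018-11-24 22:03:59", "2018-11-24 22:26:59",
                     "2018-11-24 22:36:13", "2018-11-24 12:00:55"))

orders[["1h bundle first order"]] <- bundle_first_order(orders, 1)
orders[["3h bundle first order"]] <- bundle_first_order(orders, 3)
print(orders)
```

With this literal reading, order 7 opens its own 1-hour window and order 6 opens a new 3-hour window, so the last rows differ from the expected table in the question. If the table reflects a different rule, the comparison inside the inner loop is the place to adjust.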

