Resetting consecutive days by Subject in data.table

Problem description

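I am trying to make the Consecutive_days column reset for each new Subject. I'm sure this is simple, but I'm new to data.table. Is this possible in data.table, or do I need to convert back to a data.frame?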

Original data table:

library(data.table)
DT = data.table(
  Subject = rep(c("A", "B"), 4:3),
  Date = as.Date(
    sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
    '%m-%d-%y'
  )
)
DT[]
#    Subject       Date
# 1:       A 2001-10-22
# 2:       A 2001-10-23
# 3:       A 2001-10-24
# 4:       A 2001-10-25
# 5:       B 2002-10-25
# 6:       B 2002-10-26
# 7:       B 2002-10-27

What I tried:

DT[, Consecutive_days := c(0,diff(Date)), by =.(Subject)]

What happened:

#    Subject       Date Consecutive_days
# 1:       A 2001-10-22                0
# 2:       A 2001-10-23                1
# 3:       A 2001-10-24                1
# 4:       A 2001-10-25                1
# 5:       B 2002-10-25                0
# 6:       B 2002-10-26                1
# 7:       B 2002-10-27                1

What I am trying to get, with the count resetting each time Subject changes:

#    Subject       Date Consecutive_days
# 1:       A 2001-10-22                0
# 2:       A 2001-10-23                1
# 3:       A 2001-10-24                2
# 4:       A 2001-10-25                3
# 5:       B 2002-10-25                0
# 6:       B 2002-10-26                1
# 7:       B 2002-10-27                2

Tags: r, data.table

Solution


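Why not simply keep the grouping statement by = Subject so the calculation runs group-wise, and take the cumsum of the differences you already computed? diff(Date) gives the gap between each row and the previous one, so cumsum turns those gaps into a running count that starts again at 0 for every Subject.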

library(data.table)
DT = data.table(
  Subject = rep(c("A", "B"), 4:3),
  Date = as.Date(
    sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
    '%m-%d-%y'
  )
)

DT[, cons_days := cumsum(c(0, diff(Date))), by = Subject]
DT
#>    Subject       Date cons_days
#> 1:       A 2001-10-22         0
#> 2:       A 2001-10-23         1
#> 3:       A 2001-10-24         2
#> 4:       A 2001-10-25         3
#> 5:       B 2002-10-25         0
#> 6:       B 2002-10-26         1
#> 7:       B 2002-10-27         2

Created on 2021-05-18 by the reprex package (v2.0.0)
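
As a cross-check on the approach above, here is a minimal sketch of an equivalent formulation. Because the sum of successive differences telescopes, the running total of diff(Date) within a group should equal the number of days elapsed since that group's first Date. The column name days_since_first is only an illustrative name and is not part of the original answer.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  Subject = rep(c("A", "B"), 4:3),
  Date = as.Date(
    sprintf("10-%02d-%02d", c(22:25, 25:27), rep(1:2, 4:3)),
    "%m-%d-%y"
  )
)

# Running total of the day-to-day gaps, restarted for each Subject
DT[, cons_days := cumsum(c(0, diff(Date))), by = Subject]

# Equivalent formulation: days elapsed since each Subject's first Date
# (days_since_first is a hypothetical column name used for illustration)
DT[, days_since_first := as.numeric(Date - Date[1L]), by = Subject]

# The two columns should agree row by row
stopifnot(isTRUE(all.equal(DT$cons_days, DT$days_since_first)))
```

Note that both columns measure elapsed days rather than the length of an unbroken streak, so if a Subject skips a day the count jumps by the size of the gap instead of restarting.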

