首页 > 解决方案 > 计算复杂 table_1 中的持续时间

问题描述

我有一个数据框

df <- data.frame("name" = c("jack", "william", "david", "john"),
                 "01-Jan-19" = c(NA,"A", NA,"A"),
                 "01-Feb-19" = c("A","A",NA,"A"),
                 "01-Mar-19" = c("S","A","A","A"),
                 "01-Apr-19" = c("A","A","A","S"),
                 "01-May-19" = c(NA,"A","A","A"),
                 "01-Jun-19" = c("A","S","A","S"),
                 "01-Jul-19" = c("A","S","A","S"),
                 "01-Aug-19" = c(NA,"S","A","A"),
                 "01-Sep-19" = c(NA,"S","A","S"),
                 "01-Oct-19" = c("S","S","A","S"),
                 "01-Nov-19" = c("S","S",NA,"S"),
                 "01-Dec-19" = c("S","S","S",NA),
                 "01-Jan-20" = c("S","M","A","M"),
                 "01-Feb-20" = c("M","M","M","M"))

为了计算每个人从第一个 A 到最后一个 A 之间的持续时间,我可以通过以下方式实现

duration <- df %>%
  tidyr::pivot_longer(cols = -name, names_to = 'person', values_drop_na = TRUE) %>%
  dplyr::mutate(person = dmy(sub('X', '', person))) %>%
  group_by(name) %>%
  dplyr::summarise(avg_duration = person[max(which(value == 'A'))] - person[min(which(value == 'A'))])

请问如何为以下目的修改代码?两个A之间的周期,我怎样才能减去具有其他值的周期(任何不是A,例如S,NA)?太感谢了。

标签: rdateduration

解决方案


在这种情况下,我很难理解句号的含义。从您的示例看来,您在一段时间内需要两个值。这也是要减去的时期的情况吗?例如,如果一个人有系列:A、S、A,那么他们与 A 有 0、1 或 2 个句点吗?那么A,S,A,A,S,A呢?

相反,如果每个日期代表一个期间,即在下一个日期之前有效,则每个人值组合的总持续时间可以计算如下:

duration <- df %>%
  tidyr::pivot_longer(cols = -name, names_to = 'date') %>%
  dplyr::mutate(date = lubridate::dmy(sub('X', '', date))) %>%
  group_by(name) %>%
  dplyr::arrange(name, date) %>% 
  dplyr::mutate(duration = c(diff(date), 0)) %>% 
  dplyr::group_by(name, value) %>% 
  dplyr::summarise(summed_duration = sum(duration))
# A tibble: 15 x 3
# Groups:   name [4]
   name    value summed_duration
   <chr>   <chr> <drtn>         
 1 david   A     276 days       
 2 david   M       0 days       
 3 david   S      31 days       
 4 david   NA     89 days       
 5 jack    A     119 days       
 6 jack    M       0 days       
 7 jack    S     154 days       
 8 jack    NA    123 days       
 9 john    A     152 days       
10 john    M      31 days       
11 john    S     182 days       
12 john    NA     31 days       
13 william A     151 days       
14 william M      31 days       
15 william S     214 days 

根据评论编辑

df %>%
  tidyr::pivot_longer(cols = -name, names_to = 'date') %>%
  dplyr::mutate(date = lubridate::dmy(sub('X', '', date))) %>%
  dplyr::group_by(name) %>%
  dplyr::arrange(name, date) %>% 
  dplyr::mutate(duration = c(diff(date), 0)) %>% 
  dplyr::group_by(name, value) %>% 
  dplyr::filter(dplyr::row_number() < dplyr::last(dplyr::row_number()) | value != 'A') %>% 
  dplyr::summarise(summed_duration = sum(duration)) %>%
  dplyr::filter(value == 'A')

# A tibble: 4 x 3
# Groups:   name [4]
  name    value summed_duration
  <chr>   <chr> <drtn>         
1 david   A     245 days       
2 jack    A      88 days       
3 john    A     121 days       
4 william A     120 days   

推荐阅读