cumsum with dates and a categorical variable in R

Problem description

I have this dataset:

df <- data.frame(Date = c("12-01-2019","12-01-2019","12-02-2019","12-02-2019","12-02-2019","12-03-2019"),
                 Country = c("France","USA","France","USA","Colombia","USA"))

I would like to apply cumsum with dplyr and get this result:

Date          Country cumsum
"12-01-2019" "France"   1
"12-01-2019" "USA"      1
"12-01-2019" "Colombia" 0
"12-02-2019" "France"   2
"12-02-2019" "USA"      2
"12-02-2019" "Colombia" 1
"12-03-2019" "France"   2
"12-03-2019" "USA"      3
"12-03-2019" "Colombia" 1

Any suggestions?

Thank you very much for your help.

Regards!

Tags: r, dplyr, cumsum

Solution


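We can count the number of rows for each combination of Date and Country, then use complete to add every missing date for each Country with a count of 0. Finally, for each Country, we can take the cumsum. The date strings are in month-day-year order, so they are parsed with lubridate::mdy first.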

library(dplyr)

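df %>%
  # parse the month-day-year strings into Date objects
  mutate(Date = lubridate::mdy(Date)) %>%
  # number of rows for each Date/Country combination
  count(Date, Country) %>%
  # add every Country/day pair missing from the data, with n = 0
  tidyr::complete(Country, Date = seq(min(Date), max(Date), by = 'day'), 
                  fill = list(n = 0)) %>%
  # running total of n within each Country, in date order
  group_by(Country) %>%
  mutate(n = cumsum(n))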


#  Country  Date           n
#  <chr>    <date>     <dbl>
#1 Colombia 2019-12-01     0
#2 Colombia 2019-12-02     1
#3 Colombia 2019-12-03     1
#4 France   2019-12-01     1
#5 France   2019-12-02     2
#6 France   2019-12-03     2
#7 USA      2019-12-01     1
#8 USA      2019-12-02     2
#9 USA      2019-12-03     3
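If you would rather not depend on lubridate and tidyr, the same count, complete, then cumsum idea can be sketched in base R. This is a minimal sketch, not part of the original answer: it assumes the dates are always in "%m-%d-%Y" format, swaps in table() for count() plus complete(), and uses ave() for the grouped cumsum. The variable names dates, all_days, date_f, counts and result are illustrative only.

df <- data.frame(
  Date    = c("12-01-2019", "12-01-2019", "12-02-2019",
              "12-02-2019", "12-02-2019", "12-03-2019"),
  Country = c("France", "USA", "France", "USA", "Colombia", "USA")
)

# Parse the month-day-year strings (assumed format) into Date objects
dates <- as.Date(df$Date, format = "%m-%d-%Y")

# Use every day from the first to the last date as factor levels, so a day
# with no rows at all would still get zero counts (the role of complete())
all_days <- format(seq(min(dates), max(dates), by = "day"))
date_f   <- factor(format(dates), levels = all_days)

# Count rows per Date x Country; table() keeps absent combinations as 0
counts <- as.data.frame(table(Date = date_f, Country = df$Country),
                        stringsAsFactors = FALSE)

# Rows come out with Date varying fastest inside each Country, so a cumsum
# grouped by Country accumulates in date order
counts$cumsum <- ave(counts$Freq, counts$Country, FUN = cumsum)

# Reorder to the Date-first layout shown in the question
result <- counts[order(counts$Date, counts$Country),
                 c("Date", "Country", "cumsum")]
rownames(result) <- NULL
print(result)

The table() approach relies on the factor levels to fill in missing days; without the explicit levels, days absent from the data would simply be dropped rather than shown with a count of 0.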
