R: How to calculate the incidence over a 2-day duration from cumulative data after extracting the maximum value for each day?

Problem description

I have cumulative data, for example:

df1 <- data.frame(code=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,5,5), 
                 date=c("2020-01-01", "2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03",
                        "2020-01-04","2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-03","2020-01-04","2020-01-01",
                        "2020-01-02","2020-01-04","2020-01-03","2020-01-01","2020-01-02","2020-01-03","2020-01-04"),
                 cumulative=c(2,3,3,4,4,4,4,6,6,7,8,10,13,14,16,1,2,3,5,1,2,3,5))

From this, I want to extract the maximum cumulative value for each code and each date:

df2 <- data.frame(code=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), 
                  date=c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03",
                         "2020-01-04","2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01",
                         "2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03","2020-01-04"),
                  cumulative=c(3,3,4,4,4,4,6,6,8,13,14,16,1,2,3,5,1,2,3,5))

Now I have the cumulative value for each code on each day. From this, I want to calculate the incidence over a 2-day duration.

df3 <- data.frame(code=c(1,2,3,4,5),
                  incidence1=c(1,2,6,2,2),incidence2=c(1,2,3,3,3))

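incidence1 is the difference between 2020-01-01 and 2020-01-03, and incidence2 is the difference between 2020-01-02 and 2020-01-04. For example, for code 3, incidence1 = 14 - 8 = 6 and incidence2 = 16 - 13 = 3.

What I want to know is: 1) how to extract the maximum value within the same day, and 2) how to calculate the difference between dates 2 days apart.

Please advise, thank you.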

Tags: r, dataframe, max, extraction

Solution


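Here is one way to do this: create a group for every alternate row within each code and take the difference between the cumulative values in each group. This relies on df2 having exactly four rows per code, sorted by date. To get the expected output in the format shown, we can use pivot_wider from tidyr.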

library(dplyr)
library(tidyr)

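df2 %>%
  group_by(code) %>%
  # Within each code, pair row 1 with row 3 and row 2 with row 4
  # (.add = TRUE replaces the deprecated add = TRUE in dplyr >= 1.0.0)
  group_by(gr = rep(seq(1, n()/2), 2), .add = TRUE) %>%
  # Each group holds two rows, so diff() returns the single 2-day difference
  summarise(incidence = diff(cumulative)) %>%
  pivot_wider(names_from = gr, values_from = incidence, names_prefix = "incidence")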

#  code incidence1 incidence2
#  <dbl>      <dbl>      <dbl>
#1     1          1          1
#2     2          2          2
#3     3          6          3
#4     4          2          3
#5     5          2          3
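
The answer above starts from df2, so it does not cover the first part of the question (taking the per-day maximum from df1). The following is a minimal sketch of both steps, not the original answerer's method. It assumes dplyr >= 1.0.0 (for the .groups argument) and that every code has one row per consecutive day after step 1, so that a lag of 2 rows equals a lag of 2 days. The names daily_max and incidence are chosen here for illustration only.

```r
library(dplyr)

# Sample data as posted in the question
df1 <- data.frame(
  code = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,5,5),
  date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
           "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
           "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03", "2020-01-04",
           "2020-01-01", "2020-01-02", "2020-01-04", "2020-01-03",
           "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"),
  cumulative = c(2,3,3,4,4,4,4,6,6,7,8,10,13,14,16,1,2,3,5,1,2,3,5)
)

# Step 1: keep the largest cumulative value for each code and date
daily_max <- df1 %>%
  mutate(date = as.Date(date)) %>%
  group_by(code, date) %>%
  summarise(cumulative = max(cumulative), .groups = "drop")

# Step 2: 2-day difference within each code.
# Assumption: no missing days, so the row 2 positions back is the day 2 days earlier.
incidence <- daily_max %>%
  arrange(code, date) %>%
  group_by(code) %>%
  mutate(incidence = cumulative - dplyr::lag(cumulative, 2)) %>%
  ungroup() %>%
  filter(!is.na(incidence))   # the first two days of each code have no value 2 days back

incidence
```

Each remaining row of incidence is labelled with the end date of its 2-day window, so 2020-01-03 corresponds to incidence1 and 2020-01-04 to incidence2. Note that in df1 as posted, code 4 lists 2020-01-04 (cumulative 3) before 2020-01-03 (cumulative 5), so its per-date maximum from step 1 does not match the df2 shown in the question; the other codes do match.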
