How do I create a new data frame in R that combines the first and last available dates for each ID?

Problem description

For example, suppose I have the following data frame:

ID<-c("A", "A", "B", "B", "B", "C")
StartDate<-as.Date(c("2018-01-01", "2019-02-05", "2016-04-18", "2020-03-03", "2021-12-13", "2014-03-03"), "%Y-%m-%d")
TermDate<-as.Date(c("2018-02-01", NA, "2016-05-18", "2020-04-03", "2021-12-15", "2014-04-03"), "%Y-%m-%d")
df<-data.frame(ID=ID, StartDate=StartDate, TermDate=TermDate)

  ID  StartDate   TermDate
1  A 2018-01-01 2018-02-01
2  A 2019-02-05       <NA>
3  B 2016-04-18 2016-05-18
4  B 2020-03-03 2020-04-03
5  B 2021-12-13 2021-12-15
6  C 2014-03-03 2014-04-03

What I ultimately want to end up with is the following:


  ID  StartDate   TermDate
1  A 2018-01-01       <NA>
2  B 2016-04-18 2021-12-15
3  C 2014-03-03 2014-04-03

Tags: r

Solution


Both dplyr and data.table provide first() and last() functions that can help here. With dplyr:

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(StartDate = first(StartDate), 
            TermDate = last(TermDate))

#  ID    StartDate  TermDate  
#* <chr> <date>     <date>    
#1 A     2018-01-01 NA        
#2 B     2016-04-18 2021-12-15
#3 C     2014-03-03 2014-04-03

With data.table:

library(data.table)
setDT(df)[, .(StartDate = first(StartDate), TermDate = last(TermDate)), ID]
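Note that first() and last() work on row order, so both approaches above rely on the rows already being sorted by StartDate within each ID, as they are in the example. If that cannot be assumed, sorting first is safer. The following is a minimal base R sketch of the same idea with no extra packages; the names df_sorted, first_rows, last_rows and result are illustrative and do not come from the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = c("A", "A", "B", "B", "B", "C"),
  StartDate = as.Date(c("2018-01-01", "2019-02-05", "2016-04-18",
                        "2020-03-03", "2021-12-13", "2014-03-03")),
  TermDate = as.Date(c("2018-02-01", NA, "2016-05-18",
                       "2020-04-03", "2021-12-15", "2014-04-03"))
)

# Sort so that "first" and "last" follow StartDate rather than input order
df_sorted <- df[order(df$ID, df$StartDate), ]

# The first row per ID supplies StartDate; the last row per ID supplies TermDate
first_rows <- df_sorted[!duplicated(df_sorted$ID), ]
last_rows  <- df_sorted[!duplicated(df_sorted$ID, fromLast = TRUE), ]

# Both subsets are in the same ID order, so the columns line up row by row
result <- data.frame(
  ID = first_rows$ID,
  StartDate = first_rows$StartDate,
  TermDate = last_rows$TermDate
)
result
```

As in the desired output, the TermDate for ID A stays NA because it is taken from the last row rather than computed as a maximum.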
