Counting rows in a foreign data frame

Problem description

Consider the following problem: I have two data frames, cases and events. Each case can have several events (1:n). events$caseId (foreign key) refers to cases$id (primary key):

cases       events
------      ------
id          id
date        caseId
var1        date
var2        var1
...         var2
            ...

Now I want to add a new column to cases that gives the number of events for each case (i.e. the number of rows where events$caseId equals cases$id).

# Sample data            
cases <- data.frame(id = 1:5, date = c("2017-01-02","2016-02-03","2015-02-12","2016-01-03","2016-08-09"),
                    var1 = sample(c("A", "B"), 5, replace = TRUE))
events <- data.frame(id = 1:10,
                     date = c("2017-01-01","2016-12-12","2016-07-04","2017-04-03","2015-02-13","2015-01-01","2013-05-07","2015-12-25","2016-05-04","2016-10-11"),
                     caseId = c(1,1,1,1,3,3,3,4,5,5))

# Calculate the number of events for every caseId
library(tidyverse)
events %>%  
  count(caseId) %>%
  right_join(cases, by = c("caseId" = "id"))

The problem is that I only want to count the events that happened after the date specified in the cases data frame (i.e. events$date > cases$date).

I would appreciate any help.

Tags: r, join, tidyr

Solution


You can use the data.table package to perform a non-equi join (the date columns first need to be converted to Date format, see the Data heading below):

library(data.table)
setDT(cases)
setDT(events)

cases[events, on = .(id = caseId, date < date), events := .N, by = .EACHI][]

Result:

   id       date var1 events
1:  1 2017-01-02    B      1
2:  2 2016-02-03    A     NA
3:  3 2015-02-12    A      1
4:  4 2016-01-03    B     NA
5:  5 2016-08-09    B      1
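
The counts can be cross-checked without any packages. The following is a minimal base R sketch, not part of the original answer; it rebuilds the sample data (without var1) and uses the illustrative name n_events. Cases shown as NA above come out as 0 here, because they have no events dated after the case date.

```r
# Sample data from the question, with dates converted to Date
cases <- data.frame(
  id = 1:5,
  date = as.Date(c("2017-01-02", "2016-02-03", "2015-02-12", "2016-01-03", "2016-08-09"))
)
events <- data.frame(
  id = 1:10,
  date = as.Date(c("2017-01-01", "2016-12-12", "2016-07-04", "2017-04-03", "2015-02-13",
                   "2015-01-01", "2013-05-07", "2015-12-25", "2016-05-04", "2016-10-11")),
  caseId = c(1, 1, 1, 1, 3, 3, 3, 4, 5, 5)
)

# For each case, count the events that belong to it and are dated strictly later
n_events <- vapply(
  seq_len(nrow(cases)),
  function(i) sum(events$caseId == cases$id[i] & events$date > cases$date[i]),
  integer(1)
)

cases$events <- n_events
print(cases)
```

This loops over every case and scans all events each time, so it is only meant as a check on small data, not as a replacement for the join.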

A variant of this, which gives 0 instead of NA for cases without any matching events:

cases[, events := events[cases, on = .(caseId = id, date > date), .N, by = .EACHI]$N][]

Result:

   id       date var1 events
1:  1 2017-01-02    A      1
2:  2 2016-02-03    A      0
3:  3 2015-02-12    A      1
4:  4 2016-01-03    A      0
5:  5 2016-08-09    A      1
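
Since the question loads the tidyverse, the same count can also be sketched with dplyr. This is an alternative to the data.table answer, not part of it. It uses an ordinary equi-join on the key followed by a date filter rather than a non-equi join, and the suffixes _event and _case as well as the names counts and result are illustrative choices.

```r
library(dplyr)

# Sample data from the question, with dates converted to Date
cases <- data.frame(
  id = 1:5,
  date = as.Date(c("2017-01-02", "2016-02-03", "2015-02-12", "2016-01-03", "2016-08-09"))
)
events <- data.frame(
  id = 1:10,
  date = as.Date(c("2017-01-01", "2016-12-12", "2016-07-04", "2017-04-03", "2015-02-13",
                   "2015-01-01", "2013-05-07", "2015-12-25", "2016-05-04", "2016-10-11")),
  caseId = c(1, 1, 1, 1, 3, 3, 3, 4, 5, 5)
)

# Pair every event with its case, keep only events after the case date, then count per case
counts <- events %>%
  inner_join(cases, by = c("caseId" = "id"), suffix = c("_event", "_case")) %>%
  filter(date_event > date_case) %>%
  count(caseId, name = "events")

# Attach the counts to cases; cases without qualifying events get 0
result <- cases %>%
  left_join(counts, by = c("id" = "caseId")) %>%
  mutate(events = coalesce(events, 0L))

print(result)
```

The intermediate join holds one row per event, which is fine here but can become large when cases have many events; the non-equi join avoids building that intermediate table.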

Data (var1 is drawn with sample(), so its values differ between runs, as in the two results above):

cases <- data.frame(id=1:5, date=c("2017-01-02","2016-02-03","2015-02-12","2016-01-03","2016-08-09"),
                    var1=sample(c("A", "B"), 5, replace=TRUE))
events <- data.frame(id=1:10,
                     date=c("2017-01-01","2016-12-12","2016-07-04","2017-04-03","2015-02-13","2015-01-01","2013-05-07","2015-12-25","2016-05-04","2016-10-11"),
                     caseId=c(1,1,1,1,3,3,3,4,5,5))
cases$date <- as.Date(cases$date)
events$date <- as.Date(events$date)
