Subsetting an R data frame using multiple sets of dates

Question

I have the following dataset:

 ID               dates                  d1                  d2                  d3                 d4
 X1 2007-09-09 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                 <NA>               <NA>
 X1 2007-09-10 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                 <NA>               <NA>
 X1 2007-09-11 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                 <NA>               <NA>
 X1 2007-09-13 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                 <NA>               <NA> 
 X2 2007-10-09 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
 X2 2007-10-10 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
 X2 2007-10-11 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
 X2 2007-10-14 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
 X2 2007-10-15 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
 X2 2007-10-20 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00

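My goal is to split the data into two datasets: one with all the rows whose dates fall between d1 and d2 or between d3 and d4, and another with all the remaining dates.

The result would look like this:

data1 (dates between d1 and d2, or between d3 and d4):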

ID               dates                  d1                  d2                  d3                d4
X1 2007-09-10 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                <NA>                <NA>
X1 2007-09-11 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                <NA>                <NA>
X2 2007-10-09 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
X2 2007-10-10 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
X2 2007-10-14 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
X2 2007-10-15 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00

data2 (remaining dates):

ID               dates                  d1                  d2                  d3                  d4
X1 2007-09-09 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                <NA>                <NA>
X1 2007-09-13 09:00:00 2007-09-10 09:00:00 2007-09-11 09:00:00                <NA>                <NA>
X2 2007-10-11 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00
X2 2007-10-20 09:00:00 2007-10-08 09:00:00 2007-10-10 09:00:00 2007-10-13 09:00:00 2007-10-16 09:00:00

Is there a simple way to do this? Here is the code for my original dataset so you can reproduce it:

ID<-rep(c("X1","X2"),times=c(4,6))
dates<-c("2007-09-09 09:00:00","2007-09-10 09:00:00","2007-09-11 09:00:00","2007-09-13 09:00:00","2007-10-09 09:00:00","2007-10-10 09:00:00","2007-10-11 09:00:00","2007-10-14 09:00:00", "2007-10-15 09:00:00","2007-10-20 09:00:00")
d1<-rep(c("2007-09-10 09:00:00","2007-10-08 09:00:00"),times=c(4,6))
d2<-rep(c("2007-09-11 09:00:00","2007-10-10 09:00:00"),times=c(4,6))
d3<-rep(c(NA,"2007-10-13 09:00:00"),times=c(4,6))
d4<-rep(c(NA,"2007-10-16 09:00:00"),times=c(4,6))
data<-data.frame(ID,dates,d1,d2,d3,d4)

Tags: r, dataframe, date, subset

Answer


You first need to convert your dates from character to Date objects using as.Date. Then use dput() to provide your data in a compact format for posting:

data <- structure(list(dates = structure(c(13765, 13766, 13767, 13769, 
13795, 13796, 13797, 13800, 13801, 13806), class = "Date"),
d1 = structure(c(13766, 13766, 13766, 13766, 13794, 13794, 13794, 13794,
13794, 13794), class = "Date"), d2 = structure(c(13767, 13767, 13767, 13767, 
13796, 13796, 13796, 13796, 13796, 13796), class = "Date"), d3 = structure(c(NA, 
NA, NA, NA, 13799, 13799, 13799, 13799, 13799, 13799), class = "Date"), 
d4 = structure(c(NA, NA, NA, NA, 13802, 13802, 13802, 13802, 
13802, 13802), class = "Date")), class = "data.frame", row.names = c(NA, -10L))
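The conversion step mentioned above could look like the following minimal sketch. It starts from the character columns given in the question and assumes the time of day can be dropped, since every timestamp is 09:00:00. The names raw, converted and date_cols are only illustrative.

```r
# Rebuild the question's character data (ID column kept here).
ID    <- rep(c("X1", "X2"), times = c(4, 6))
dates <- c("2007-09-09 09:00:00", "2007-09-10 09:00:00", "2007-09-11 09:00:00",
           "2007-09-13 09:00:00", "2007-10-09 09:00:00", "2007-10-10 09:00:00",
           "2007-10-11 09:00:00", "2007-10-14 09:00:00", "2007-10-15 09:00:00",
           "2007-10-20 09:00:00")
d1 <- rep(c("2007-09-10 09:00:00", "2007-10-08 09:00:00"), times = c(4, 6))
d2 <- rep(c("2007-09-11 09:00:00", "2007-10-10 09:00:00"), times = c(4, 6))
d3 <- rep(c(NA, "2007-10-13 09:00:00"), times = c(4, 6))
d4 <- rep(c(NA, "2007-10-16 09:00:00"), times = c(4, 6))
raw <- data.frame(ID, dates, d1, d2, d3, d4, stringsAsFactors = FALSE)

# Convert every date-like column from character to Date.
# as.Date() keeps only the calendar date; the 09:00:00 part is discarded.
date_cols <- c("dates", "d1", "d2", "d3", "d4")
converted <- raw
converted[date_cols] <- lapply(raw[date_cols], as.Date)

str(converted)
```

If the time of day matters for the comparison, as.POSIXct() could be used in place of as.Date(), with the caveat that time zones then need attention.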

Now set up your selection criteria and use them to create data1 and data2:

# TRUE where the date lies inside the first interval [d1, d2]
select1 <- with(data, dates >= d1 & dates <= d2)
# TRUE where the date lies inside the second interval [d3, d4]; NA when d3/d4 are missing
select2 <- with(data, dates >= d3 & dates <= d4)
# A missing second interval cannot contain the date, so treat NA as FALSE
select2 <- ifelse(is.na(select2), FALSE, select2)
# Keep rows that fall inside either interval
select <- select1 | select2
(data1 <- data[select, ])
#        dates         d1         d2         d3         d4
# 2 2007-09-10 2007-09-10 2007-09-11       <NA>       <NA>
# 3 2007-09-11 2007-09-10 2007-09-11       <NA>       <NA>
# 5 2007-10-09 2007-10-08 2007-10-10 2007-10-13 2007-10-16
# 6 2007-10-10 2007-10-08 2007-10-10 2007-10-13 2007-10-16
# 8 2007-10-14 2007-10-08 2007-10-10 2007-10-13 2007-10-16
# 9 2007-10-15 2007-10-08 2007-10-10 2007-10-13 2007-10-16
(data2 <- data[!select, ])
#         dates         d1         d2         d3         d4
# 1  2007-09-09 2007-09-10 2007-09-11       <NA>       <NA>
# 4  2007-09-13 2007-09-10 2007-09-11       <NA>       <NA>
# 7  2007-10-11 2007-10-08 2007-10-10 2007-10-13 2007-10-16
# 10 2007-10-20 2007-10-08 2007-10-10 2007-10-13 2007-10-16
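
To compare the split against the expected output in the question, the sketch below keeps the ID column and selects rows whose date falls inside either interval. The helper in_interval() and the names df, inside_rows and outside_rows are hypothetical, introduced only for illustration, and treating a missing interval (NA bounds) as "no match" is an assumption about the intended behaviour.

```r
# Hypothetical helper: TRUE where x lies in [start, end] (inclusive);
# rows with a missing bound count as "not in the interval".
in_interval <- function(x, start, end) {
  hit <- x >= start & x <= end
  !is.na(hit) & hit
}

# Same data as in the question, already stored as Date, with ID kept.
df <- data.frame(
  ID    = rep(c("X1", "X2"), times = c(4, 6)),
  dates = as.Date(c("2007-09-09", "2007-09-10", "2007-09-11", "2007-09-13",
                    "2007-10-09", "2007-10-10", "2007-10-11", "2007-10-14",
                    "2007-10-15", "2007-10-20")),
  d1 = as.Date(rep(c("2007-09-10", "2007-10-08"), times = c(4, 6))),
  d2 = as.Date(rep(c("2007-09-11", "2007-10-10"), times = c(4, 6))),
  d3 = as.Date(rep(c(NA, "2007-10-13"), times = c(4, 6))),
  d4 = as.Date(rep(c(NA, "2007-10-16"), times = c(4, 6)))
)

# A row is selected when its date is inside the first OR the second interval.
inside <- with(df, in_interval(dates, d1, d2) | in_interval(dates, d3, d4))

inside_rows  <- df[inside, ]   # dates within d1..d2 or d3..d4
outside_rows <- df[!inside, ]  # all remaining dates
```

With this data, inside_rows holds rows 2, 3, 5, 6, 8 and 9, and outside_rows holds rows 1, 4, 7 and 10. Wrapping the NA handling in one helper keeps the logic the same for both intervals and would extend to further interval pairs by adding more in_interval() terms.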
