Match columns in two datasets of different sizes and delete rows by condition

Problem description

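I have two datasets of different lengths. For rows that share the same id, I want to compare the values in one column and delete the rows with the smaller values. For example, given dataset_1 and dataset_2 below, I want to compare the values in the "time" column by case.id and delete from dataset_2 every row whose time is smaller than the time for the same case.id in dataset_1.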

dataset_1
  case.id time
1    xxx1    1
2    xxx2    2
3    xxx3    3

dataset_2
  case.id distance time
1    xxx1      100  0.8
2    xxx1       50  1.2
3    xxx1       40  2.0
4    xxx2       50  3.0
5    xxx2       40  4.0
6    xxx3      100  2.5
7    xxx3       50  3.0
8    xxx3      100  3.5
9    xxx3       50  5.0

My expected result should look like this:

new_dataset_2
  case.id distance time
1    xxx1       50  1.2
2    xxx1       40  2.0
3    xxx2       50  3.0
4    xxx2       40  4.0
5    xxx3       50  3.0
6    xxx3      100  3.5
7    xxx3       50  5.0
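To make the rule concrete, here is a minimal sketch of the per-row comparison, assuming the sample data shown above. Each row of dataset_2 gets a reference time looked up from dataset_1 via its case.id, and a row survives when its own time is not smaller than that reference. The names `ref_time` and `keep` are illustrative only.

```r
# Sample data from the question
dataset_1 <- data.frame(case.id = c("xxx1", "xxx2", "xxx3"), time = 1:3,
                        stringsAsFactors = FALSE)
dataset_2 <- data.frame(
  case.id  = c("xxx1", "xxx1", "xxx1", "xxx2", "xxx2", "xxx3", "xxx3", "xxx3", "xxx3"),
  distance = c(100L, 50L, 40L, 50L, 40L, 100L, 50L, 100L, 50L),
  time     = c(0.8, 1.2, 2, 3, 4, 2.5, 3, 3.5, 5),
  stringsAsFactors = FALSE
)

# Reference time for each row of dataset_2, looked up by case.id in dataset_1
ref_time <- dataset_1$time[match(dataset_2$case.id, dataset_1$case.id)]

# TRUE where the row should be kept (its time is not smaller than the reference)
keep <- dataset_2$time >= ref_time

# Side-by-side view of the comparison that produces the expected result
data.frame(case.id = dataset_2$case.id, time = dataset_2$time, ref_time, keep)
```

Rows 1 (xxx1, 0.8 < 1) and 6 (xxx3, 2.5 < 3) are the only ones with `keep` equal to `FALSE`, which matches the expected new_dataset_2.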

Data

dataset_1 <- structure(list(case.id = c("xxx1", "xxx2", "xxx3"), time = 1:3), .Names = c("case.id", 
"time"), class = "data.frame", row.names = c("1", "2", "3"))

dataset_2 <- structure(list(case.id = c("xxx1", "xxx1", "xxx1", "xxx2", "xxx2", 
"xxx3", "xxx3", "xxx3", "xxx3"), distance = c(100L, 50L, 40L, 
50L, 40L, 100L, 50L, 100L, 50L), time = c(0.8, 1.2, 2, 3, 4, 
2.5, 3, 3.5, 5)), .Names = c("case.id", "distance", "time"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"))

Tags: r

Solution


You can merge the two data frames on case.id and then subset the result according to your criterion.

df_out <- merge(dataset_2, dataset_1, by = "case.id") # join on case.id; the clashing "time" columns become time.x (dataset_2) and time.y (dataset_1)
idx <- with(df_out, time.x >= time.y) # creates a logical vector we use for subsetting

df_out <- df_out[idx, c('case.id', 'distance', 'time.x')] # subset and filter
df_out <- setNames(df_out, names(dataset_2)) # rename columns
df_out
#  case.id distance time
#2    xxx1       50  1.2
#3    xxx1       40  2.0
#4    xxx2       50  3.0
#5    xxx2       40  4.0
#7    xxx3       50  3.0
#8    xxx3      100  3.5
#9    xxx3       50  5.0
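A variant of the same merge-and-subset approach, sketched here under the assumption that only the columns of dataset_2 are wanted in the result: passing `suffixes = c("", ".ref")` to `merge()` keeps dataset_2's column named `time` and names dataset_1's clashing column `time.ref`, so no renaming step is needed afterwards. The suffix `.ref` and the names `merged` and `new_dataset_2` are illustrative choices.

```r
# Sample data from the question
dataset_1 <- data.frame(case.id = c("xxx1", "xxx2", "xxx3"), time = 1:3,
                        stringsAsFactors = FALSE)
dataset_2 <- data.frame(
  case.id  = c("xxx1", "xxx1", "xxx1", "xxx2", "xxx2", "xxx3", "xxx3", "xxx3", "xxx3"),
  distance = c(100L, 50L, 40L, 50L, 40L, 100L, 50L, 100L, 50L),
  time     = c(0.8, 1.2, 2, 3, 4, 2.5, 3, 3.5, 5),
  stringsAsFactors = FALSE
)

# dataset_2's "time" keeps its name; dataset_1's "time" becomes "time.ref"
merged <- merge(dataset_2, dataset_1, by = "case.id", suffixes = c("", ".ref"))

# Keep rows at or after the reference time, then keep only dataset_2's columns
new_dataset_2 <- merged[merged$time >= merged$time.ref, names(dataset_2)]
new_dataset_2
```

Note that `merge()` is an inner join here, so any case.id in dataset_2 that has no counterpart in dataset_1 is dropped silently; pass `all.x = TRUE` if such rows need separate handling.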
