Finding duplicates using the preceding row's data

Problem description

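Hi, I need to find duplicates. I have attached an image of the dataset and an example of the duplicates: rows with the same id and the same Result as on the preceding date.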

Any help would be greatly appreciated.

[Image: screenshot of the dataset]

df <- structure(list(id = c(1010001, 1010001, 1010001, 1010001, 1010001, 
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 
1010001, 1010001, 1010001), DateCollected = structure(c(1145664000, 
1145750400, 1145836800, 1145923200, 1146009600, 1146096000, 1146096000, 
1146096000, 1146096000, 1146096000, 1146096000, 1146182400, 1146268800, 
1146355200, 1146441600, 1146528000, 1146614400, 1146700800, 1146787200, 
1146787200, 1146787200, 1146787200, 1146787200, 1146787200, 1146873600, 
1146960000, 1147046400, 1147132800, 1147219200), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Test = c("Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)", 
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)"
), Result = c(3, 4.1, 5.9, 8.1, 4.6, 7, 7.8, 11.2, 18.1, 18.4, 
27, 4, 7.8, 8.4, 8.4, 6.1, 6.8, 5.4, 5.4, 6.5, 6.7, 8.1, 14.2, 
32.4, 7.2, 8.6, 8.9, 7.2, 9.6), Units = c("ug/L", "ug/L", "ug/L", 
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", 
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", 
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", 
"ug/L", "ug/L")), row.names = c(NA, -29L), class = c("tbl_df", 
"tbl", "data.frame"))

Tags: r, dataframe, date, duplicates

Solution


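We can write a function that finds the indices of duplicated rows. It computes the difference between consecutive Result values and returns the index of every row whose value equals that of the row before or after it.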

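find_duplicates <- function(x) {
  # positions where a value is equal to the one right after it
  inds <- which(diff(x) == 0)
  # keep both members of each equal pair, dropping repeated indices
  sort(unique(c(inds, inds + 1)))
}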
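
To make the behaviour concrete, here is a minimal sketch that runs the function on a short vector. The vector x is only illustrative (its values are borrowed from the Result column, but it is not the full dataset).

```r
find_duplicates <- function(x) {
  inds <- which(diff(x) == 0)
  sort(unique(c(inds, inds + 1)))
}

# Illustrative vector, not the full dataset
x <- c(7.8, 8.4, 8.4, 6.1, 6.8, 5.4, 5.4, 6.5)

# TRUE marks a value that is equal to the next one
diff(x) == 0
# FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE

dup_idx <- find_duplicates(x)
dup_idx
# 2 3 6 7
```

The unique() call matters for runs of three or more equal values: for c(5, 5, 5) the middle position would otherwise be listed twice.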

We can apply this function by group.

To get the duplicated rows, we can do:

library(dplyr)
df %>% group_by(id) %>% slice(find_duplicates(Result))

#      id DateCollected       Test               Result Units
#    <dbl> <dttm>              <chr>               <dbl> <chr>
#1 1010001 2006-04-30 00:00:00 Tacrolimus (FK506)    8.4 ug/L 
#2 1010001 2006-05-01 00:00:00 Tacrolimus (FK506)    8.4 ug/L 
#3 1010001 2006-05-04 00:00:00 Tacrolimus (FK506)    5.4 ug/L 
#4 1010001 2006-05-05 00:00:00 Tacrolimus (FK506)    5.4 ug/L 

To get an additional flag column instead, we can use:

df %>% 
  group_by(id) %>% 
  mutate(is_duplicate = row_number() %in% find_duplicates(Result))
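
The function compares neighbouring rows, so the result depends on row order and on the grouping. The following is a rough sketch, assuming the rows may not already be sorted by date. The toy data frame and the arrange() step are additions for illustration and are not part of the original answer.

```r
library(dplyr)

find_duplicates <- function(x) {
  inds <- which(diff(x) == 0)
  sort(unique(c(inds, inds + 1)))
}

# Hypothetical toy data: two ids, with id 1 deliberately out of date order
toy <- data.frame(
  id = c(1, 1, 1, 2, 2),
  DateCollected = as.Date(c("2006-04-01", "2006-04-03", "2006-04-02",
                            "2006-04-01", "2006-04-02")),
  Result = c(7, 7, 5, 7, 7)
)

flagged <- toy %>%
  arrange(id, DateCollected) %>%   # put each id in date order first
  group_by(id) %>%                 # never compare across different ids
  mutate(is_duplicate = row_number() %in% find_duplicates(Result)) %>%
  ungroup()

flagged$is_duplicate
# FALSE FALSE FALSE  TRUE  TRUE
```

After sorting, id 1 has the results 7, 5, 7, so none of its rows is flagged. The last row of id 1 and the first row of id 2 both have the value 7, but they are not compared with each other because of group_by(id).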
