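r - Duplicates using the preceding data
Problem description
Hi, I need to find duplicates. I've attached an image of the dataset and an example of the duplicates: rows with the same id and the same Result as the preceding date.
Any help would be greatly appreciated.
Dataset screenshot (reproduced here as `dput` output):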
structure(list(id = c(1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001, 1010001, 1010001, 1010001, 1010001,
1010001, 1010001, 1010001), DateCollected = structure(c(1145664000,
1145750400, 1145836800, 1145923200, 1146009600, 1146096000, 1146096000,
1146096000, 1146096000, 1146096000, 1146096000, 1146182400, 1146268800,
1146355200, 1146441600, 1146528000, 1146614400, 1146700800, 1146787200,
1146787200, 1146787200, 1146787200, 1146787200, 1146787200, 1146873600,
1146960000, 1147046400, 1147132800, 1147219200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Test = c("Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)",
"Tacrolimus (FK506)", "Tacrolimus (FK506)", "Tacrolimus (FK506)"
), Result = c(3, 4.1, 5.9, 8.1, 4.6, 7, 7.8, 11.2, 18.1, 18.4,
27, 4, 7.8, 8.4, 8.4, 6.1, 6.8, 5.4, 5.4, 6.5, 6.7, 8.1, 14.2,
32.4, 7.2, 8.6, 8.9, 7.2, 9.6), Units = c("ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L", "ug/L",
"ug/L", "ug/L")), row.names = c(NA, -29L), class = c("tbl_df",
"tbl", "data.frame"))
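Solution
We can write a function that computes the difference between consecutive `Result` values and returns the indices of the rows where duplicates are found.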
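find_duplicates <- function(x) {
  # positions whose value equals the value right after it
  inds <- which(diff(x) == 0)
  # keep both members of each equal pair, without repeats
  sort(unique(c(inds, inds + 1)))
}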
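To see what the helper returns, here is a small sketch on a made-up vector (the values in `x` are hypothetical, not taken from the question's data). `diff(x) == 0` is TRUE wherever a value equals the one that follows it, and adding `inds + 1` brings in the second row of each matching pair, so a run of three equal values yields three indices.

```r
find_duplicates <- function(x) {
  inds <- which(diff(x) == 0)
  sort(unique(c(inds, inds + 1)))
}

# Hypothetical values: one pair of repeats and one run of three
x <- c(3, 8.4, 8.4, 6.1, 5.4, 5.4, 5.4, 7)

diff(x) == 0         # TRUE at positions 2, 5 and 6
dup_idx <- find_duplicates(x)
dup_idx
# [1] 2 3 5 6 7
```

Because the comparison is an exact `== 0`, values that differ only by floating-point rounding would not be treated as duplicates.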
We can apply this function by group.
To get the duplicated rows, we can do:
library(dplyr)
df %>% group_by(id) %>% slice(find_duplicates(Result))
# id DateCollected Test Result Units
# <dbl> <dttm> <chr> <dbl> <chr>
#1 1010001 2006-04-30 00:00:00 Tacrolimus (FK506) 8.4 ug/L
#2 1010001 2006-05-01 00:00:00 Tacrolimus (FK506) 8.4 ug/L
#3 1010001 2006-05-04 00:00:00 Tacrolimus (FK506) 5.4 ug/L
#4 1010001 2006-05-05 00:00:00 Tacrolimus (FK506) 5.4 ug/L
To get an additional flag column instead, we can use:
df %>%
  group_by(id) %>%
  mutate(is_duplicate = row_number() %in% find_duplicates(Result))
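The following is a minimal sketch of the flag-column approach on a toy data frame. The data frame `toy` and its values are hypothetical, and the `arrange()` step is an assumption: it is only needed if the rows are not already in date order within each id, since the helper compares each row with the one physically before it.

```r
library(dplyr)

find_duplicates <- function(x) {
  inds <- which(diff(x) == 0)
  sort(unique(c(inds, inds + 1)))
}

# Hypothetical toy data, deliberately out of date order for id 1
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  DateCollected = as.Date(c("2006-05-01", "2006-04-30", "2006-05-02",
                            "2006-04-30", "2006-05-01", "2006-05-02")),
  Result = c(8.4, 8.4, 6.1, 6.1, 5.4, 5.4)
)

flagged <- toy %>%
  arrange(id, DateCollected) %>%   # assumption: compare in date order
  group_by(id) %>%                 # never compare across different ids
  mutate(is_duplicate = row_number() %in% find_duplicates(Result)) %>%
  ungroup()

flagged$is_duplicate
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE
```

The last row of id 1 and the first row of id 2 both have a `Result` of 6.1, but neither is flagged, because `group_by(id)` restricts the comparison to rows within the same id.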