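r - Compare all rows against the current row in a grouped data frame in R
Problem description
Hi, I am trying to determine whether any row in a group has a Version that is exactly 1 less than the Version of any other row in the same group, and to flag it in another column. I have looked at lag and lead, but the problem is that the rows whose values differ by 1 may or may not be adjacent to each other.
Here is a reproducible example.
Data: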
library(dplyr)
df <- tibble('Plate' = c("A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3"),
'Sample' = c("a", "a","a","b","b","a","a","b","b","b","a","b","b","c","c","c"),
'Location' = c("x","x","x","y","y","y","y","x","x","x","x","y","y","x","x","x"),
'Version' = c(1,1.2,2,22,26,9,9.3,11,11.3,12,19,32.2,33.2,14,15,15))
The last iteration I tried, adapted from "How to compare the current row with all previous rows in R" (among others):
df_test <- df %>%
group_by(Plate,Sample,Location) %>%
arrange(desc(Version)) %>%
mutate(diff = sapply(seq_along(Version), function(i){
if_else(any(.[1:(i-1),'Version'] - .[[i,'Version']] == 1.0), -1.0, 0)})
)
Expected output:
Plate Sample Location Version diff
<chr> <chr> <chr> <dbl> <dbl>
1 A3 b y 33.2 0
2 A3 b y 32.2 -1
3 A1 b y 26 0
4 A1 b y 22 0
5 A3 a x 19 0
6 A3 c x 15 0
7 A3 c x 15 0
8 A3 c x 14 -1
9 A2 b x 12 0
10 A2 b x 11.3 0
11 A2 b x 11 -1
12 A2 a y 9.3 0
13 A2 a y 9 0
14 A1 a x 2 0
15 A1 a x 1.2 0
16 A1 a x 1 -1
Actual output:
Plate Sample Location Version diff
<chr> <chr> <chr> <dbl> <dbl>
1 A3 b y 33.2 0
2 A3 b y 32.2 -1
3 A1 b y 26 0
4 A1 b y 22 -1
5 A3 a x 19 0
6 A3 c x 15 0
7 A3 c x 15 -1
8 A3 c x 14 0
9 A2 b x 12 0
10 A2 b x 11.3 -1
11 A2 b x 11 0
12 A2 a y 9.3 0
13 A2 a y 9 -1
14 A1 a x 2 0
15 A1 a x 1.2 -1
16 A1 a x 1 0
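It seems to be comparing by row index (or ignoring the groups?). How do I get it to look at the values instead? I feel like I am close. I would prefer a dplyr answer if possible, but data.table is acceptable. Apologies if I missed a related post that already answers this.
Solution
I would try this: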
df %>%
group_by(Plate,Sample,Location) %>%
mutate(diff = if_else((Version + 1) %in% Version, -1, 0))
# # A tibble: 16 x 5
# # Groups: Plate, Sample, Location [7]
# Plate Sample Location Version diff
# <chr> <chr> <chr> <dbl> <dbl>
# 1 A1 a x 1 -1
# 2 A1 a x 1.2 0
# 3 A1 a x 2 0
# 4 A1 b y 22 0
# 5 A1 b y 26 0
# 6 A2 a y 9 0
# 7 A2 a y 9.3 0
# 8 A2 b x 11 -1
# 9 A2 b x 11.3 0
# 10 A2 b x 12 0
# 11 A3 a x 19 0
# 12 A3 b y 32.2 -1
# 13 A3 b y 33.2 0
# 14 A3 c x 14 -1
# 15 A3 c x 15 0
# 16 A3 c x 15 0
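Because your versions are not all integers, there is some risk of numerical precision problems, but it appears to work for your example, where the numbers are relatively small.
A numerically stable version might look like this: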
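To make the precision concern concrete, here is a minimal base R sketch. The values 0.1, 0.2 and 0.3 are the usual textbook illustration, not numbers from your data, and the tolerance 1e-10 is simply the one used in this answer.

```r
# Decimal fractions are generally not stored exactly in binary floating point,
# so an exact comparison can fail where a tolerance-based one succeeds.
a <- 0.1 + 0.2

exact_match <- a == 0.3               # exact comparison
in_match    <- a %in% 0.3             # %in% also compares exactly
tol_match   <- abs(a - 0.3) < 1e-10   # comparison within a tolerance

print(exact_match)  # FALSE
print(in_match)     # FALSE
print(tol_match)    # TRUE
```

Whether `(Version + 1) %in% Version` hits such a case depends on the particular values, which is why it is a risk rather than a certainty.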
df %>%
group_by(Plate,Sample,Location) %>%
mutate(diff = if_else(apply(abs(outer(Version, Version, "-") + 1) < 1e-10, 1, any), -1, 0))
(Same result as above.)
To understand how and why it works, start with a small test vector such as x = c(1, 1.2, 2) and run pieces of the code on it: first outer(x, x, "-"), then add + 1, and so on.
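As a rough sketch of that walk-through, the steps can be run one at a time on the versions of the first group (Plate A1, Sample a, Location x). The intermediate names d, near_zero, has_next and flag are only for illustration, and base ifelse stands in for dplyr's if_else so the snippet needs no packages.

```r
x <- c(1, 1.2, 2)                 # versions from group A1 / a / x

# Step 1: all pairwise differences, d[i, j] = x[i] - x[j]
d <- outer(x, x, "-")
print(d)

# Step 2: d + 1 is (almost) zero exactly where x[j] equals x[i] + 1
near_zero <- abs(d + 1) < 1e-10
print(near_zero)

# Step 3: for each row i, is there any x[j] that is 1 larger than x[i]?
has_next <- apply(near_zero, 1, any)

# Step 4: turn the logical result into the -1 / 0 flag
flag <- ifelse(has_next, -1, 0)
print(flag)  # -1 0 0
```

Only the first element is flagged, because 2 is exactly 1 more than 1, which agrees with the rows for that group in the output above.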