Compare every row with the current row in a grouped data frame in R

Problem description

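Hi, I'm trying to determine whether any row in a group has a Version that is exactly 1 lower than that of any other row in the same group, and to flag it in another column. I have looked at lag() and lead(), but the problem is that the rows whose values differ by 1 may or may not be adjacent.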

Here is a reproducible example.

Data:

library(dplyr)

df <- tibble('Plate' = c("A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3"),
             'Sample' = c("a", "a","a","b","b","a","a","b","b","b","a","b","b","c","c","c"),
             'Location' = c("x","x","x","y","y","y","y","x","x","x","x","y","y","x","x","x"),
             'Version' = c(1,1.2,2,22,26,9,9.3,11,11.3,12,19,32.2,33.2,14,15,15))

My latest attempt, adapted from "How to compare the current row with all previous rows in R" (and other posts):

df_test <- df  %>%
  group_by(Plate,Sample,Location) %>% 
  arrange(desc(Version)) %>% 
  mutate(diff = sapply(seq_along(Version), function(i){
    if_else(any(.[1:(i-1),'Version'] - .[[i,'Version']] == 1.0), -1.0, 0)})
    )

Expected output:

   Plate Sample Location Version  diff
   <chr> <chr>  <chr>      <dbl> <dbl>
 1 A3    b      y           33.2     0
 2 A3    b      y           32.2    -1
 3 A1    b      y           26       0
 4 A1    b      y           22       0
 5 A3    a      x           19       0
 6 A3    c      x           15       0
 7 A3    c      x           15       0
 8 A3    c      x           14      -1
 9 A2    b      x           12       0
10 A2    b      x           11.3     0
11 A2    b      x           11      -1
12 A2    a      y            9.3     0
13 A2    a      y            9       0
14 A1    a      x            2       0
15 A1    a      x            1.2     0
16 A1    a      x            1      -1

Actual output:

   Plate Sample Location Version  diff
   <chr> <chr>  <chr>      <dbl> <dbl>
 1 A3    b      y           33.2     0
 2 A3    b      y           32.2    -1
 3 A1    b      y           26       0
 4 A1    b      y           22      -1
 5 A3    a      x           19       0
 6 A3    c      x           15       0
 7 A3    c      x           15      -1
 8 A3    c      x           14       0
 9 A2    b      x           12       0
10 A2    b      x           11.3    -1
11 A2    b      x           11       0
12 A2    a      y            9.3     0
13 A2    a      y            9      -1
14 A1    a      x            2       0
15 A1    a      x            1.2    -1
16 A1    a      x            1       0

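It seems to be comparing by row index (or ignoring the groups?). How can I make it compare the values instead? It feels like I'm close. I would prefer a dplyr answer if possible, but data.table is acceptable. Apologies if I missed a related post that already answers this.

Tags: r, dplyr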

Solution

I would try this:

df  %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else((Version + 1) %in% Version, -1, 0))
# # A tibble: 16 x 5
# # Groups:   Plate, Sample, Location [7]
#    Plate Sample Location Version  diff
#    <chr> <chr>  <chr>      <dbl> <dbl>
#  1 A1    a      x            1      -1
#  2 A1    a      x            1.2     0
#  3 A1    a      x            2       0
#  4 A1    b      y           22       0
#  5 A1    b      y           26       0
#  6 A2    a      y            9       0
#  7 A2    a      y            9.3     0
#  8 A2    b      x           11      -1
#  9 A2    b      x           11.3     0
# 10 A2    b      x           12       0
# 11 A3    a      x           19       0
# 12 A3    b      y           32.2    -1
# 13 A3    b      y           33.2     0
# 14 A3    c      x           14      -1
# 15 A3    c      x           15       0
# 16 A3    c      x           15       0
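
To see why the condition in the grouped mutate() works, it can help to try it on the Version values of a single group. The following is a minimal sketch in base R; the variable names v, has_successor and flag are only illustrative and do not appear in the original answer.

```r
# Version values of one group from the example (Plate A3, Sample c, Location x)
v <- c(14, 15, 15)

# For each element, is "element + 1" present anywhere in the same vector?
has_successor <- (v + 1) %in% v
has_successor
# [1]  TRUE FALSE FALSE

# Same flagging as in the answer, here with base R's ifelse()
flag <- ifelse(has_successor, -1, 0)
flag
# [1] -1  0  0
```

Because group_by() makes mutate() evaluate the expression once per group, Version inside the call plays the role of v here, so the position of the matching row within the group does not matter.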

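Since your versions are not all integers, there is some risk of numeric precision problems with an exact match, but it seems to work for your example, where the numbers are relatively small.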
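
As a generic illustration of the kind of precision problem meant here (the numbers below are the classic textbook case, not values from the question's data), exact matching on doubles can fail where a tolerance-based comparison succeeds:

```r
# 0.1 + 0.2 is not stored as exactly 0.3 in double precision
a <- 0.1 + 0.2

a == 0.3                    # FALSE: exact comparison fails
a %in% 0.3                  # FALSE: %in% also matches exactly
abs(a - 0.3) < 1e-10        # TRUE: comparison with a small tolerance
isTRUE(all.equal(a, 0.3))   # TRUE: all.equal() uses a tolerance by default
```

Whether a particular Version + 1 hits this problem depends on the specific values, which is why the exact %in% test may work on one data set and silently miss a match on another.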

A numerically stable version could look like this:

df  %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else(apply(abs(outer(Version, Version, "-") + 1) < 1e-10, 1, any), -1, 0))

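(same result as above)

To understand how and why it works, start with a test vector such as x = c(1, 1.2, 2) and run the code on it piece by piece: first outer(x, x, "-"), then add the + 1, and so on.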
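
The step-by-step check described above can be sketched as follows. The intermediate names d, near_minus_one and flag are introduced here only for illustration; the answer itself writes everything as one expression.

```r
x <- c(1, 1.2, 2)

# Step 1: all pairwise differences, d[i, j] = x[i] - x[j]
d <- outer(x, x, "-")

# Step 2: d + 1 is (numerically) zero exactly where x[i] is 1 less than x[j]
near_minus_one <- abs(d + 1) < 1e-10

# Step 3: a row is flagged if any column in it satisfies the condition
flag <- apply(near_minus_one, 1, any)
flag
# [1]  TRUE FALSE FALSE

# Step 4: turn the logical flag into the -1 / 0 coding used in the question
ifelse(flag, -1, 0)
# [1] -1  0  0
```

Only the first element (1) is flagged, because 2 is the only value that lies exactly 1 above another value in x. Note that outer() builds an n-by-n matrix per group, so this approach is best suited to groups of moderate size.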

