Calculate the difference between multiple rows by group in R

Problem description

I have a data frame like this (the real one has more observations and more values of code than this example):

  code  tmp     wek   sbd
   <chr> <chr> <dbl> <dbl>
 1 abc01 T1        1  7.83
 2 abc01 T1        1  7.83
 3 abc01 T1        2  8.5 
 4 abc01 T1        2  8.5 
 5 abc01 T1        1  7.83
 6 abc01 T1        1  7.83
 7 abc01 T1        1  7.83
 8 abc01 T1        1  7.83
 9 abc01 T1        1  7.83
10 abc01 T2        1  7.56
11 abc01 T2        1  7.56
12 abc01 T2        2  7.22
13 abc01 T2        2  7.22
14 abc01 T2        1  7.56
15 abc01 T2        1  7.56
16 abc01 T2        1  7.56
17 abc01 T2        1  7.56
18 abc01 T2        1  7.56

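Now I want to compute a new variable that gives, for each combination of code and tmp, the difference in sbd between wek = 1 and wek = 2.

So far I have only found functions that take differences between consecutive rows, which does not fit my case.

Tags: r, group-by, diff, rows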

Solution

You can use match to get the sbd values that correspond to wek 1 and wek 2.

library(dplyr)

df %>%
  group_by(code, tmp) %>%
  summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

#  code  tmp    diff
#  <chr> <chr> <dbl>
#1 abc01 T1    -0.67
#2 abc01 T2     0.34
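
To see why this works, here is a minimal base R sketch of what match does inside a single group. The vectors below are a hand-made stand-in for one (code, tmp) group from the question, and the names idx1, idx2, group_diff and missing_diff are only illustrative. match(x, table) returns the position of the first element of table that equals x, so repeated rows within a week are harmless as long as sbd is constant within that week.

```r
# Toy vectors mirroring one (code, tmp) group from the question (assumed data)
wek <- c(1, 1, 2, 2, 1)
sbd <- c(7.83, 7.83, 8.5, 8.5, 7.83)

# match() returns the position of the FIRST matching element
idx1 <- match(1, wek)  # first row where wek == 1
idx2 <- match(2, wek)  # first row where wek == 2

# Difference between the sbd values at those two positions
group_diff <- sbd[idx1] - sbd[idx2]

# If a week is absent from the group, match() returns NA,
# so the resulting difference is NA rather than an error
missing_diff <- sbd[match(1, wek)] - sbd[match(3, wek)]
```

If sbd can vary within the same week of a group, only the first value is used, so an aggregate such as mean(sbd[wek == 1]) may be a better fit in that case.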

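If you want to add the result as a new column and keep all the original rows, use mutate instead of summarise.

Data

It is easier to help if you provide the data in a reproducible format: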
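
As a rough sketch of the mutate variant, assuming the same column names as in the question: df_small is a shortened, made-up stand-in for the real data and result is just an illustrative name. Every row keeps its place and receives the difference computed for its (code, tmp) group.

```r
library(dplyr)

# Small stand-in for the question's data (same columns, fewer rows; assumed)
df_small <- data.frame(
  code = "abc01",
  tmp  = c("T1", "T1", "T1", "T2", "T2", "T2"),
  wek  = c(1, 2, 1, 1, 2, 1),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.22, 7.56)
)

result <- df_small %>%
  group_by(code, tmp) %>%
  # mutate keeps all rows; the group-level difference is repeated on each row
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  # drop the grouping so later steps operate on the whole data frame
  ungroup()
```

The trailing ungroup() is optional, but it avoids surprises if further steps are not meant to run per group.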

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1", 
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", 
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83, 
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22, 
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)), 
class = "data.frame", row.names = c(NA, -18L))
