How to calculate the differences for this data frame in R

Problem description

I have many columns, but here is a small part of my data:

df <- read.table(text = "Var1 M1 N1 Var2 M2 N2 Var3 M3 N3 Var4 M4 N4 Var5 M5 N5 Var6 M6 N6
A11 1 0 A12 0.3 0.5 A21 1 0.3 A22 0.6 0 A31 1 0.2 A32 0 1
", header = TRUE)

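I want to calculate the differences for M and N within each pair (for example A11 and A12), that is, Ma - Mb and Na - Nb, to get the following table: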

Var diffM   diffN
A1  0.7 -0.5
A2  0.4  0.3
A3  1   -0.8

Tags: r

Solution


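We can first reshape the data to "long" format, then group by the first two characters of Var and take the diff across the "M" and "N" columns: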

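library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   # split each name between its letters and its trailing digit: "M1" -> ("M", "1")
   pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
       names_sep = "(?<=[^0-9])(?=[0-9])") %>% 
   # "A11" and "A12" both fall into group "A1"
   group_by(Var = str_sub(Var, 1, 2)) %>% 
   # diff() returns second minus first, so negate it to get first minus second
   summarise(across(c(M, N), ~ -diff(.), .names = "diff{.col}"), .groups = 'drop')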
# A tibble: 3 x 3
#  Var   diffM diffN
#  <chr> <dbl> <dbl>
#1 A1     0.7   -0.5
#2 A2     0.4    0.3
#3 A3     1     -0.8
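
To see what the reshaping step produces before any grouping, it can help to run pivot_longer() on its own. The following is a minimal sketch that rebuilds the sample df from the question; the variable name `long` is only illustrative and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Rebuild the one-row sample data from the question
df <- read.table(text = "Var1 M1 N1 Var2 M2 N2 Var3 M3 N3 Var4 M4 N4 Var5 M5 N5 Var6 M6 N6
A11 1 0 A12 0.3 0.5 A21 1 0.3 A22 0.6 0 A31 1 0.2 A32 0 1
", header = TRUE)

# The regex matches the empty position between a non-digit and a digit,
# so "Var1" is split into ("Var", "1") and "M1" into ("M", "1").
# ".value" turns the prefix into a column name; the digit is stored in `grp`.
long <- df %>%
  pivot_longer(cols = everything(), names_to = c(".value", "grp"),
               names_sep = "(?<=[^0-9])(?=[0-9])")

long  # six rows with columns grp, Var, M, N (`long` is a hypothetical name)
```

The two members of each pair, such as A11 and A12, now sit in adjacent rows, which is why grouping on the first two characters of Var leaves exactly two values of M and N per group for diff() to work on.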

If we also want the sum for 'N', add it inside summarise:

df %>% 
   pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
       names_sep = "(?<=[^0-9])(?=[0-9])") %>% 
   group_by(Var = str_sub(Var, 1, 2)) %>% 
   # across() writes to new diffM/diffN columns, so the original N is still available for sum()
   summarise(across(c(M, N), ~ -diff(.), .names = "diff{.col}"), 
               sumN = sum(N), .groups = 'drop')
# A tibble: 3 x 4
#  Var   diffM diffN  sumN
#  <chr> <dbl> <dbl> <dbl>
#1 A1      0.7  -0.5   0.5
#2 A2      0.4   0.3   0.3
#3 A3      1    -0.8   1.2
