Applying differences and means between groups in a column in R

Problem description

I have a data frame like this:

df = data.frame("subjectID" = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"), "treatment" = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"), "protein" = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"), "value"= c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4))

   subjectID treatment  protein value
1         S1      none proteinA   5.3
2         S2      none proteinA   4.3
3         S2      none proteinB   4.5
4         S1      none proteinB   2.3
5         S1     drug1 proteinA   6.5
6         S2     drug1 proteinA   5.4
7         S2     drug1 proteinB   1.2
8         S1     drug1 proteinB   3.2
9         S1     drug2 proteinA   2.3
10        S2     drug2 proteinA   4.5
11        S1     drug2 proteinB   6.5
12        S2     drug2 proteinB   3.4

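I need to perform the following calculations on this data frame:

  1. For each protein in each subject, find the difference in value between treatment == "drug1" and treatment == "none".

For a single case, the calculation would be: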

diff = df$value[df$subjectID == "S1" & df$treatment == "drug1" & df$protein == "proteinA"] - df$value[df$subjectID == "S1" & df$treatment == "none" & df$protein == "proteinA"] 
diff 
> 1.2 

In the example above, 6.5 - 5.3 gives the difference between the drug1-treated and untreated proteinA samples for S1. I repeat the same calculation for S2/proteinA, S1/proteinB, and S2/proteinB.

  2. Find the mean of these differences across subjects.
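The two steps can be sketched in base R for a single drug. This is only an illustration under the assumption that every subject/protein pair has exactly one "none" row and one "drug1" row; the names `baseline`, `treated`, `paired`, and `mean_diff` are made up for this example.

```r
df <- data.frame(
  subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"),
  treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
  protein   = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
  value     = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4)
)

# Split the rows into untreated baselines and drug1 measurements.
baseline <- df[df$treatment == "none",  c("subjectID", "protein", "value")]
treated  <- df[df$treatment == "drug1", c("subjectID", "protein", "value")]

# Pair each drug1 row with the baseline row of the same subject and protein.
# The two "value" columns become value_drug1 and value_none.
paired <- merge(treated, baseline, by = c("subjectID", "protein"),
                suffixes = c("_drug1", "_none"))

# Step 1: difference per subject and protein.
paired$difference <- paired$value_drug1 - paired$value_none

# Step 2: mean of those differences across subjects, per protein.
mean_diff <- aggregate(difference ~ protein, data = paired, FUN = mean)
mean_diff
```

Because `merge` matches on both `subjectID` and `protein`, the row order of the input does not matter.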

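My real data has 5 subjects, 10 treatments (including treatment == "none"), and 100 proteins, so I can't do this by hand for every group. I need the mean difference between each drug treatment and the untreated baseline (9 drug treatments, each compared with untreated).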
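To cover every drug at once without listing them by hand, one possible base R sketch joins all treated rows to their baseline and then averages by protein and treatment. It assumes one "none" row per subject/protein pair; `baseline`, `treated`, `paired`, and `mean_diff` are illustrative names, not part of the original data.

```r
df <- data.frame(
  subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"),
  treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
  protein   = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
  value     = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4)
)

# Baseline table: one untreated value per subject/protein pair.
baseline <- df[df$treatment == "none", c("subjectID", "protein", "value")]
names(baseline)[names(baseline) == "value"] <- "control"

# All drug-treated rows, whatever the drug is called.
treated <- df[df$treatment != "none", ]

# Attach the matching baseline to every treated row and take the difference.
paired <- merge(treated, baseline, by = c("subjectID", "protein"))
paired$difference <- paired$value - paired$control

# Mean difference across subjects: rows are proteins, columns are drugs.
mean_diff <- tapply(paired$difference,
                    list(paired$protein, paired$treatment),
                    mean)
mean_diff
```

The result is a numeric matrix rather than a data frame; `as.data.frame(mean_diff)` would convert it if a data frame is needed.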

The desired output would look something like this:

 resdf
   protein drug1_mean_diff drug2_mean_diff
1 proteinA             1.15            -1.4
2 proteinB             -1.2            1.55

I should end up with 100 proteins (rows) and 9 mean differences (columns).

I hope this is clear.

Thanks!

Tags: r, dplyr, tidyr

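Solution

For some reason I could not reproduce the expected output shown in the question. Still, I think this code should give the desired answer, though I may have made a mistake or misunderstood something, so please check it before using it: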

  library(dplyr)
  library(tidyr)

  df = data.frame(subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"), 
                  treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
                  protein = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
                  value = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4))
  
  
  df %>% 
     filter(treatment != "none") %>% 
     # attach the untreated value of the same subject and protein as `control`
     left_join(df %>% filter(treatment == "none") %>% rename(control = value) %>% select(subjectID, protein, control),
               by = c("subjectID", "protein")) %>% 
     mutate(diff = value - control) %>% 
     select(subjectID, protein, treatment, diff) %>% 
     # one diff_<drug> column per treatment
     pivot_wider(names_from = treatment, values_from = diff, names_prefix = "diff_") %>% 
     group_by(protein) %>% 
       # average across subjects; the argument is na.rm, not rm.na
       summarise(across(starts_with("diff"), ~ mean(.x, na.rm = TRUE)))
  

This returns:

    protein  diff_drug1 diff_drug2
    <chr>         <dbl>      <dbl>
  1 proteinA       1.15      -1.4 
  2 proteinB      -1.20       1.55
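The question asked for column names of the form `drug1_mean_diff`. A possible variant, assuming dplyr and tidyr are installed, averages while the data is still in long form and then uses the `names_glue` argument of `pivot_wider()` to build those names. `control` and `resdf` are names chosen for this sketch.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"),
  treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
  protein   = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
  value     = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4)
)

# Untreated value for each subject/protein pair.
control <- df %>%
  filter(treatment == "none") %>%
  select(subjectID, protein, control = value)

resdf <- df %>%
  filter(treatment != "none") %>%
  left_join(control, by = c("subjectID", "protein")) %>%
  # mean over subjects of (treated - untreated), per protein and drug
  group_by(protein, treatment) %>%
  summarise(mean_diff = mean(value - control, na.rm = TRUE), .groups = "drop") %>%
  # one column per drug, named <drug>_mean_diff
  pivot_wider(names_from = treatment, values_from = mean_diff,
              names_glue = "{treatment}_mean_diff")

resdf
```

Summarising before widening keeps the intermediate table small, which may help with 100 proteins and 9 drugs, and the values should match the output shown above.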
