r - Apply difference and mean between groups in a column in R
Problem description
I have a data frame like this:
df = data.frame("subjectID" = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"), "treatment" = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"), "protein" = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"), "value"= c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4))
   subjectID treatment  protein value
1         S1      none proteinA   5.3
2         S2      none proteinA   4.3
3         S2      none proteinB   4.5
4         S1      none proteinB   2.3
5         S1     drug1 proteinA   6.5
6         S2     drug1 proteinA   5.4
7         S2     drug1 proteinB   1.2
8         S1     drug1 proteinB   3.2
9         S1     drug2 proteinA   2.3
10        S2     drug2 proteinA   4.5
11        S1     drug2 proteinB   6.5
12        S2     drug2 proteinB   3.4
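I need to perform the following calculations on this data frame:
- For each protein and each subject, find the difference in value between treatment == "drug1" and treatment == "none".
So for a single calculation it would basically be: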
diff = df$value[df$subjectID == "S1" & df$treatment == "drug1" & df$protein == "proteinA"] - df$value[df$subjectID == "S1" & df$treatment == "none" & df$protein == "proteinA"]
diff
> 1.2
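In the example above, 6.5 - 5.3 gives the difference between the treated and untreated proteinA samples for subject S1. I repeat this for S2/proteinA, S1/proteinB, and S2/proteinB.
- Find the mean of these differences across subjects.
My real data has 5 subjects, 10 treatments (including treatment == "none"), and 100 proteins, so I can't do this by hand for every group. I need the mean difference between each drug treatment and the untreated control (9 drug treatments versus untreated).
The desired output would look something like this: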
resdf
   protein drug1_mean_diff drug2_mean_diff
1 proteinA            1.15           -1.40
2 proteinB           -1.20            1.55
I should end up with 100 proteins (rows) and 9 mean differences (columns).
I hope this is clear.
Thanks!
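Solution
For some reason I couldn't reproduce the expected output shown in the question. Still, I think this code should give the desired answer, though I may have made a mistake or misunderstood something, so please check it before using it: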
library(dplyr)
library(tidyr)

df = data.frame(subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"),
                treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
                protein = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
                value = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4))

df %>%
  # keep only the treated rows
  filter(treatment != "none") %>%
  # attach the untreated value of the same subject and protein as `control`
  left_join(df %>% filter(treatment == "none") %>% rename(control = value) %>% select(subjectID, protein, control),
            by = c("subjectID", "protein")) %>%
  mutate(diff = value - control) %>%
  select(subjectID, protein, treatment, diff) %>%
  # one column of differences per treatment
  pivot_wider(names_from = treatment, values_from = diff, names_prefix = "diff_") %>%
  group_by(protein) %>%
  # mean across subjects; the argument is na.rm, not rm.na
  summarise(across(starts_with("diff"), ~ mean(.x, na.rm = TRUE)))
This returns:
  protein  diff_drug1 diff_drug2
  <chr>         <dbl>      <dbl>
1 proteinA       1.15      -1.4
2 proteinB      -1.20       1.55
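As a cross-check, the same numbers can also be obtained without a join. The following is only a sketch of an alternative, not the approach used in the answer above: it subtracts the untreated value inside each subject/protein group and then averages per protein and treatment. It assumes every subject/protein pair has exactly one "none" row (the grouped subtraction fails otherwise), and the names `res`, `mean_diff`, and the `_mean_diff` column suffix are chosen here to mirror the output requested in the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subjectID = c("S1","S2","S2","S1","S1","S2","S2","S1","S1","S2","S1","S2"),
  treatment = c("none","none","none","none","drug1","drug1","drug1","drug1","drug2","drug2","drug2","drug2"),
  protein   = c("proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB","proteinA","proteinA","proteinB","proteinB"),
  value     = c(5.3,4.3,4.5,2.3,6.5,5.4,1.2,3.2,2.3,4.5,6.5,3.4)
)

res <- df %>%
  group_by(subjectID, protein) %>%
  # Assumption: exactly one "none" row per subject/protein group
  mutate(diff = value - value[treatment == "none"]) %>%
  ungroup() %>%
  # the untreated rows have diff == 0 by construction, so drop them
  filter(treatment != "none") %>%
  group_by(protein, treatment) %>%
  # mean of the per-subject differences
  summarise(mean_diff = mean(diff, na.rm = TRUE), .groups = "drop") %>%
  # one column per drug, named like the output requested in the question
  pivot_wider(names_from = treatment, values_from = mean_diff,
              names_glue = "{treatment}_mean_diff")

print(res)
```

Because the treatment names are never hard-coded apart from "none", the same pipeline should scale to the 9 drugs and 100 proteins mentioned in the question, provided the control label is the same.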