Conditional aggregation in R

Problem description

Consider the following data frame:

d <- data.frame(c("a","a","a","a","b","b","b","b"),c("a1","a1","a2","a2","a1","a1","a2","a2"),"c","d",c(1:8))

I want to aggregate the values in column 5 so that I get the following data.frame:

d1 <- data.frame(c("a","a","b","b"),c("a1","a2","a1","a2"),"c","d",c(3,7,11,15))

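That is, I want to aggregate the values in column 5 according to the names in column 2, while keeping the names in columns 1, 3, and 4. (In this example the names in columns 3 and 4 are all the same, but in my real data they vary.)

How can I do this in R?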

Tags: r, conditional-statements, aggregation

Solution

Using data.table

Code

require(data.table)
d[, .(unique(V3), unique(V4), sum(V5)), .(V1, V2)]

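Specifically, the syntax is dt[i, j, by]. Here i selects a subset of rows of the data.table object, j states the operation to perform on that subset (.() is shorthand for list()), and by assigns the grouping variables. In your case, you want to sum V5 across V1-V2 pairs. In addition, we apply unique() to V3 and V4 to prevent duplicate rows.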
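To make the three slots concrete, here is a minimal sketch on the same toy data. It names the entries of j so the result gets readable column names instead of auto-generated ones, and it shows i filtering rows before grouping. The object names res and res_a are only illustrative and are not part of the original answer.

```r
library(data.table)

# Same toy data as in the Data section of this answer.
d <- data.table(V1 = c("a","a","a","a","b","b","b","b"),
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                V3 = "c",
                V4 = "d",
                V5 = 1:8)

# j: named list entries become the output column names.
# by: one output row per distinct (V1, V2) pair.
res <- d[, .(V3 = unique(V3), V4 = unique(V4), V5 = sum(V5)), by = .(V1, V2)]
print(res)

# i: restrict the rows first, then aggregate only that subset.
res_a <- d[V1 == "a", .(V5 = sum(V5)), by = .(V1, V2)]
print(res_a)
```

Naming the j entries is optional. Without names, data.table labels the new columns V1, V2, V3 in order, which can collide with the names of the grouping columns.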

Result

   V1 V2 V1 V2 V3
1:  a a1  c  d  3
2:  a a2  c  d  7
3:  b a1  c  d 11
4:  b a2  c  d 15

Data

d = data.table(V1 = c("a","a","a","a","b","b","b","b"), 
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"), 
                V3 = "c", 
                V4 = "d", 
                V5 = c(1:8))
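The question mentions that columns 3 and 4 vary in the real data. In that case unique() can return more than one value per V1-V2 pair, and the single sum would then be repeated on each resulting row, which is probably not what is wanted. One possible variant, sketched here on made-up data (d2 and res2 are hypothetical names, and the values are invented for illustration), is to put every label column into by so that each distinct combination gets its own sum:

```r
library(data.table)

# Hypothetical data: V3 differs within the (a, a1) pair.
d2 <- data.table(V1 = c("a","a","a","a","b","b"),
                 V2 = c("a1","a1","a1","a2","a1","a1"),
                 V3 = c("c","c","x","c","c","c"),
                 V4 = "d",
                 V5 = 1:6)

# All label columns are grouping keys, so unique() is no longer needed
# and each distinct (V1, V2, V3, V4) combination is summed separately.
res2 <- d2[, .(V5 = sum(V5)), by = .(V1, V2, V3, V4)]
print(res2)
```

Whether this is the right behaviour depends on how the varying labels should be treated. If only one label per V1-V2 pair should survive, a different rule (for example, taking the first value) would be needed instead.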
