首页 > 解决方案 > group_by,根据名称对不同的列使用不同的函数

问题描述

我有一个很大的df,有很多列(+100),其中一些的名称以“_e”结尾,而另一些则以“_se”结尾。我想使用列的标签来总结这些变量sign。对于那些进入“_e”的列,我想得到这些值的总和。对于以“_se”结尾的那些,要使用的公式是值的平方幂之和的平方根 - sqrt(sum((x)^2)

sqrt(sum((x)^2)根据列的命名方式,我是否可以使用这些不同的函数(sum 和 )来聚合我的数据?一个附加条件是,分组仅发生在名称以 _e 或 _se 结尾的那些列,避免所有未这样命名的列。

然后,所需的结果将是具有 2 行的 df - 一行用于“冷”,另一行用于“热”,以“_e”或“_se”结尾的列的新聚合值。请参阅下面的虚拟数据:

structure(list(POC_e = c(58521L, 16161L, 1194L, 2699L, 1271L, 
1967L, 28248L, 12160L, 14397L, 7079L, 13667L, 4951L, 6604L, 13696L, 
12551L, 2772L, 1848L, 34065L, 33594L, 67847L, 25307L, 70405L, 
93996L, 65963L, 39789L, 20335L, 15903L, 108016L, 31371L, 24278L, 
50822L, 20552L, 37172L, 41811L, 28593L, 16834L, 26969L), POC_se = c(1291.92647266486, 
939.72404052784, 190.346374094663, 388.212893466502, 208.822332096987, 
252.213709480391, 1160.08565499059, 838.438264515793, 878.262309647022, 
535.666492080041, 705.069168328457, 416.437334300563, 598.234978711058, 
937.860748119978, 626.051576985209, 329.912572655009, 297.104027430243, 
1103.07866902155, 1068.06832578293, 1581.78604703798, 1060.61990776234, 
1419.56938660059, 1844.54038903284, 1766.10256323752, 1983.39556445415, 
882.694867765916, 805.685353170852, 2313.56161072355, 920.266765311779, 
967.61140849172, 1634.66447221862, 504.867156218601, 824.885933362813, 
975.720818549529, 1096.29928336111, 777.860187976763, 952.665438551264
), BelPov_e = c(20565L, 3812L, 66L, 840L, 164L, 145L, 13560L, 
3952L, 5483L, 2591L, 3093L, 1928L, 1149L, 4197L, 4125L, 336L, 
190L, 8558L, 7332L, 14872L, 5141L, 14086L, 28506L, 10753L, 4411L, 
5961L, 5799L, 22387L, 5189L, 3335L, 7407L, 867L, 3077L, 3329L, 
2817L, 1597L, 2406L), BelPov_se = c(1068.87660902013, 411.317970001003, 
23.6457754878744, 172.906389645903, 54.6876004289108, 55.5557916537185, 
892.989254922357, 401.344733821059, 604.616650266614, 336.438879762679, 
404.301023537906, 282.386361349728, 147.670323118653, 496.534865097384, 
483.679751234656, 103.488187597609, 80.8533491461191, 648.06537390533, 
661.960303317718, 953.091653167516, 484.435550214379, 1032.29133967345, 
1525.69708374773, 869.176346634638, 637.864956679027, 655.848453420799, 
525.336305385565, 1478.17209676848, 420.427043758192, 416.538929000497, 
658.333683680529, 121.413258770268, 278.074805599612, 361.857390496834, 
298.858858908146, 234.619743804892, 359.830153667996), sign = c("cold", 
"cold", "cold", "cold", "cold", "cold", "cold", "cold", "cold", 
"cold", "cold", "cold", "cold", "cold", "cold", "cold", "cold", 
"hot", "hot", "hot", "hot", "hot", "hot", "hot", "hot", "cold", 
"cold", "hot", "hot", "hot", "hot", "hot", "hot", "hot", "hot", 
"hot", "hot")), row.names = c(NA, 37L), class = "data.frame")

标签: rdataframegroup-bydplyrsummarize

解决方案


在 Base-R 中

e <- df1[,grep("_e", colnames(df1))]
se <- df1[,grep("_se", colnames(df1))]

e <- lapply(split(e, df1$sign), colSums)
se <-  lapply(split(se, df1$sign), function(x) apply(x,2,function(y) sqrt(sum(y^2))))

cbind(do.call(rbind,e),do.call(rbind,se))

输出:

      POC_e BelPov_e   POC_se BelPov_se
cold 236024    77956 3143.043  2062.604
hot  817384   146070 5732.776  3165.828

推荐阅读