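r - Sum a data frame based on the values of multiple columns
Problem description
I'm working with a large data frame of college football players and their associated statistics, game by game. It looks like this: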
Name School Year Receptions Receiving_Yards
Player1 College1 2004 10 200
Player2 College2 2002 15 150
Player3 College3 2007 11 110
Player1 College1 2004 17 150
Player2 College2 2002 13 130
Player1 College1 2005 14 170
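To make the examples reproducible, the sample table can be entered as an R data frame. This is a sketch: the object name `df` is an assumption, chosen to match the name the answer below uses, and the values are copied from the table above.

```r
# Sample data from the question, entered by hand.
# `df` is a hypothetical name matching the one used in the answer.
df <- data.frame(
  Name = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L),
  stringsAsFactors = FALSE
)
str(df)
```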
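I'd like to be able to combine rows based on multiple conditions:
I want to create a data frame that combines everything by player, school and year, to get a player's cumulative stats for that season. Like this: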
Name School Year Receptions Receiving_Yards
Player1 College1 2004 27 350
Player2 College2 2002 28 280
Player3 College3 2007 11 110
Player1 College1 2005 14 170
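For comparison, the season totals can also be sketched in base R with the formula interface of `aggregate()`: `cbind()` on the left names the columns to sum, and the terms on the right are the grouping columns. The names `df` and `season` are hypothetical; note that `aggregate()` sorts its output by the grouping columns rather than keeping the original row order.

```r
# Hypothetical sample data (same values as the question's table)
df <- data.frame(
  Name = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L),
  stringsAsFactors = FALSE
)

# Sum both stat columns within each Name/School/Year group
season <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School + Year,
                    data = df, FUN = sum)
season
```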
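I also want to create a data frame that combines everything based only on player and school (i.e., that gives me career stats), but that also gives me the span of years: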
Name School From To Receptions Receiving_Yards
Player1 College1 2004 2005 41 520
Player2 College2 2002 2002 28 280
Player3 College3 2007 2007 11 110
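A base-R sketch of the career table, assuming the same hypothetical `df`: sum the stats by player and school, compute the first and last season with `min()` and `max()` (which do not depend on row order), then `merge()` the pieces on `Name` and `School`. The intermediate names `totals`, `from`, `to` and `career` are illustrative only.

```r
# Hypothetical sample data (same values as the question's table)
df <- data.frame(
  Name = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L),
  stringsAsFactors = FALSE
)

# Career sums per player and school (Year is not a grouping column)
totals <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School,
                    data = df, FUN = sum)

# First and last season per player and school
from <- aggregate(Year ~ Name + School, data = df, FUN = min)
names(from)[names(from) == "Year"] <- "From"
to <- aggregate(Year ~ Name + School, data = df, FUN = max)
names(to)[names(to) == "Year"] <- "To"

# Join span and sums on the grouping columns
career <- merge(merge(from, to, by = c("Name", "School")),
                totals, by = c("Name", "School"))
career
```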
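I'm not completely set on getting the year span, because it's unlikely that many players with the same name played for the same school.
I've seen some posts about combining rows based on a single condition, but how do I do it with multiple conditions?
Thanks!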
Solution
Adding a data.table alternative:
library(data.table)
df1<-copy(df)
setDT(df1)
# min/max (rather than first/last) give the span even if the rows are not sorted by Year
df1[,`:=`(From=min(Year),To=max(Year)),by=.(Name,School)
   ][,lapply(.SD,sum),by=.(Name,School,From,To),.SDcols=c("Receptions","Receiving_Yards")]
Output:
Name School From To Receptions Receiving_Yards
1: Player2 College2 2002 2002 28 280
2: Player1 College1 2004 2005 41 520
3: Player3 College3 2007 2007 11 110
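A one-pass variant, sketched under the assumption that the data.table package is installed: the span and the sums are computed together in `j`, so no helper columns are added to the table by reference, and `min()`/`max()` give the span regardless of row order. The names `dt` and `career` are hypothetical.

```r
library(data.table)

# Hypothetical sample data (same values as the question's table)
dt <- data.table(
  Name = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L)
)

# Span and sums per player and school in a single grouped call; dt is not modified
career <- dt[, .(From = min(Year), To = max(Year),
                 Receptions = sum(Receptions),
                 Receiving_Yards = sum(Receiving_Yards)),
             by = .(Name, School)]
career
```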
For the other part (totals per player, school and year):
df1<-copy(df)
setDT(df1)
df1[,lapply(.SD,sum),by=.(Name,School,Year)]
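Alternatively, if you don't want to re-create the data.table, remove the columns added in the previous step, which produces the first output (season totals):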
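#df1<-copy(df) Not needed: reuse df1 from the previous step
#setDT(df1) Not needed: df1 is already a data.table
# Drop the helper columns so that lapply(.SD,sum) does not sum them too
df1[,`:=`(From=NULL,To=NULL)]
# Store and print the grouped result (printing df1 itself would show the ungrouped rows)
res<-df1[,lapply(.SD,sum),by=.(Name,School,Year)]
res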
Output:
Name School Year Receptions Receiving_Yards
1: Player1 College1 2004 27 350
2: Player2 College2 2002 28 280
3: Player3 College3 2007 11 110
4: Player1 College1 2005 14 170
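Because sums are additive, the career table can also be rolled up from the season table instead of from the raw rows. This sketch assumes data.table is installed; `cols`, `season` and `career` are hypothetical names. `.SDcols` limits the summing to the two stat columns (so helper columns such as `From`/`To` would never be summed by accident), and `keyby` sorts the result by the grouping columns.

```r
library(data.table)

# Hypothetical sample data (same values as the question's table)
dt <- data.table(
  Name = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L)
)
cols <- c("Receptions", "Receiving_Yards")

# Season totals, sorted by Name, School, Year
season <- dt[, lapply(.SD, sum), keyby = .(Name, School, Year), .SDcols = cols]

# Roll the season totals up to career totals, adding the year span
career <- season[, c(list(From = min(Year), To = max(Year)), lapply(.SD, sum)),
                 keyby = .(Name, School), .SDcols = cols]
season
career
```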