Summing a data frame based on the values of multiple columns

Question

I'm working with a large data frame of college football players and their associated per-game statistics. It looks like this:

Name      School     Year     Receptions     Receiving_Yards
Player1   College1   2004       10                200 
Player2   College2   2002       15                150
Player3   College3   2007       11                110
Player1   College1   2004       17                150
Player2   College2   2002       13                130
Player1   College1   2005       14                170

I'd like to be able to combine rows based on multiple conditions:

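  1. I want to create a data frame that combines all rows by player, school, and year, so that I get each player's cumulative stats for that season. Like this: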

    Name      School     Year     Receptions     Receiving_Yards
    Player1   College1   2004       27                350 
    Player2   College2   2002       28                280
    Player3   College3   2007       11                110
    Player1   College1   2005       14                170
    
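  2. I want to create a data frame that combines all rows by player and school only (i.e., giving me career stats), but that also gives me the span of years: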

    Name      School     From    to      Receptions     Receiving_Yards
    Player1   College1   2004   2005        41                520 
    Player2   College2   2002   2002        28                280
    Player3   College3   2007   2007        11                110
    

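I'm not completely set on getting the year span in option 2, since it's unlikely that many players with the same name played for the same school.

I've seen some posts about combining rows based on a single condition, but how do I do it when I'm using multiple conditions?

Thanks!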

Tags: r

Answer


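Adding a data.table alternative (for the second part, the career stats with the year span):

library(data.table)
df1 <- copy(df)   # df is the data frame from the question; work on a copy
setDT(df1)        # convert to a data.table by reference
# Add the year span per player/school by reference, then sum the stat columns.
# min/max are used instead of first/last so the span does not depend on row order.
df1[, `:=`(From = min(Year), To = max(Year)), by = .(Name, School)
][, lapply(.SD, sum), by = .(Name, School, From, To), .SDcols = c("Receptions", "Receiving_Yards")]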

Output:

      Name   School From   To Receptions Receiving_Yards
1: Player2 College2 2002 2002         28             280
2: Player1 College1 2004 2005         41             520
3: Player3 College3 2007 2007         11             110
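The same career totals can also be computed in a single grouped call that leaves the table unmodified. The following is only a sketch: the sample data is re-typed from the question, and the names `dt` and `career` are made up for illustration.

```r
library(data.table)

# Sample data re-typed from the question (illustrative only)
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions      = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L)
)

dt <- as.data.table(df)  # makes a copy; df itself is left untouched

# One row per player/school: year span plus summed stats.
# No columns are added to dt, so nothing has to be removed afterwards.
career <- dt[, .(From            = min(Year),
                 To              = max(Year),
                 Receptions      = sum(Receptions),
                 Receiving_Yards = sum(Receiving_Yards)),
             by = .(Name, School)]
print(career)
```

Because `From` and `To` are computed inside `j` rather than added with `:=`, `dt` keeps its original columns and can be reused directly for the per-season totals.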

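For the other part (the per-season totals):

df1 <- copy(df)   # start again from a fresh copy of the original data
setDT(df1)
# Group by all three columns; the remaining columns (.SD) are summed
df1[, lapply(.SD, sum), by = .(Name, School, Year)]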

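Alternatively, if you don't want to recreate the data.table, drop the columns that were added in the previous part (the one that produced the first output):

# df1 <- copy(df)   # not needed, see the next line
# setDT(df1)        # not needed: df1 is already a data.table from the previous part
df1[, `:=`(From = NULL, To = NULL)]                   # remove the helper columns by reference
df1[, lapply(.SD, sum), by = .(Name, School, Year)]   # prints the aggregated result; df1 itself is unchanged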

Output:

      Name   School Year Receptions Receiving_Yards
1: Player1 College1 2004         27             350
2: Player2 College2 2002         28             280
3: Player3 College3 2007         11             110
4: Player1 College1 2005         14             170
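To make the "multiple conditions" point concrete, here is a minimal self-contained sketch of the per-season totals. The data is re-typed from the question, and `dt`, `stat_cols`, and `season` are hypothetical names. Grouping on several columns only requires listing them all in `by`, and `.SDcols` names the columns to sum explicitly, so any extra non-numeric column would not break the call.

```r
library(data.table)

# Sample data re-typed from the question (illustrative only)
dt <- data.table(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004L, 2002L, 2007L, 2004L, 2002L, 2005L),
  Receptions      = c(10L, 15L, 11L, 17L, 13L, 14L),
  Receiving_Yards = c(200L, 150L, 110L, 150L, 130L, 170L)
)

# Columns to aggregate, stated explicitly
stat_cols <- c("Receptions", "Receiving_Yards")

# One row per player/school/season, with the stat columns summed
season <- dt[, lapply(.SD, sum), by = .(Name, School, Year), .SDcols = stat_cols]
print(season)
```

Assigning the result to `season` keeps the aggregated table for later use; evaluating the bracketed expression without an assignment only prints it.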
