Factor error when aggregating a table even though no column is a factor

Problem description

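So I loaded my data from a CSV file. I tried loading it with stringsAsFactors = FALSE, but I still get the error. The first 13 columns are strings and the rest (from column 14 on) are all numeric. Here is the core code: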

library("readxl")

# Read the data without converting strings to factors
data <- read.csv("PFR csvData.csv", stringsAsFactors = FALSE)

# Convert all numeric columns (14 to the last) to numeric
data[, 14:ncol(data)] <- as.numeric(as.character(unlist(data[, 14:ncol(data)])))

# Convert all string columns (1 to 13) to character
data[, 1:13] <- as.character(unlist(data[, 1:13]))
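The same two conversions can also be written column by column with lapply(), which converts each column on its own instead of flattening them into one long vector with unlist(). A minimal sketch, assuming a small hypothetical in-memory CSV (read through the text argument of read.csv()) in place of PFR csvData.csv, with columns 1:2 as text and 3 onward as numbers instead of 1:13 and 14 onward:

```r
# Hypothetical stand-in for "PFR csvData.csv": 2 text columns, 2 numeric columns
csv_text <- "Player,Pos,Receiving_Tgt,Total_Half_PPR
A. Smith,WR,7,12.5
B. Jones,RB,3,8
A. Smith,WR,5,9.5"

data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical column ranges (the question uses 1:13 and 14:ncol(data))
chr_cols <- 1:2
num_cols <- 3:ncol(data)

# lapply() applies the conversion to each column separately
data[num_cols] <- lapply(data[num_cols], function(x) as.numeric(as.character(x)))
data[chr_cols] <- lapply(data[chr_cols], as.character)

sapply(data, class)
```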

When I check the class of each column with sapply(data, class), I get:

           Rk           Player              Pos              Age             Date               Lg               Tm 
     "character"      "character"      "character"      "character"      "character"      "character"      "character" 
             H.A              Opp           Result               G.             Week              Day    Receiving_Tgt 
     "character"      "character"      "character"      "character"      "character"      "character"        "numeric" 
   Receiving_Rec    Receiving_Yds    Receiving_Y.R     Receiving_TD  Receiving_Ctch.  Receiving_Y.Tgt    Receiving_PPR 
       "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric" 
     Passing_Cmp      Passing_Att     Passing_Cmp.      Passing_Yds       Passing_TD      Passing_Int     Passing_Rate 
       "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric" 
      Passing_Sk   Passing_Sk_Yds      Passing_Y.A     Passing_AY.A      Passing_PPR      Rushing_Att      Rushing_Yds 
       "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric"        "numeric" 
     Rushing_Y.A       Rushing_TD Rushing_Half_PPR   Total_Half_PPR 
       "numeric"        "numeric"        "numeric"        "numeric" 

I also checked for NAs with apply(data, 2, function(x) any(is.na(x))) and got:

              Rk           Player              Pos              Age             Date               Lg               Tm 
           FALSE            FALSE            FALSE            FALSE            FALSE            FALSE            FALSE 
             H.A              Opp           Result               G.             Week              Day    Receiving_Tgt 
           FALSE            FALSE            FALSE            FALSE            FALSE            FALSE            FALSE 
   Receiving_Rec    Receiving_Yds    Receiving_Y.R     Receiving_TD  Receiving_Ctch.  Receiving_Y.Tgt    Receiving_PPR 
           FALSE            FALSE            FALSE            FALSE            FALSE            FALSE            FALSE 
     Passing_Cmp      Passing_Att     Passing_Cmp.      Passing_Yds       Passing_TD      Passing_Int     Passing_Rate 
           FALSE            FALSE            FALSE            FALSE            FALSE            FALSE            FALSE 
      Passing_Sk   Passing_Sk_Yds      Passing_Y.A     Passing_AY.A      Passing_PPR      Rushing_Att      Rushing_Yds 
           FALSE            FALSE            FALSE            FALSE            FALSE            FALSE            FALSE 
     Rushing_Y.A       Rushing_TD Rushing_Half_PPR   Total_Half_PPR 
           FALSE            FALSE            FALSE            FALSE 
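Both checks can be done a bit more directly: is.factor() tests each column for the factor class, and colSums(is.na(...)) counts the missing values per column instead of only reporting TRUE/FALSE. A small sketch on a hypothetical data frame standing in for data, with one NA planted in each column so the counts are visible:

```r
# Hypothetical stand-in for `data`, with one NA planted per column
df <- data.frame(Player = c("A. Smith", NA, "B. Jones"),
                 Total_Half_PPR = c(12.5, 8, NA),
                 stringsAsFactors = FALSE)

# TRUE for every column stored as a factor
is_fac <- vapply(df, is.factor, logical(1))
any(is_fac)

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts
```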

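So at this point I think I have loaded the data without factors, made sure none of the columns are factors by coercing their types, and double-checked by looking at each column's class. I also made sure there are no NAs.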

However, when I run my aggregate() call, I get an error about factors:

aggregate(data$Player, by = list(data$Total_Half_PPR), FUN = sum)
Error in Summary.factor(291L, na.rm = FALSE) : 
  ‘sum’ not meaningful for factors

I don't know what else to try. Any help is appreciated!

Tags: r, dataframe, factors

Solution


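The problem is not the data frame but the order of the arguments. data$Player holds player names, and aggregate() passes its x argument through as.data.frame(). In R versions before 4.0.0, where stringsAsFactors defaulted to TRUE, that step turns the character vector Player into a factor, and sum() is not defined for factors. Converting it with

data$Player <- as.numeric(as.character(data$Player))

would only help if Player contained numbers stored as text; for player names it just produces NA values.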
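The error can be reproduced without the CSV. aggregate() hands a non-data-frame x to as.data.frame(); the sketch below passes stringsAsFactors = TRUE explicitly to mimic the default of R versions before 4.0.0 (the player vector is hypothetical):

```r
# Hypothetical player names, playing the role of data$Player
player <- c("A. Smith", "B. Jones", "A. Smith")

# What aggregate() does with a plain vector x: as.data.frame(x).
# stringsAsFactors = TRUE mimics the pre-4.0.0 default.
as_df <- as.data.frame(player, stringsAsFactors = TRUE)
class(as_df[[1]])

# sum() on a factor fails with the same message as in the question
err <- tryCatch(sum(as_df[[1]]), error = function(e) conditionMessage(e))
err
```

In R 4.0.0 and later the column stays character, so the original call still fails, only with a different message, because sum() cannot add character strings.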

If what you actually want is the sum of Total_Half_PPR for each player, pass the arguments the other way around:

aggregate(data$Total_Half_PPR, by = list(data$Player), FUN = sum)
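A self-contained sketch of the corrected call on a hypothetical three-row data frame. Naming the list element (Player = ...) gives the grouping column a readable name instead of the default Group.1:

```r
# Hypothetical data: two players, one of them appearing twice
df <- data.frame(Player = c("A. Smith", "B. Jones", "A. Smith"),
                 Total_Half_PPR = c(12.5, 8, 9.5),
                 stringsAsFactors = FALSE)

# x = values to sum, by = list of grouping variables
res <- aggregate(df$Total_Half_PPR, by = list(Player = df$Player), FUN = sum)
res
```

The result has one row per player, with the summed points in a column named x.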

or use the formula method:

aggregate(Total_Half_PPR ~ Player, data, FUN = sum)
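The same hypothetical data with the formula interface; here the result columns are named after the variables in the formula:

```r
# Hypothetical data: two players, one of them appearing twice
df <- data.frame(Player = c("A. Smith", "B. Jones", "A. Smith"),
                 Total_Half_PPR = c(12.5, 8, 9.5),
                 stringsAsFactors = FALSE)

# Left-hand side: value to summarise; right-hand side: grouping column
res_f <- aggregate(Total_Half_PPR ~ Player, data = df, FUN = sum)
res_f
```

Several numeric columns can be summed at once by putting cbind() on the left-hand side, e.g. cbind(Receiving_Yds, Rushing_Yds) ~ Player. The formula method also drops rows containing NA by default (na.action = na.omit).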
