Aggregate a data frame in R based on a first column that is a string

Problem description

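I have a sample data frame like the one below. I want to sum the values of each column, grouped by the distinct values in the first column. I have been trying for a while, but I do not know how to handle this when the first column is a string. A final row with the column totals would be a plus, but I would already be happy with the sums per profession. How can I do this? Thanks!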

df:

 profession      apple    banana   grape   pear
  teacher          1       0         1       0
  student          0       1         0       1
  student          1       0         1       1 
  journalist       1       1         0       1
  teacher          0       0         0       1
  bus driver       1       0         0       0
  journalist       1       0         1       1
  bus driver       0       0         0       1
  teacher          1       0         0       1

Output:

  profession      apple    banana   grape   pear
   teacher          2       0         1       2
   student          1       1         1       2
   journalist       2       1         1       2
   bus driver       1       0         0       1
   sum              6       2         3       7
  

  

Tags: r, dataframe

Solution


One option is to group by profession and sum every numeric column, then add a row holding the colSums of that result.

library(dplyr)
library(tibble)
df %>%
     group_by(profession) %>% 
     summarise(across(where(is.numeric), sum), .groups = 'drop') %>% 
     add_row(profession = 'sum', !!! colSums(.[-1]))

-output

# A tibble: 5 x 5
#  profession apple banana grape  pear
#  <chr>      <dbl>  <dbl> <dbl> <dbl>
#1 bus driver     1      0     0     1
#2 journalist     2      1     1     2
#3 student        1      1     1     2
#4 teacher        2      0     1     2
#5 sum            6      2     3     7
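
The !!! in the last step above splices the named vector returned by colSums into add_row as one argument per column. A minimal sketch of the same pipeline written out step by step may make that clearer. The intermediate names by_profession, totals and with_total are illustrative only, and the data is repeated in compact form so the block runs on its own:

```r
library(dplyr)
library(tibble)

# Same values as the data block in this answer, written compactly
df <- data.frame(
  profession = c("teacher", "student", "student", "journalist", "teacher",
                 "bus driver", "journalist", "bus driver", "teacher"),
  apple  = c(1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L),
  banana = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  grape  = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L),
  pear   = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Step 1: one row per profession, each numeric column summed
by_profession <- df %>%
  group_by(profession) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

# Step 2: named numeric vector of column totals (profession column dropped)
totals <- colSums(by_profession[-1])

# Step 3: splice the totals in, equivalent to writing
# add_row(by_profession, profession = "sum", apple = ..., banana = ..., ...)
with_total <- add_row(by_profession, profession = "sum", !!!as.list(totals))
with_total
```

Wrapping totals in as.list is a cautious choice here: it makes each total a separately named argument regardless of how the installed rlang version treats atomic vectors under !!!.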

Or use adorn_totals from janitor

library(janitor)
df %>%
      group_by(profession) %>% 
      summarise(across(where(is.numeric), sum), .groups = 'drop')  %>% 
      adorn_totals()
#  profession apple banana grape pear
#  bus driver     1      0     0    1
#  journalist     2      1     1    2
#     student     1      1     1    2
#     teacher     2      0     1    2
#       Total     6      2     3    7
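
If adding packages is not desirable, the same result can probably be reached with base R alone. The following is a sketch of an alternative, not part of the original answer: it uses aggregate with a formula for the sums per profession and rbind to append the totals. The names agg, total_row and result are illustrative, and the data is repeated in compact form so the block runs on its own:

```r
# Same values as the data block in this answer, written compactly
df <- data.frame(
  profession = c("teacher", "student", "student", "journalist", "teacher",
                 "bus driver", "journalist", "bus driver", "teacher"),
  apple  = c(1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L),
  banana = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  grape  = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L),
  pear   = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Sum every remaining column within each profession
agg <- aggregate(. ~ profession, data = df, FUN = sum)

# One-row data frame of column totals; t() turns the named vector
# from colSums() into a 1 x 4 matrix whose column names are kept
total_row <- data.frame(profession = "sum", t(colSums(agg[-1])),
                        stringsAsFactors = FALSE)

# Append the totals row; rbind matches the columns by name
result <- rbind(agg, total_row)
result
```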

Data

df <- structure(list(profession = c("teacher", "student", "student", 
"journalist", "teacher", "bus driver", "journalist", "bus driver", 
"teacher"), apple = c(1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L), banana = c(0L, 
1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), grape = c(1L, 0L, 1L, 0L, 0L, 
0L, 1L, 0L, 0L), pear = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L)), 
class = "data.frame", row.names = c(NA, 
-9L))
