Multi-level aggregation with summarised combinations

Problem description

Suppose I have the following data:

data <- 
structure(list(Age = c(">30", ">30", ">30", ">30", ">30", ">30", 
">30", ">30", "<=30", "<=30", "<=30", "<=30", "<=30", "<=30", 
"<=30", "<=30"), Experience = c("Yes", "Yes", "Yes", "Yes", "No", 
"No", "No", "No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", 
"No"), State = c("Lusiana", "Lusiana", "Lusiana", "Lusiana", 
"Lusiana", "Lusiana", "Lusiana", "Lusiana", "Lusiana", "Lusiana", 
"Lusiana", "Lusiana", "Lusiana", "Lusiana", "Lusiana", "Lusiana"
), City = c("Houma", "Houma", "Ruston", "Ruston", "Houma", "Houma", 
"Ruston", "Ruston", "Houma", "Houma", "Ruston", "Ruston", "Houma", 
"Houma", "Ruston", "Ruston"), Salary = c(1200, 1100, 1400, 1500, 
1000, 1300, 2000, 1500, 1200, 1100, 1400, 1500, 1000, 1300, 2000, 
1500)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))

The first rows of the data:

  Age   Experience State   City   Salary
  <chr> <chr>      <chr>   <chr>   <dbl>
1 >30   Yes        Lusiana Houma    1200
2 >30   Yes        Lusiana Houma    1100
3 >30   Yes        Lusiana Ruston   1400
4 >30   Yes        Lusiana Ruston   1500
5 >30   No         Lusiana Houma    1000
6 >30   No         Lusiana Houma    1300

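In this case, I want to compute the maximum salary for each city and then the sum of those maxima for the state, and I want to do this for every combination of Age and Experience.

The expected output is:

[image: expected output]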

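My other question:

[image]

Is there a way to force summarise to treat the same levels in a different order as distinct combinations? That is, to treat A + B and B + A as different combinations?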

Tags: r, data.table

Solution


Compute the max of Salary for each City, then sum those values for each State:

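library(dplyr)

data %>%
  group_by(Age, Experience, State, City) %>%
  # highest salary per city; 'drop_last' removes only the City grouping
  summarise(Salary = max(Salary), .groups = 'drop_last') %>%
  # sum of the city maxima within each Age/Experience/State group
  summarise(Salary = sum(Salary), .groups = 'drop')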

#  Age   Experience State   Salary
#  <chr> <chr>      <chr>    <dbl>
#1 <=30  No         Lusiana   3300
#2 <=30  Yes        Lusiana   2700
#3 >30   No         Lusiana   3300
#4 >30   Yes        Lusiana   2700
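
To make the two aggregation levels easier to inspect, the same calculation can be split into two named steps. This is only a sketch: the names city_max and state_total are made up for illustration, and the data frame is rebuilt compactly (same values as in the question) so that the block runs on its own.

```r
library(dplyr)

# Same values as the question's data, rebuilt so this block is self-contained
data <- data.frame(
  Age        = rep(c(">30", "<=30"), each = 8),
  Experience = rep(rep(c("Yes", "No"), each = 4), times = 2),
  State      = "Lusiana",
  City       = rep(rep(c("Houma", "Ruston"), each = 2), times = 4),
  Salary     = rep(c(1200, 1100, 1400, 1500, 1000, 1300, 2000, 1500), times = 2)
)

# Step 1: highest salary per city within each Age/Experience/State combination
city_max <- data %>%
  group_by(Age, Experience, State, City) %>%
  summarise(Salary = max(Salary), .groups = "drop")

# Step 2: add the per-city maxima up to state level
state_total <- city_max %>%
  group_by(Age, Experience, State) %>%
  summarise(Salary = sum(Salary), .groups = "drop")

print(city_max)     # one row per Age/Experience/State/City (8 rows)
print(state_total)  # one row per Age/Experience/State (4 rows)
```

Keeping city_max around makes it easy to confirm, for example, that the 2700 for ">30" / "Yes" is 1200 (Houma) plus 1500 (Ruston).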

If you want to do this in data.table:

library(data.table)
setDT(data)[, .(Salary = max(Salary)), .(Age, Experience, State, City)][, 
              .(Salary = sum(Salary)), .(Age, Experience, State)]
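
Two details of the data.table version may be worth knowing: setDT() converts data in place, and by returns groups in order of first appearance rather than sorted. The variant below is a sketch under the assumption that you would rather leave the original object untouched and get sorted groups like the dplyr output; it uses as.data.table() (which copies) and keyby (which sorts by the grouping columns). The names dt, city_max and state_total are illustrative only.

```r
library(data.table)

# Same values as the question's data, rebuilt so this block is self-contained
data <- data.frame(
  Age        = rep(c(">30", "<=30"), each = 8),
  Experience = rep(rep(c("Yes", "No"), each = 4), times = 2),
  State      = "Lusiana",
  City       = rep(rep(c("Houma", "Ruston"), each = 2), times = 4),
  Salary     = rep(c(1200, 1100, 1400, 1500, 1000, 1300, 2000, 1500), times = 2)
)

# as.data.table() copies, so `data` itself stays a plain data.frame
dt <- as.data.table(data)

# Step 1: per-city maximum; keyby sorts the result by the grouping columns
city_max <- dt[, .(Salary = max(Salary)), keyby = .(Age, Experience, State, City)]

# Step 2: sum of the city maxima per Age/Experience/State
state_total <- city_max[, .(Salary = sum(Salary)), keyby = .(Age, Experience, State)]

print(state_total)
```

With by instead of keyby the four rows carry the same values but appear in the order the groups first occur in the data.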
