R - dplyr - Code to run many very similar queries?

Question

Consider the following:

library(dplyr)

df <- data.frame(
  Name = c("Alan", "Bob", "Christine", "David", "Erica"),
  Gender = c("M", "M", "F", "M", "F"),
  Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
  City = c("London", "Paris", "Berlin", "London", "Paris"),
  Blood_Group = c("A", "AB", "B", "O", "A"),
  Hours_Worked = c(2000, 1600, 0, 100, 200),
  Salary = c(100000, 20000, 0, 500, 4000)
)

Name_Summary <-         df %>% group_by(Name)        %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Gender_Summary <-       df %>% group_by(Gender)      %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Star_Sign_Summary <-    df %>% group_by(Star_Sign)   %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
City_Summary <-         df %>% group_by(City)        %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))
Blood_Group_Summary <-  df %>% group_by(Blood_Group) %>% summarise(Hours_Worked = sum(Hours_Worked), Average_Salary = mean(Salary))

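Obviously, this works fine for a small number of fields. But if I had, say, 100 different fields to do this for, it would get very unwieldy.

I imagine there is a way to loop through a list of fields and generate these summaries for each one, with some code that generates (and names) the summaries, but I don't know how to do it. Can anyone help?

Thanks, Alan

Tags: r, dplyr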

Answer

If you have the columns you want to group by as a character vector:

vars_to_group_by <- names(df)[1:5]

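You can iterate over them (I'm using purrr::map(), but you could use lapply() or a loop) and use this rlang pattern to go from string >> symbol >> properly evaluated variable: sym() turns each column name into a symbol, and !! unquotes it inside group_by().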

library(tidyverse)

map(vars_to_group_by, sym) %>% 
  map(~ df %>% 
        group_by(!!.x) %>% 
        summarise(avg_salary = mean(Salary),
                  avg_hours = mean(Hours_Worked),
                  avg_hourly_wage = avg_salary / avg_hours))

You get an unnamed list, because the vector going in was unnamed.

[[1]]
# A tibble: 5 x 4
  Name      avg_salary avg_hours avg_hourly_wage
  <fct>          <dbl>     <dbl>           <dbl>
1 Alan          100000      2000            50  
2 Bob            20000      1600            12.5
3 Christine          0         0           NaN  
4 David            500       100             5  
5 Erica           4000       200            20  

[[2]]
# A tibble: 2 x 4
  Gender avg_salary avg_hours avg_hourly_wage
  <fct>       <dbl>     <dbl>           <dbl>
1 F           2000       100             20  
2 M          40167.     1233.            32.6

[[3]]
# A tibble: 4 x 4
  Star_Sign avg_salary avg_hours avg_hourly_wage
  <fct>          <dbl>     <dbl>           <dbl>
1 Aquarius       50000      1000            50  
2 Capricorn      20000      1600            12.5
3 Leo             4000       200            20  
4 Libra            500       100             5  

[[4]]
# A tibble: 3 x 4
  City   avg_salary avg_hours avg_hourly_wage
  <fct>       <dbl>     <dbl>           <dbl>
1 Berlin          0         0           NaN  
2 London      50250      1050            47.9
3 Paris       12000       900            13.3

[[5]]
# A tibble: 4 x 4
  Blood_Group avg_salary avg_hours avg_hourly_wage
  <fct>            <dbl>     <dbl>           <dbl>
1 A                52000      1100            47.3
2 AB               20000      1600            12.5
3 B                    0         0           NaN  
4 O                  500       100             5  
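
As mentioned above, lapply() can stand in for purrr::map(). The following is a minimal sketch of that variant, not the only way to write it. The names summaries and var are made up for illustration, and sym() is called as rlang::sym() so that only dplyr needs to be attached.

```r
library(dplyr)

df <- data.frame(
  Name = c("Alan", "Bob", "Christine", "David", "Erica"),
  Gender = c("M", "M", "F", "M", "F"),
  Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
  City = c("London", "Paris", "Berlin", "London", "Paris"),
  Blood_Group = c("A", "AB", "B", "O", "A"),
  Hours_Worked = c(2000, 1600, 0, 100, 200),
  Salary = c(100000, 20000, 0, 500, 4000)
)

vars_to_group_by <- names(df)[1:5]

# `summaries` and `var` are illustrative names, not part of the original answer.
summaries <- lapply(vars_to_group_by, function(var) {
  df %>%
    group_by(!!rlang::sym(var)) %>%   # string -> symbol -> unquoted column
    summarise(avg_salary = mean(Salary),
              avg_hours = mean(Hours_Worked),
              avg_hourly_wage = avg_salary / avg_hours)
})

length(summaries)   # one summary tibble per grouping column
```

Like the map() version, this returns an unnamed list with one element per entry in vars_to_group_by.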

You can add names based on vars_to_group_by, either before or after the call to map().
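
One possible way to do the naming, sketched under a few assumptions: the vector is named before map() using purrr::set_names(), and the "_Summary" suffix is borrowed from the object names in the question. The name summaries is made up for illustration.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Name = c("Alan", "Bob", "Christine", "David", "Erica"),
  Gender = c("M", "M", "F", "M", "F"),
  Star_Sign = c("Aquarius", "Capricorn", "Aquarius", "Libra", "Leo"),
  City = c("London", "Paris", "Berlin", "London", "Paris"),
  Blood_Group = c("A", "AB", "B", "O", "A"),
  Hours_Worked = c(2000, 1600, 0, 100, 200),
  Salary = c(100000, 20000, 0, 500, 4000)
)

vars_to_group_by <- names(df)[1:5]

# Name the input vector first; map() carries those names over to its result.
# `summaries` is an illustrative name, not part of the original answer.
summaries <- vars_to_group_by %>%
  set_names(paste0(vars_to_group_by, "_Summary")) %>%
  map(~ df %>%
        group_by(!!rlang::sym(.x)) %>%
        summarise(avg_salary = mean(Salary),
                  avg_hours = mean(Hours_Worked),
                  avg_hourly_wage = avg_salary / avg_hours))

names(summaries)
summaries$City_Summary
```

Each summary can then be pulled out by name, for example summaries$Gender_Summary, instead of by position.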

