R: passing a vector of grouping variables to purrr::map

Problem description

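Here is code that reads a remote dataset and prepares four summary tables, showing the count of each category within the demographic variables gender, education, race/ethnicity, and geographic region: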

suppressMessages(suppressWarnings(library(tidyverse)))

urlRemote_path  <- "https://raw.githubusercontent.com/"
github_path <- "DSHerzberg/WEIGHTING-DATA/master/INPUT-FILES/"
fileName_path   <- "data-input-sim.csv"

census_match_input <- suppressMessages(read_csv(url(
  str_c(urlRemote_path, github_path, fileName_path)
)))

var_order_census_match  <- c("gender", "educ", "ethnic", "region")

census_match_cat_count_gender <- census_match_input %>%
  group_by(gender) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = gender) %>%
  mutate(demo_var = "gender") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_educ <- census_match_input %>%
  group_by(educ) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = educ) %>%
  mutate(demo_var = "educ") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_ethnic <- census_match_input %>%
  group_by(ethnic) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = ethnic) %>%
  mutate(demo_var = "ethnic") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_region <- census_match_input %>%
  group_by(region) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = region) %>%
  mutate(demo_var = "region") %>%
  relocate(demo_var, .before = demo_cat)

I would like to do this more concisely with purrr::map(). My idea is to iterate over a vector of the variable names, like this:

census_match_cat_count <- var_order_census_match %>% 
  map(~
        census_match_input %>%
        group_by(!!.x) %>%
        summarize(n_census = n()))

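This does not return the desired output. Instead, it returns tables that lack separate rows and counts for the categories of each demographic variable.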
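A minimal sketch of why this likely happens, using a small hypothetical tibble (`toy`) in place of the remote CSV: inside `map()`, `.x` is a character string, so `!!.x` injects the string `"gender"` itself rather than the column `gender`. `group_by()` then treats that string as a constant expression, so every row lands in a single group.

```r
library(dplyr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(
  gender = c("female", "male", "female"),
  region = c("west", "south", "south")
)

v <- "gender"  # a string, just like .x inside map()

# Injecting the string groups by a constant column whose only value is "gender",
# so the summary collapses to one row holding the total count
wrong <- toy %>%
  group_by(!!v) %>%
  summarize(n_census = n())

print(wrong)
```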

Furthermore, when I try to extend the mapped function to include the rest of the code, like this:

census_match_cat_count <- var_order_census_match %>%
  map(
    ~
      census_match_input %>%
      group_by(!!.x) %>%
      summarize(n_census = n()) %>%
      rename(demo_cat = !!.x) %>%
      mutate(demo_var = .x) %>%
      relocate(demo_var, .before = demo_cat)
  )

I get errors suggesting that I am not using the correct tidyeval procedure.

There are related threads on Stack Overflow, but none of them seems to address my specific question of how to pass variable names for use by dplyr::group_by() within purrr::map().

Thanks in advance for your help.

Tags: r, dplyr, purrr

Solution


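You're almost there, but you need to convert the variable names to symbols for use with group_by(). Note that in the code below, count() is a shortcut for group_by() + summarise(n = n()), so the count column is named n.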
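To check the two points in the paragraph above in isolation, here is a small sketch on a hypothetical three-row tibble (`toy`, `v`, and the `via_*` names are illustrative, not from the question). It confirms that `sym()` yields a symbol and that `group_by()` + `summarise(n = n())`, `count(!!sym(v))`, and `count(.data[[v]])` give the same counts. The `.data[[v]]` pronoun form is an alternative to `!!sym()` that this answer does not itself use.

```r
library(dplyr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(gender = c("female", "male", "female"))
v <- "gender"  # the variable name arrives as a string, as .x does inside map()

# sym() converts the string into a symbol, which !! then injects as a column reference
s <- sym(v)
print(class(s))

# Three ways to count rows per category of the column named in v
via_summarise <- toy %>% group_by(!!sym(v)) %>% summarise(n = n())
via_count     <- toy %>% count(!!sym(v))
via_pronoun   <- toy %>% count(.data[[v]])  # .data pronoun: an alternative to !!sym()

print(via_count)
```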

library(dplyr)
library(purrr)

vars <- c("gender", "educ", "ethnic", "region")

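vars %>%
  map(~ census_match_input %>%
         count(!!sym(.x)) %>%         # sym() turns the string into a column symbol
         rename(demo_cat = !!.x) %>%  # give the category column a generic name
         mutate(demo_var = .x) %>%    # record which variable these rows describe
         relocate(demo_var))          # move demo_var to the first column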

[[1]]
# A tibble: 2 x 3
  demo_var demo_cat     n
  <chr>    <chr>    <int>
1 gender   female     524
2 gender   male       476

[[2]]
# A tibble: 4 x 3
  demo_var demo_cat         n
  <chr>    <chr>        <int>
1 educ     BA_plus        311
2 educ     HS_grad        247
3 educ     no_HS          133
4 educ     some_college   309

[[3]]
# A tibble: 5 x 3
  demo_var demo_cat     n
  <chr>    <chr>    <int>
1 ethnic   asian       48
2 ethnic   black      146
3 ethnic   hispanic   252
4 ethnic   other       64
5 ethnic   white      490

[[4]]
# A tibble: 4 x 3
  demo_var demo_cat      n
  <chr>    <chr>     <int>
1 region   midwest     218
2 region   northeast   173
3 region   south       367
4 region   west        242
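If a single long table using the question's `n_census` column name is preferred over a list, the per-variable tibbles can be stacked. This is a sketch under stated assumptions: a small hypothetical `toy` tibble replaces the remote data, `count()`'s `name` argument sets the count column name, and `bind_rows()` stacks the list. It also assumes every demographic column has the same type (character here), because `bind_rows()` will not combine a character `demo_cat` with a numeric one.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(
  gender = c("female", "male", "female", "male"),
  region = c("west", "south", "south", "south")
)
vars <- c("gender", "region")

census_counts <- vars %>%
  map(~ toy %>%
        count(.data[[.x]], name = "n_census") %>%  # per-category counts, named as in the question
        rename(demo_cat = all_of(.x)) %>%          # generic name for the category column
        mutate(demo_var = .x) %>%                  # record which variable the rows describe
        relocate(demo_var)) %>%                    # put demo_var first
  bind_rows()                                      # stack the per-variable tables into one

print(census_counts)
```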
