r - How do I use map* and mutate to convert a list into a set of additional columns?
Question
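Over the past few days I have probably tried hundreds of permutations of this code, trying to get a function that does what I need, but I have finally given up. It feels like this should definitely be doable, and I am so close!
I've tried to boil things down to the core problem in the reprex below.
Basically, I have a single-row data frame and a list of strings ("concepts"). I want to use mutate to create an additional column for each of these strings, ideally with each column taking its name from the string, and then populate the column with the result of a function call (it doesn't matter which function for now; I just need the scaffolding to work).
As usual, I feel I must be missing something obvious... maybe just a syntax error. I also wonder whether I need purrr::map at all; perhaps a simpler vectorized approach would work fine.
I suspect the fact that the new columns are named ..1 rather than after the concepts is a clue as to what is going wrong.
I can create the data frame I want by calling each concept manually (see the end of the reprex), but because the list of concepts differs between data frames, I want to do this with pipes and tidyverse techniques rather than by hand.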
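I have read the following questions looking for help:
- How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
- How to use the purrr::map function to mutate multiple columns with dynamic variables?
- (R) A more concise way to use map() with list-columns
- Adding multiple output variables using purrr and a predefined function
- Creating new variables with purrr (how does one go about that?)
- How to compute multiple new columns in an R data frame with dynamic names
But none of these helped me solve the problem I'm having. [Edit: added the last question to the list; it may describe the technique I need.]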
<!-- language-all: lang-r -->
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
# set up initial list of tibbles ------------------------------------------
df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
# just use the first tibble for now, for testing --------------------------
# ideally I want to map across dfs through a list -------------------------
df <- df[[1]]
# nitty gritty functions --------------------------------------------------
get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id,
                 select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension") %>%
    filter(concept != "geography") %>%
    pull("concept")
}
# get_concept_list() returns the strings I need:
get_concept_list(df)
#> [1] "time" "gender" "c_age" "measures"
# Here is a list of examples of types of map* that do various things,
# none of which is what I need it to do
# I'm using toupper() here for simplicity - ultimately I will use
# get_concept_info() to populate the new columns
# this creates four new tibbles
get_concept_list(df) %>%
  map(~ mutate(df, {{.x}} := toupper(.x)))
#> [[1]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#>
#> [[2]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 GENDER
#>
#> [[3]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 C_AGE
#>
#> [[4]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this throws an error
get_concept_list(df) %>%
  map_chr(~ mutate(df, {{.x}} := toupper(.x)))
#> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3
# this creates three extra rows in the tibble
get_concept_list(df) %>%
  map_df(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this does the same as map_df
get_concept_list(df) %>%
  map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this creates a single tibble 12 columns wide
get_concept_list(df) %>%
  map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 1 x 12
#> dataset_title dataset_id ..1 dataset_title1 dataset_id1 ..11 dataset_title2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Population e~ NM_2002_1 TIME Population es~ NM_2002_1 GEND~ Population es~
#> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
#> # dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>
# function to get info on each concept (except geography) -----------------
# this is the function I want to use eventually to populate my new columns
get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id) %>%
    filter(name == "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}
# individual mutate works, for comparison ---------------------------------
# I can create the kind of table I want manually using a line like the one below
# df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
df %>% mutate(., measures = get_concept_info(df, "measures"))
#> # A tibble: 1 x 3
#> dataset_title dataset_id measures
#> <chr> <chr> <list>
#> 1 Population estimates - local authority based by sin~ NM_2002_1 <tibble [2 x ~
<sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
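Answer
Using !! and := lets you name columns dynamically. We can then collapse the list returned by map() with reduce(), which left_join()s all the data frames in the list on the dataset title and ID columns.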
df_2 <-
  map(get_concept_list(df),
      ~ mutate(df,
               !!.x := get_concept_info(df, .x))) %>%
  reduce(left_join, by = c("dataset_title", "dataset_id"))
df_2
# A tibble: 1 x 6
dataset_title dataset_id time gender c_age measures
<chr> <chr> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>>
1 Population estimates - local authority based by single year NM_2002_1 [28 x 2] [3 x 2] [121 x 2] [2 x 2]
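For readers without access to the Nomis API, the same pattern can be tried offline. The following is a minimal sketch, not the original code: it hard-codes the concept names from the question instead of calling get_concept_list(), and uses toupper() as a stand-in for get_concept_info(). The names concepts, joined and spliced are introduced here purely for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in data: one-row tibble shaped like the one in the question.
df <- tibble(
  dataset_title = "Population estimates - local authority based by single year",
  dataset_id = "NM_2002_1"
)

# Assumption: hard-coded concept names replace the get_concept_list() call.
concepts <- c("time", "gender", "c_age", "measures")

# Approach from the answer: one mutate() per concept, with !! and := naming
# the new column after the string, then left_join() the pieces together.
joined <- concepts %>%
  map(~ mutate(df, !!.x := toupper(.x))) %>%
  reduce(left_join, by = c("dataset_title", "dataset_id"))

# Possible alternative: build a named list of values first, then splice it
# into a single mutate() call with !!!, which avoids the joins.
new_cols <- map(set_names(concepts), toupper)
spliced <- mutate(df, !!!new_cols)

names(joined)
names(spliced)
```

Both versions should give a one-row tibble with one extra column per concept. The spliced variant may be worth considering when there are no natural key columns to join on, since it never splits the data frame in the first place.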