Named vector `by` argument for `dplyr::*_join` functions

Problem description

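I am writing a function that uses a `dplyr::*_join` function to join two data frames `by` differently named columns, where the column name of the first data frame is specified dynamically as a function argument. I believe I need to use rlang quasiquotation/metaprogramming, but I have not been able to get a working solution. I'd appreciate any suggestions!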

library(dplyr)
library(rlang)
library(palmerpenguins)

# Create a smaller dataset
penguins <-
  penguins %>% 
  group_by(species) %>% 
  slice_head(n = 4) %>% 
  ungroup()

# Create a colors dataset
penguin_colors <-
  tibble(
    type = c("Adelie", "Chinstrap", "Gentoo"),
    color = c("orange", "purple", "green")
  )


# Without function --------------------------------------------------------

# Join works with character vectors
left_join(
  penguins, penguin_colors, by = c("species" = "type")
)
#> # A tibble: 12 x 9
#>    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
#>    <chr>   <fct>           <dbl>         <dbl>            <int>       <int>
#>  1 Adelie  Torge…           39.1          18.7              181        3750
#>  2 Adelie  Torge…           39.5          17.4              186        3800
#>  3 Adelie  Torge…           40.3          18                195        3250
#>  4 Adelie  Torge…           NA            NA                 NA          NA
#>  5 Chinst… Dream            46.5          17.9              192        3500
#>  6 Chinst… Dream            50            19.5              196        3900
#>  7 Chinst… Dream            51.3          19.2              193        3650
#>  8 Chinst… Dream            45.4          18.7              188        3525
#>  9 Gentoo  Biscoe           46.1          13.2              211        4500
#> 10 Gentoo  Biscoe           50            16.3              230        5700
#> 11 Gentoo  Biscoe           48.7          14.1              210        4450
#> 12 Gentoo  Biscoe           50            15.2              218        5700
#> # … with 3 more variables: sex <fct>, year <int>, color <chr>

# Join works with data-variable and character vector
left_join(
  penguins, penguin_colors, by = c(species = "type")
)
#> # A tibble: 12 x 9
#>    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
#>    <chr>   <fct>           <dbl>         <dbl>            <int>       <int>
#>  1 Adelie  Torge…           39.1          18.7              181        3750
#>  2 Adelie  Torge…           39.5          17.4              186        3800
#>  3 Adelie  Torge…           40.3          18                195        3250
#>  4 Adelie  Torge…           NA            NA                 NA          NA
#>  5 Chinst… Dream            46.5          17.9              192        3500
#>  6 Chinst… Dream            50            19.5              196        3900
#>  7 Chinst… Dream            51.3          19.2              193        3650
#>  8 Chinst… Dream            45.4          18.7              188        3525
#>  9 Gentoo  Biscoe           46.1          13.2              211        4500
#> 10 Gentoo  Biscoe           50            16.3              230        5700
#> 11 Gentoo  Biscoe           48.7          14.1              210        4450
#> 12 Gentoo  Biscoe           50            15.2              218        5700
#> # … with 3 more variables: sex <fct>, year <int>, color <chr>

# Join does NOT work with character vector and data-variable
left_join(
  penguins, penguin_colors, by = c(species = type)
)
#> Error in standardise_join_by(by, x_names = x_names, y_names = y_names): object 'type' not found



# With function -----------------------------------------------------------

# Version 1: Without tunneling
add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = c(var = "type")
  )
}

add_colors(penguins, species)
#> Error: Join columns must be present in data.
#> x Problem with `var`.
add_colors(penguins, "species")
#> Error: Join columns must be present in data.
#> x Problem with `var`.

# Version 2: With tunneling
add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = c("{{var}}" = "type")
  )
}

add_colors(penguins, species)
#> Error: Join columns must be present in data.
#> x Problem with `{{var}}`.
add_colors(penguins, "species")
#> Error: Join columns must be present in data.
#> x Problem with `{{var}}`.

# Version 3: With tunneling and glue syntax
add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = c("{{var}}" := "type")
  )
}

add_colors(penguins, species)
#> Error: `:=` can only be used within a quasiquoted argument
add_colors(penguins, "species")
#> Error: `:=` can only be used within a quasiquoted argument

Created on 2020-10-05 by the reprex package (v0.3.0)

Thank you for your input.

Tags: r, dplyr, metaprogramming, rlang, quasiquotes

Solution


library(dplyr)
left_join(
  penguins, penguin_colors, by = c(species = "type")
)

The above works because, for `by`, we are creating a named vector like this:

c(species = "type")
#species 
# "type"
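To make the structure of that vector concrete, here is a minimal base R sketch. It only inspects the vector itself: the name is the join column in the left table and the value is the join column in the right table, which is how the `by = c(species = "type")` call above is read.

```r
# The named vector passed to `by`
by <- c(species = "type")

names(by)    # the name: join column in the left-hand table
unname(by)   # the value: join column in the right-hand table

# Quoting the name makes no difference; both spellings build the same vector
identical(by, c("species" = "type"))
```

This is also why the first two calls in the question behave identically: `c(species = "type")` and `c("species" = "type")` are the same object.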

You can also do this with `setNames`:

setNames('type', 'species')

But note that passing `species` unquoted fails:

setNames('type', species)

Error in setNames("type", species) : object 'species' not found
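The same evaluation rules explain why `c(var = "type")` inside the function from the question did not work: in `c()`, whatever is written to the left of `=` is used literally as the name, whereas `setNames()` evaluates its second argument. A small base R sketch, where `var` stands in for the function argument holding a string:

```r
# `var` plays the role of the function argument
var <- "species"

# In c(), the left-hand side of `=` is taken literally as the name
by_literal <- c(var = "type")
names(by_literal)   # "var"

# setNames() evaluates `var`, so the name is the string it holds
by_dynamic <- setNames("type", var)
names(by_dynamic)   # "species"
```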

So create the named vector with `setNames` and pass a character value to the function.

add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = setNames('type', var)
  )
}

add_colors(penguins, 'species')
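If you would also like to call the function with an unquoted column name, one possible variation is to capture the argument with `rlang::ensym()` and convert it to a string with `rlang::as_name()` before building the named vector. This is only a sketch: the function name `add_colors2` and the small `birds` and `bird_colors` tables are made up for illustration and stand in for the data in the question.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in tables (not the penguins data from the question)
birds <- tibble(species = c("Adelie", "Gentoo"), mass = c(3750, 4500))
bird_colors <- tibble(type = c("Adelie", "Gentoo"), color = c("orange", "green"))

add_colors2 <- function(data, var) {
  # ensym() accepts a bare name or a string; as_name() returns it as a string
  var_name <- as_name(ensym(var))
  left_join(data, bird_colors, by = setNames("type", var_name))
}

res_bare   <- add_colors2(birds, species)
res_quoted <- add_colors2(birds, "species")
```

Note the trade-off: because `ensym()` captures what was typed, passing a variable that holds the column name (for example `col <- "species"; add_colors2(birds, col)`) would look for a column called `col`. If you need that, the string-only version above is the simpler choice.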
