How can I recode variables programmatically using quasiquotation?

Problem description

I have the following dataset and want to recode its variables:

library(tidyverse)
library(rlang)
mytib <- tribble(~colA, ~colB, ~colC,
        "good", "bad", "better",
        "better", "bad", "worse",
        "good", "best", "good")

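My real dataset has many more columns, so I am looking for a programmatic way to recode it so that "bad" and "worse" are collapsed into "terrible", and "good", "better", and "best" are collapsed into "awesome". Each result should go into a new column, one per variable, e.g. "colA_bin" (for binary), "colB_bin", and "colC_bin". Because I have many columns, I would like to select them with dplyr::select(starts_with(...) & ends_with(...)).

This is what I came up with:

attractiveness_vars <- mytib %>%
                             dplyr::select(starts_with(c("col")) & ends_with(c("A", "B", "C"))) %>%
                             names(.)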
attractiveness_lvls_neg <- c("bad", "worse")
attractiveness_lvls_pos <- c("good", "better", "best")
attractiveness_lvls_new <- c("terrible", "awesome")

recode_attractiveness <- function(dataframe, column_name, lvls_neg, lvls_pos, lvls_new){
    new_col <- dataframe %>%
    mutate({{column_name}} := factor(case_when({{column_name}} %in% 
                                                                 lvls_neg ~ lvls_new[1],
                                                               {{column_name}} %in%
                                                                 lvls_pos ~ lvls_new[2],
                                                               TRUE ~ NA_character_),
                                                     levels = lvls_new)) %>%
    pull({{column_name}})
    return(new_col)
}

When I run

recode_attractiveness(mytib, attractiveness_vars, attractiveness_lvls_neg, attractiveness_lvls_pos, attractiveness_lvls_new)

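I get the error ℹ Input `attractiveness_vars` must be size [NROW] or 1, not [length(attractiveness_vars)]. Note that the real message shows the actual numbers; I replaced them with placeholders to make it easier to read.

There may be an easier way to solve this. I would like to know whether there is a quasiquotation approach, or (whether or not one exists) an elegant programmatic solution, i.e. one that does not involve me typing the case_when(...) code.

The expected output should look like this: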

colA   colA_bin  colB  colB_bin   colC     colC_bin
"good" "awesome" "bad" "terrible" "better" "awesome"
...

Tags: r, dplyr, rlang

Solution

Perhaps skip the function definition altogether and use across:

library(dplyr) # Version >= 1.0.0
mytib %>% 
  mutate(across(one_of(attractiveness_vars),
                ~ factor(case_when(. %in% attractiveness_lvls_neg ~ attractiveness_lvls_new[1],
                                   . %in% attractiveness_lvls_pos ~ attractiveness_lvls_new[2],
                                   TRUE ~ NA_character_),
                         levels = attractiveness_lvls_new),
                .names = "{col}_bin"))
# A tibble: 3 x 6
  colA   colB  colC   colA_bin colB_bin colC_bin
  <chr>  <chr> <chr>  <fct>    <fct>    <fct>   
1 good   bad   better awesome  terrible awesome 
2 better bad   worse  awesome  terrible terrible
3 good   best  good   awesome  awesome  awesome 
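
Since the question asks specifically for a quasiquotation approach, here is a minimal sketch of one. It assumes the column names arrive as plain strings (as `attractiveness_vars` does), which is why the original function failed: `{{ }}` forwards the character vector itself rather than turning each name into a column reference. The helper name `recode_one` is hypothetical and not part of any package; it refers to the column with `.data[[col]]` and builds the new name with `!!new_name :=`.

```r
library(tibble)
library(dplyr)
library(rlang)

mytib <- tribble(~colA, ~colB, ~colC,
        "good", "bad", "better",
        "better", "bad", "worse",
        "good", "best", "good")

attractiveness_vars <- c("colA", "colB", "colC")
attractiveness_lvls_neg <- c("bad", "worse")
attractiveness_lvls_pos <- c("good", "better", "best")
attractiveness_lvls_new <- c("terrible", "awesome")

# Hypothetical helper: recodes ONE column, given its name as a string,
# and appends the result as "<col>_bin".
recode_one <- function(data, col, lvls_neg, lvls_pos, lvls_new) {
  new_name <- paste0(col, "_bin")
  data %>%
    mutate(!!new_name := factor(
      case_when(.data[[col]] %in% lvls_neg ~ lvls_new[1],
                .data[[col]] %in% lvls_pos ~ lvls_new[2],
                TRUE ~ NA_character_),
      levels = lvls_new))
}

# Apply the helper to each selected column in turn.
result <- mytib
for (col in attractiveness_vars) {
  result <- recode_one(result, col,
                       attractiveness_lvls_neg,
                       attractiveness_lvls_pos,
                       attractiveness_lvls_new)
}
result
```

The across version above is shorter; this variant is mainly useful if the recoding has to live inside a reusable function that takes column names as strings.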

For bonus points, you can use forcats::fct_collapse:

library(forcats)
attractiveness_factors <- setNames(list(attractiveness_lvls_neg, attractiveness_lvls_pos),
                                   attractiveness_lvls_new)
attractiveness_factors
#$terrible
#[1] "bad"   "worse"
#$awesome
#[1] "good"   "better" "best"  

mytib %>% 
  mutate(across(one_of(attractiveness_vars),
                ~ fct_collapse(.,!!!attractiveness_factors),
                .names = "{col}_bin"))
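
One caveat with the fct_collapse version, as far as I can tell: each character column is converted to a factor using only the values it actually contains, so the resulting level order can differ from column to column, and forcats may warn about old levels that a column does not contain. A minimal sketch of one way around this, assuming it is acceptable to fix the full set of old levels up front (the name `all_old_lvls` is introduced here for illustration only):

```r
library(tibble)
library(dplyr)
library(forcats)

mytib <- tribble(~colA, ~colB, ~colC,
        "good", "bad", "better",
        "better", "bad", "worse",
        "good", "best", "good")

attractiveness_vars <- c("colA", "colB", "colC")
attractiveness_factors <- list(terrible = c("bad", "worse"),
                               awesome  = c("good", "better", "best"))

# Every old level, in the order the new levels should appear.
all_old_lvls <- unlist(attractiveness_factors, use.names = FALSE)

collapsed <- mytib %>%
  mutate(across(all_of(attractiveness_vars),
                # Give every column the same full level set before collapsing.
                ~ fct_collapse(factor(., levels = all_old_lvls),
                               !!!attractiveness_factors),
                .names = "{col}_bin"))
collapsed
```

With the levels fixed this way, every `_bin` column should end up with the levels "terrible" and "awesome" in that order, matching the case_when version.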
