首页 > 解决方案 > How to subtract two columns using tidyverse mutate with columns named by external variables

问题描述

I’d like to dynamically assign which columns to subtract from each other. I’ve read around and looks like I need to use all_of, and maybe across (How to subtract one column from multiple columns in a dataframe in R using dplyr, How to you use objects in dplyr filter?). I can get it working for one variable in a mutate phrase (e.g. mutate(y = all_of(x))), but I can’t seem to do even simple calculations using two. Here’s a simplified example of what I want to do:

var1 <- c("Sepal.Length")
var2 <- c("Sepal.Width")

result <- iris %>%
  mutate(calculation = all_of(var1) - all_of(var2))

标签: rtidyversedplyracross

解决方案


We may use .data to subset the column as a vector. The all_of/any_of are used along with across to loop across the columns

library(dplyr)
iris %>%
  mutate(calculation = .data[[var1]] - .data[[var2]])%>%
  head

-output

Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1          5.1         3.5          1.4         0.2  setosa         1.6
2          4.9         3.0          1.4         0.2  setosa         1.9
3          4.7         3.2          1.3         0.2  setosa         1.5
4          4.6         3.1          1.5         0.2  setosa         1.5
5          5.0         3.6          1.4         0.2  setosa         1.4
6          5.4         3.9          1.7         0.4  setosa         1.5

Or may also use cur_data()

iris %>%
    head %>%
    mutate(calculation = cur_data()[[var1]] - cur_data()[[var2]])

-output

 Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1          5.1         3.5          1.4         0.2  setosa         1.6
2          4.9         3.0          1.4         0.2  setosa         1.9
3          4.7         3.2          1.3         0.2  setosa         1.5
4          4.6         3.1          1.5         0.2  setosa         1.5
5          5.0         3.6          1.4         0.2  setosa         1.4
6          5.4         3.9          1.7         0.4  setosa         1.5

Or another option is to pass both the variables in across, and then reduce with -

library(purrr)
iris %>%
   head %>%
   mutate(calculation = reduce(across(all_of(c(var1, var2))), `-`))

-output

Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1          5.1         3.5          1.4         0.2  setosa         1.6
2          4.9         3.0          1.4         0.2  setosa         1.9
3          4.7         3.2          1.3         0.2  setosa         1.5
4          4.6         3.1          1.5         0.2  setosa         1.5
5          5.0         3.6          1.4         0.2  setosa         1.4
6          5.4         3.9          1.7         0.4  setosa         1.5

Or could convert to symbol and evaluate (!!)

iris %>% 
   head %>% 
   mutate(calculation = !! rlang::sym(var1) - !! rlang::sym(var2))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1          5.1         3.5          1.4         0.2  setosa         1.6
2          4.9         3.0          1.4         0.2  setosa         1.9
3          4.7         3.2          1.3         0.2  setosa         1.5
4          4.6         3.1          1.5         0.2  setosa         1.5
5          5.0         3.6          1.4         0.2  setosa         1.4
6          5.4         3.9          1.7         0.4  setosa         1.5

Or if we want to use all_of in across, just subset the column with [[

iris %>% 
   head %>% 
   mutate(calculation = across(all_of(var1))[[1]] - 
                         across(all_of(var2))[[1]])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1          5.1         3.5          1.4         0.2  setosa         1.6
2          4.9         3.0          1.4         0.2  setosa         1.9
3          4.7         3.2          1.3         0.2  setosa         1.5
4          4.6         3.1          1.5         0.2  setosa         1.5
5          5.0         3.6          1.4         0.2  setosa         1.4
6          5.4         3.9          1.7         0.4  setosa         1.5

The reason we need to subset is because, across by default will update the original column when the .names is not present. The calculation will be a data.frame with a single column

out <- iris %>%
    head %>% 
    mutate(calculation = across(all_of(var1)) -
           across(all_of(var2)))
out
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
1          5.1         3.5          1.4         0.2  setosa          1.6
2          4.9         3.0          1.4         0.2  setosa          1.9
3          4.7         3.2          1.3         0.2  setosa          1.5
4          4.6         3.1          1.5         0.2  setosa          1.5
5          5.0         3.6          1.4         0.2  setosa          1.4
6          5.4         3.9          1.7         0.4  setosa          1.5

str(out)
data.frame':    6 obs. of  6 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1
 $ calculation :'data.frame':   6 obs. of  1 variable:
  ..$ Sepal.Length: num  1.6 1.9 1.5 1.5 1.4 1.5

推荐阅读