Creating variables from list objects in R

Problem description

I'm trying to create a binary variable from data spread across multiple columns: every column whose name contains a given string should be checked for certain values. I'll use iris as an example dataset.

Let's say that any row in which a column with "Sepal" in its name holds one of the values 5.1, 3.0, or 4.7 should become "Class A", while rows with 3.1, 5.0, or 5.4 should become "Class B". Here are the first few rows of iris:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The first 3 rows should then fall under "Class A", while rows 4-6 should fall under "Class B". I tried writing this code to do that:

mutate(iris, Class = if_else(
  vars(contains("Sepal")), any_vars(. %in% c(5.1,3.0, 4.7))), "Class A",
  ifelse(vars(contains("Sepal")), any_vars(. %in% c(3.1,    5.0,    5.4))), "Class B",NA)

and received the error

Error: `condition` must be a logical vector, not a `quosures/list` object

So I've realized I need lapply here, but I'm not sure where to begin: I don't know how to express selecting the columns with "Sepal" in the name, together with the specific values to look for in those rows, as a single list object to pass to lapply.

This is clearly the wrong syntax:

lapply(vars(contains("Sepal")), any_vars(. %in% c(5.1,3.0, 4.7)))

Examples using case_when will also be accepted as answers.

Tags: r, dplyr

Solution


If you want to do this using dplyr, you can use rowwise() with the new c_across() (available from dplyr 1.0.0):

library(dplyr)

iris %>%
  rowwise() %>%
  mutate(Class = case_when(
      any(c_across(contains("Sepal")) %in% c(5.1,3.0, 4.7)) ~ 'Class A', 
      any(c_across(contains("Sepal")) %in% c(3.1,5.0,5.4)) ~ 'Class B')) %>%
  head


# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Class  
#         <dbl>       <dbl>        <dbl>       <dbl> <fct>   <chr>  
#1          5.1         3.5          1.4         0.2 setosa  Class A
#2          4.9         3            1.4         0.2 setosa  Class A
#3          4.7         3.2          1.3         0.2 setosa  Class A
#4          4.6         3.1          1.5         0.2 setosa  Class B
#5          5           3.6          1.4         0.2 setosa  Class B
#6          5.4         3.9          1.7         0.4 setosa  Class B
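
Since the question asks about lapply, here is a minimal base R sketch of the same idea, offered as an alternative rather than as part of the original answer. It builds one logical vector per "Sepal" column with lapply and combines them with Reduce. The names sepal_cols, any_match, and iris_classed are hypothetical, chosen only for illustration.

```r
# Columns whose names contain "Sepal"
sepal_cols <- iris[grepl("Sepal", names(iris), fixed = TRUE)]

# Hypothetical helper: TRUE for each row where at least one of the
# selected columns holds one of `values`
any_match <- function(cols, values) {
  Reduce(`|`, lapply(cols, function(col) col %in% values))
}

in_a <- any_match(sepal_cols, c(5.1, 3.0, 4.7))
in_b <- any_match(sepal_cols, c(3.1, 5.0, 5.4))

# "Class A" takes precedence, mirroring the branch order in case_when()
iris_classed <- iris
iris_classed$Class <- ifelse(in_a, "Class A",
                             ifelse(in_b, "Class B", NA_character_))

head(iris_classed)
```

Because each column is tested as a whole vector, this avoids the per-row overhead of rowwise(), which may matter on larger data frames.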

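However, note that using %in% (or ==) on numeric values is not reliable: both test for exact equality, and a floating-point value produced by arithmetic can differ from the literal you typed in its last few bits. If interested, you may read the question "Why are these numbers not equal?".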
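
To illustrate the caveat, here is a small sketch of a tolerance-based membership test. The helper near_any and its default tolerance are assumptions made for this example, not something the answer prescribes. dplyr also provides near() for pairwise comparison with a tolerance.

```r
# A value computed by arithmetic may print like 0.3 without being equal to it
x <- 0.1 + 0.2
x == 0.3     # FALSE
x %in% 0.3   # FALSE, because %in% matches exactly

# Hypothetical helper: TRUE where an element of x lies within `tol`
# of any element of `values`
near_any <- function(x, values, tol = 1e-8) {
  vapply(x, function(xi) any(abs(xi - values) < tol), logical(1))
}

near_any(x, 0.3)   # TRUE
```

Swapping near_any(col, values) in for col %in% values would make the classification robust to such rounding differences, at the cost of having to choose a tolerance.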

