Recoding a variable based on multiple vectors/columns

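Problem description

This is a question about writing logical conditions efficiently.

Suppose I want to recode a variable if any column in a set equals a particular value.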

library(tibble)

# example data: three indicator columns, ten rows
test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(0, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)
test

A basic approach is:

test$newvar <- ifelse(test$CompanyA == 1 | test$CompanyB == 1 | test$CompanyC == 1, -99, 0)

table(test$newvar)

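But what if I have dozens of columns? I don't want to write out CompanyA, CompanyB, and so on. Is there a way to do this with something like an %in% statement? Here is an approach that is obviously wrong:

condition <- columns %in% c("CompanyA", "CompanyB", "CompanyC") # obviously doesn't work

test$newvar[condition] <- 1

Or is there a simpler way, for example: if CompanyA:CompanyC == 1, then do...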

Tags: r, if-statement

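Solution

Overview

By reshaping test from wide to long, I was able to create a column that tests, for each row, whether any of the CompanyX columns contains the value 1.

Code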

# load necessary packages ----
library(tidyverse)

# load necessary data ----
test <- 
  tibble(CompanyA = rep(c(0:1),5),
         CompanyB = rep(c(0),10),
         CompanyC = c(1,1,1,1,0,0,1,1,1,1)) %>% 
  # create an 'id' column
  mutate(id = 1:n())

# calculations -----
new.var <-
  test  %>%
  # transform data from wide to long
  gather(key = "company", value = "value", -id) %>%
  # for each 'id' value
  # test if any 'value' is equal to 1
  # if so, return -99; else return 0
  group_by(id) %>%
  summarize(new_var = if_else(any(value == 1), -99, 0))

# left join new.var onto test ---
test <-
  test %>%
  left_join(new.var, by = "id")

# view results ---
test
# A tibble: 10 x 5
#    CompanyA CompanyB CompanyC    id new_var
#       <int>    <dbl>    <dbl> <int>   <dbl>
#  1        0        0        1     1     -99
#  2        1        0        1     2     -99
#  3        0        0        1     3     -99
#  4        1        0        1     4     -99
#  5        0        0        0     5       0
#  6        1        0        0     6     -99
#  7        0        0        1     7     -99
#  8        1        0        1     8     -99
#  9        0        0        1     9     -99
# 10        1        0        1    10     -99

# end of script #
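
For comparison, the same row-wise check can be written without reshaping. The following is a minimal sketch in base R, not the approach used in the answer above. It assumes every column of interest starts with the prefix Company, and the names company_cols and any_one are introduced here only for illustration.

```r
# Same example data as in the question, built with base R only.
test <- data.frame(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(0, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Select the columns to check by name pattern instead of listing them.
# Assumption: all relevant columns share the "Company" prefix.
company_cols <- grep("^Company", names(test), value = TRUE)

# test[company_cols] == 1 yields a logical matrix;
# rowSums() counts the TRUE values in each row.
any_one <- rowSums(test[company_cols] == 1) > 0

# Recode: -99 if any selected column equals 1 in that row, otherwise 0.
test$newvar <- ifelse(any_one, -99, 0)
table(test$newvar)
```

On this data only row 5 has no 1 in any Company column, so it is the only row recoded to 0. If the columns may contain NA, passing na.rm = TRUE to rowSums() is one way to ignore missing values. For code that already uses dplyr (version 1.0.4 or later), if_any(starts_with("Company"), ~ .x == 1) inside mutate() expresses the same test.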
