首页 > 解决方案 > 如何使用共享相同级别的各种列并使用特定列值创建虚拟变量

问题描述

我正在尝试获取下表的虚拟变量:

df1 <- structure(list(Value1 = c(9.330154398, 32.43881489, 54.77178387, 54.77178387),
                      Value2 = c(1, 2, 3, 8),
                      var1 = c("HomeATL", "AwaySDN", "AwayLAN", "AwayLAN"),
                      var2 = c("AwayHOU", "HomeATL", "HomeATL", "HomeATL"),
                      var3 = c("HomeEast", "HomeWest", "AwayEast", "AwayWest"),
                      var3values = c(1,2,3,4),
                      var4 = c("AwayWest", "AwayWest", "HomeSame", "HomeEast"),
                      var4values = c(5,6,7,8)), 
                 class = "data.frame", row.names = c(NA,-4L))

结果应如下所示:

Value1         Value2   HomeEast    HomeWest    AwayEast    AwayWest    HomeSame    HomeATL AwayHOU AwaySDN AwayLAN
9.330154398        1    1   0   0   5   0   1   1   0   0
-32.43881489       2    0   2   0   6   0   1   0   1   0
54.77178387        3    0   0   3   0   7   1   0   0   1
54.77178387        8    8   0   0   4   0   1   0   0   1

我已经问过类似的问题,我使用的方法是:

library(tidyverse)
rownames_to_column(df1, 'rn') %>%
    gather(key, val, var1:var4) %>% 
    count(rn, val) %>%
    spread(val, n, fill = 0)  %>%
    select(-rn) %>%
    bind_cols(df1[1:2], .)

但是,它返回 1 或 0 的虚拟值,而不是某些预定义列的值。

我该怎么做?

标签: rlinear-regressiondummy-variable

解决方案


一个选项是gathermatches名称以“var”开头,后跟一个或多个数字(\\d+)直到$字符串的结尾(),按行号分组,“val”列,基于创建“n” ie中指定的一个条件,case_when如果'key'是'var3',则获取'var3values'的对应值,或者如果是'var4',则获取'var4values',如果两者都不是,则获取频率计数(n()),spread它为“宽”格式并仅保留感兴趣的列

 rownames_to_column(df1, 'rn') %>%
    gather(key, val, matches("^var\\d+$")) %>%
    group_by(rn, val) %>% 
    mutate(n = case_when(key == "var3" ~ var3values, 
                         key == "var4" ~ var4values,
                         TRUE ~ as.numeric(n()))) %>% 
    select(-var3values, -var4values, -key) %>%
    spread(val, n, fill = 0) %>%  
    ungroup %>% 
    select(-rn)
# A tibble: 4 x 11
#  Value1 Value2 AwayEast AwayHOU AwayLAN AwaySDN AwayWest HomeATL HomeEast HomeSame HomeWest
#   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
#1   9.33      1        0       1       0       0        5       1        1        0        0
#2  32.4       2        0       0       0       1        6       1        0        0        2
#3  54.8       3        3       0       1       0        0       1        0        7        0
#4  54.8       8        0       0       1       0        4       1        8        0        0

推荐阅读