How to create dummy variables from several columns that share the same levels

Problem description

I am trying to get dummy variables for the following table:

df:

Value1       var1       var2      var3      var4
9.330154398  HomeATL    AwayHOU   HomeEast  AwayWest
32.43881489  AwaySDN    HomeATL   HomeWest  AwayWest
54.77178387  AwayLAN    HomeATL   AwayEast  HomeSame
54.77178387  AwayLAN    HomeATL   AwayWest  HomeEast

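The columns var1 and var2 share the same levels, and the columns var3 and var4 likewise share their own set of levels. So when the dummy variables are created, the new columns should not contain duplicate levels. For example, AwayWest appears in var4 (rows 1 and 2) and in var3 (row 4), so I need only one column named AwayWest, filled with 1 on each row where that level occurs.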

My desired output is:

Value1       HomeEast  HomeWest  AwayEast  AwayWest  HomeSame  HomeATL  AwayHOU  AwaySDN  AwayLAN
9.330154398  1         0         0         1         0         1        1        0        0
32.43881489  0         1         0         1         0         1        0        1        0
54.77178387  0         0         1         0         1         1        0        0        1
54.77178387  1         0         0         1         0         1        0        0        1

I tried creating a new column of 1s (col1) and spreading each column I want to convert:

spread(df,var1, col1) %>%
spread(var2, col1)%>%
spread(var3, col1)%>%
spread(var1, col1)

But it does not work.

Thanks

Tags: r, linear-regression, dummy-variable

Solution


A base R option is to use model.matrix:

df <- cbind(df[, "Value1", drop = FALSE], model.matrix(Value1 ~ . - 1, data = df))
df
#     Value1 var1AwayLAN var1AwaySDN var1HomeATL var2HomeATL var3AwayWest
#1  9.330154           0           0           1           0            0
#2 32.438815           0           1           0           1            0
#3 54.771784           1           0           0           1            0
#4 54.771784           1           0           0           1            1
#  var3HomeEast var3HomeWest var4HomeEast var4HomeSame
#1            1            0            0            0
#2            0            1            0            0
#3            0            0            0            1
#4            0            0            1            0

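If necessary, we can use

names(df) <- sub("var\\d", "", names(df))

to strip the var prefixes from the column names. Note that this does not exactly reproduce the expected output: model.matrix drops one reference level for every variable after the first, and a level shared by two columns (such as HomeATL or HomeEast) ends up as two separate columns with the same name.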
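
The expected output in the question has exactly one column per distinct level, no matter which variable the level came from. The following is a minimal base R sketch of one way to get that shape, not the answer author's method. It assumes the four variables are read as plain character columns, and the names vars, lvls, dummies and result are chosen here purely for illustration.

```r
# Sample data from the question, read as character columns
df <- read.table(text = "Value1 var1 var2 var3 var4
9.330154398 HomeATL AwayHOU HomeEast AwayWest
32.43881489 AwaySDN HomeATL HomeWest AwayWest
54.77178387 AwayLAN HomeATL AwayEast HomeSame
54.77178387 AwayLAN HomeATL AwayWest HomeEast",
  header = TRUE, stringsAsFactors = FALSE)

# Columns to encode (illustrative choice of order)
vars <- c("var3", "var4", "var1", "var2")

# Every distinct level across those columns, listed once
lvls <- unique(unlist(df[vars], use.names = FALSE))

# For each level, flag the rows where it occurs in any of the columns
dummies <- sapply(lvls, function(l) as.integer(rowSums(df[vars] == l) > 0))

# One indicator column per level, next to the numeric column
result <- cbind(df["Value1"], dummies)
result
```

Because each level is looked up across all four columns at once, a level such as HomeATL that occurs in both var1 and var2 yields a single column. The column order follows the first appearance of each level, so it may differ slightly from the order shown in the question.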


Sample data

df <- read.table(text =
    "Value1       var1       var2      var3      var4
9.330154398  HomeATL    AwayHOU   HomeEast  AwayWest
32.43881489  AwaySDN    HomeATL   HomeWest  AwayWest
54.77178387  AwayLAN    HomeATL   AwayEast  HomeSame
54.77178387  AwayLAN    HomeATL   AwayWest  HomeEast", header = TRUE)
