r - Create one-hot encoded columns while keeping other features
Problem description
I have the following data:
dataset <- structure(list(id = structure(c(2L, 3L, 1L, 3L, 1L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "B", "A", "A",
"A"), cat2 = c("item 1", "item 1", "item 2", "item 5", "item 3",
"item 28"), cat3 = c("theme 2", "theme 2", "theme 1", "theme 4",
"theme 10", "theme 40")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))
I want to create a kind of model matrix that contains one-hot encoded columns built from cat2 and cat3. My output would look like this:
structure(list(id = structure(c(1L, 1L, 2L, 3L, 3L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "A", "A", "B",
"A"), `item 1` = c(0, 0, 1, 0, 1, 0), `item 2` = c(0, 1, 0, 0,
0, 0), `item 28` = c(0, 0, 0, 0, 0, 1), `item 3` = c(1, 0, 0,
0, 0, 0), `item 5` = c(0, 0, 0, 1, 0, 0), `theme 1` = c(0, 1,
0, 0, 0, 0), `theme 10` = c(1, 0, 0, 0, 0, 0), `theme 2` = c(0,
0, 1, 0, 1, 0), `theme 4` = c(0, 0, 0, 1, 0, 0), `theme 40` = c(0,
0, 0, 0, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
However, this dataset does not contain my independent variable, and I want to keep the id and cat1 columns. How can I do this?
Solution
You can call dcast twice, once for cat2 and once for cat3, and merge the two results on id and cat1.
library(reshape2)
# One dcast per column to encode; length counts the rows for each (id, cat1) and value
merge(dcast(dataset, id + cat1 ~ cat2, fun.aggregate = length),
      dcast(dataset, id + cat1 ~ cat3, fun.aggregate = length),
      by = c("id", "cat1"))
#      id cat1 item 1 item 2 item 28 item 3 item 5 theme 1 theme 10 theme 2 theme 4 theme 40
#1 215101    A      0      0       0      1      0       0        1       0       0        0
#2 215101    B      0      1       0      0      0       1        0       0       0        0
#3 215559    A      1      0       0      0      0       0        0       1       0        0
#4 216566    A      0      0       0      0      1       0        0       0       1        0
#5 216566    B      1      0       0      0      0       0        0       1       0        0
#6 225678    A      0      0       1      0      0       0        0       0       0        1
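As a cross-check that does not depend on reshape2, the same indicator columns can be sketched in base R. This is not the method from the answer: `one_hot` is a hypothetical helper introduced here for illustration, and the data frame is a hand-built stand-in for the question's data. Unlike dcast, it keeps one row per input row and does not aggregate, so the two results only coincide when every (id, cat1) pair occurs once, as it does here.

```r
# Stand-in for the question's data, rebuilt by hand as a plain data.frame
dataset <- data.frame(
  id   = factor(c("215559", "216566", "215101", "216566", "215101", "225678")),
  cat1 = c("A", "B", "B", "A", "A", "A"),
  cat2 = c("item 1", "item 1", "item 2", "item 5", "item 3", "item 28"),
  cat3 = c("theme 2", "theme 2", "theme 1", "theme 4", "theme 10", "theme 40"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 column per distinct value of x
one_hot <- function(x) {
  lv <- sort(unique(x))
  m <- outer(x, lv, "==") * 1L  # logical comparison matrix turned into 0/1
  colnames(m) <- lv
  m
}

# cbind() on a data frame keeps the column names with spaces as they are
encoded <- cbind(dataset[c("id", "cat1")],
                 one_hot(dataset$cat2),
                 one_hot(dataset$cat3))

# Sort like the merged dcast result: by id, then cat1
encoded <- encoded[order(encoded$id, encoded$cat1), ]
```

Each row of `encoded` has exactly two ones, one among the item columns and one among the theme columns.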
If you have more than two variables to spread, you can melt the data first. This saves some typing.
dcast(melt(dataset, id.vars = c("id", "cat1")), id + cat1 ~ value, fun.aggregate = length)
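To make the melt step less opaque, the sketch below splits the one-liner into its two stages and looks at the intermediate long table. The data frame is a hand-built stand-in for the question's data, and the names `long` and `wide` are introduced here for illustration only. Passing `value.var = "value"` just makes explicit the column that dcast would otherwise guess.

```r
library(reshape2)

# Stand-in for the question's data, rebuilt by hand as a plain data.frame
dataset <- data.frame(
  id   = factor(c("215559", "216566", "215101", "216566", "215101", "225678")),
  cat1 = c("A", "B", "B", "A", "A", "A"),
  cat2 = c("item 1", "item 1", "item 2", "item 5", "item 3", "item 28"),
  cat3 = c("theme 2", "theme 2", "theme 1", "theme 4", "theme 10", "theme 40"),
  stringsAsFactors = FALSE
)

# Stage 1: long format, one row per original row and melted column.
# melt() adds a "variable" column (cat2 or cat3) and a "value" column.
long <- melt(dataset, id.vars = c("id", "cat1"))
head(long, 3)

# Stage 2: count how often each value occurs within every (id, cat1) group
wide <- dcast(long, id + cat1 ~ value, value.var = "value",
              fun.aggregate = length)
```

Because all melted columns end up in a single `value` column, a value that appears in both cat2 and cat3 would be counted in one shared output column. That is not an issue for this data, where the item and theme labels do not overlap.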