R data.table solution for creating sum columns

Problem description

data1=data.frame("group1"=c(1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3),
                 "group2"=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2),
                 "var1"=c(1,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,1,0,1,1,0,0,0),
                 "var2"=c(1,0,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1),
                 "var3"=c(1,1,4,3,3,1,1,2,4,1,4,4,4,2,1,2,1,2,2,2,3,1,2,4))


data2=data.frame("group1"=rep(c(rep(1:3,2)),2),
                 "group2"=rep(c(rep(1:2,3))),
                 "var1"=sort(rep(0:1,6)),
                 "svar1" = c(2,2,0,3,3,3,1,2,4,1,1,1),
                 "var2"=sort(rep(0:1,6)),
                 "svar2" = c(rep(NA,12)))

I have 'data1' and want to produce 'data2'. The idea is to collapse 'var1' and 'var2' into counts, which become 'svar1' and 'svar2' in 'data2'.

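To create 'svar1', we go through every combination of 'group1' and 'group2' in 'data1' and store the number of occurrences of 0 and of 1, which are the response options of 'var1'. I would like to do the same for 'var2' to produce 'svar2'.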

Since the data is large, I would also like a data.table solution. For now, 'var3' can be ignored.

Tags: r, data.table

Solution


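Here is a join-based approach with data.table. I think the assumption in your result is that var1 = var2 on every row, and that svar1 and svar2 are the numbers of rows in the original data frame with those combinations.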

I'll leave duplicating the var1 column and filling the NAs with 0s to you.

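library(data.table)  # provides setDT() and the data.table merge method
setDT(data1)         # convert data1 to a data.table by reference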

merge(
  data1[, .(svar1 = .N), by = .(group1, group2, var1)],
  data1[, .(svar2 = .N), by = .(group1, group2, var2)],
  by.x = c("group1", "group2", "var1"),
  by.y = c("group1", "group2", "var2"),
  all = TRUE
)
#     group1 group2 var1 svar1 svar2
#  1:      1      1    0     2     1
#  2:      1      1    1     2     3
#  3:      1      2    0     3     2
#  4:      1      2    1     1     2
#  5:      2      1    0     3     3
#  6:      2      1    1     1     1
#  7:      2      2    0     2     3
#  8:      2      2    1     2     1
#  9:      3      1    0    NA     4
# 10:      3      1    1     4    NA
# 11:      3      2    0     3     2
# 12:      3      2    1     1     2
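
The two steps left open above (filling the NA counts with 0 and duplicating the value column as var2) could be sketched as follows. This is only one possible way to finish the result, not part of the original answer. The name `res` is introduced here for illustration, and `data1` is rebuilt without 'var3' so the snippet runs on its own.

```r
library(data.table)

# Same values as data1 in the question; var3 is left out because it is ignored
data1 <- data.table(
  group1 = rep(rep(1:3, each = 4), 2),
  group2 = rep(1:2, each = 12),
  var1 = c(1,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,1,0,1,1,0,0,0),
  var2 = c(1,0,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1)
)

# Count rows per (group1, group2, value) for each variable, then full-join
# the two count tables on the shared value
res <- merge(
  data1[, .(svar1 = .N), by = .(group1, group2, var1)],
  data1[, .(svar2 = .N), by = .(group1, group2, var2)],
  by.x = c("group1", "group2", "var1"),
  by.y = c("group1", "group2", "var2"),
  all = TRUE
)

# A combination that never occurs shows up as NA after the join;
# treat it as a count of 0 (assumption: NA here always means "no rows")
res[is.na(svar1), svar1 := 0L]
res[is.na(svar2), svar2 := 0L]

# Duplicate the shared value column as var2 and order columns like data2
res[, var2 := var1]
setcolorder(res, c("group1", "group2", "var1", "svar1", "var2", "svar2"))

print(res)
```

Each of the 24 input rows is counted exactly once per variable, so `sum(res$svar1)` and `sum(res$svar2)` should both equal `nrow(data1)`, which is a quick sanity check on the join.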
