r - Summarizing a collapsed data.table in R
Problem description
I have data like this:
data <- data.table(
"SCHOOL" = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0),
"GRADE" = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0),
"CAT" = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1),
"FOX" = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
"DOG" = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1)
)
and I would like to end up with a new data table such as:
dataWANT <- data.frame(
"VARIABLE" = c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
"SCHOOL" = c(1, 1, 0, 1, 1, 0, 1, 1, 0),
"GRADE" = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
"MEAN" = c(NA)
)
dataWANT
where MEAN is the mean of CAT, FOX, and DOG taken by SCHOOL, by GRADE, and by SCHOOL x GRADE, restricted to the groups where those variables equal 1.
I know how to do this one variable at a time, but that does not scale well to large data.
data[, CAT1 := mean(CAT), by = list(SCHOOL)]
data[, FOX1 := mean(FOX), by = list(GRADE)]
data[, DOG1 := mean(DOG), by = list(SCHOOL, GRADE)]
data$CAT2 = unique(data[SCHOOL == 1, CAT1])
data$FOX2 = unique(data[GRADE == 1, FOX1])
data$DOG2 = unique(data[SCHOOL == 1 & GRADE == 1, DOG1])
Please use only this:
data <- data.table(
"SCHOOL" = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0),
"GRADE" = c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0),
"CAT" = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1),
"FOX" = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
"DOG" = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1)
)
data[, CAT1 := mean(CAT), by = list(SCHOOL)]
data[, CAT2 := mean(CAT), by = list(GRADE)]
data[, CAT3 := mean(CAT), by = list(SCHOOL, GRADE)]
data[, FOX1 := mean(FOX), by = list(SCHOOL)]
data[, FOX2 := mean(FOX), by = list(GRADE)]
data[, FOX3 := mean(FOX), by = list(SCHOOL, GRADE)]
data[, DOG1 := mean(DOG), by = list(SCHOOL)]
data[, DOG2 := mean(DOG), by = list(GRADE)]
data[, DOG3 := mean(DOG), by = list(SCHOOL, GRADE)]
dataWANT <- data.frame(
"VARIABLE" = c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
"TYPE" = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
"MEAN" = c(0.48, 0.44, 0.428, 0.6, 0.611, 0.6428, 0.52, 0.61, 0.6428)
)
where TYPE equals 1 when MEAN is estimated by SCHOOL, TYPE equals 2 when MEAN is estimated by GRADE, and TYPE equals 3 when MEAN is estimated by SCHOOL and GRADE.
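To make the three TYPE definitions concrete, here is a minimal sketch on a tiny table that can be checked by hand. The `toy` table and the names `type1`, `type2`, and `type3` are hypothetical and are not part of the question's data. The sketch assumes that "estimated by SCHOOL" means the mean over the rows where SCHOOL equals 1, and likewise for the other two types.

```r
library(data.table)

# Hypothetical toy data (not the question's data), small enough to verify by hand
toy <- data.table(
  SCHOOL = c(1, 1, 0, 1),
  GRADE  = c(1, 0, 1, 1),
  CAT    = c(1, 0, 1, 0)
)

type1 <- toy[SCHOOL == 1, mean(CAT)]               # rows 1, 2, 4
type2 <- toy[GRADE == 1, mean(CAT)]                # rows 1, 3, 4
type3 <- toy[SCHOOL == 1 & GRADE == 1, mean(CAT)]  # rows 1, 4

# Collect the three estimates in the requested long layout
data.table(VARIABLE = "CAT", TYPE = 1:3, MEAN = c(type1, type2, type3))
```

Filtering first and then averaging gives the same number as computing a grouped mean and keeping the group whose key equals 1.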
Solution
We can melt the dataset (as in the other post), compute the MEAN for each grouping into a list, and then combine the list with rbindlist:
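library(data.table)

# Columns to average
cols <- c('CAT', 'FOX', 'DOG')
# Reshape to long format: one row per original row and measured variable
data1 <- melt(data, measure.vars = cols)
# Grouping sets: TYPE 1 = SCHOOL, TYPE 2 = GRADE, TYPE 3 = SCHOOL and GRADE
list_cols <- list('SCHOOL', 'GRADE', c('SCHOOL', 'GRADE'))
# Mean of each variable within every grouping set
lst1 <- lapply(list_cols, function(x)
  data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])
# Keep only rows where all grouping columns equal 1, then stack with a TYPE id
rbindlist(lapply(lst1, function(x) {
  nm1 <- setdiff(names(x), c('variable', 'MEAN'))
  x[Reduce(`&`, lapply(mget(nm1), as.logical)),
    .(VARIABLE = variable, MEAN)]}), idcol = 'TYPE')[order(VARIABLE)]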
# TYPE VARIABLE MEAN
#1: 1 CAT 0.4800000
#2: 2 CAT 0.4444444
#3: 3 CAT 0.4285714
#4: 1 FOX 0.6000000
#5: 2 FOX 0.5555556
#6: 3 FOX 0.6428571
#7: 1 DOG 0.5200000
#8: 2 DOG 0.6111111
#9: 3 DOG 0.6428571
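As a possible alternative, the same table can be sketched by filtering first and averaging afterwards, which avoids computing means for groups that are later discarded. This is only a sketch: the helper name `summarise_means` and the `toy` table are hypothetical, and it assumes the columns are named SCHOOL and GRADE and are coded 0/1.

```r
library(data.table)

# Hypothetical helper: mean of each column in `cols` over the rows where
# SCHOOL == 1 (TYPE 1), GRADE == 1 (TYPE 2), or both (TYPE 3)
summarise_means <- function(dt, cols = c("CAT", "FOX", "DOG")) {
  filters <- list(
    dt$SCHOOL == 1,
    dt$GRADE == 1,
    dt$SCHOOL == 1 & dt$GRADE == 1
  )
  # One wide row of means per filter; idcol numbers the filters 1, 2, 3
  wide <- rbindlist(lapply(filters, function(keep) {
    dt[keep, lapply(.SD, mean, na.rm = TRUE), .SDcols = cols]
  }), idcol = "TYPE")
  # Reshape to the long VARIABLE / TYPE / MEAN layout
  long <- melt(wide, id.vars = "TYPE",
               variable.name = "VARIABLE", value.name = "MEAN")
  long[order(VARIABLE, TYPE)]
}

# Toy data for illustration only (not the question's data)
toy <- data.table(
  SCHOOL = c(1, 1, 0, 1),
  GRADE  = c(1, 0, 1, 1),
  CAT    = c(1, 0, 1, 0),
  FOX    = c(1, 1, 0, 0),
  DOG    = c(0, 0, 1, 1)
)
out <- summarise_means(toy)
out
```

Calling `summarise_means(data)` on the question's table should give the same nine rows as the rbindlist approach, with the columns in a different order.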