Manipulating data.frame format in BASE R

Problem description

I have seen the answer given here, but I was wondering whether, in my case, there is an efficient way in BASE R to change my input data.frame into my desired output data.frame, as shown below?

input <- data.frame(id = c(1,3,6), school = LETTERS[1:3], read_1 = c(20,22,24),
               read_1_sp = c(T,F,T), read_2 =c(45,47,49),read_2_sp = c(F,F,F),  
               math_1 =c(20,22,NA), math_1_sp = c(T,F,NA), math_2 = c(NA,35,37),
               math_2_sp =c(NA,F,F))


output <- data.frame(id = c(rep(1,3),rep(3,4), rep(6, 3)),school = c(rep("A",3),rep("B",4), rep("C", 3)),   
                      subject = c("read","read","math","read","read","math", "math","read","read","math"),  
                      no.= c(1,2,1,1,2,1,2,1,2,2), score = c(20,45,20,22,47,22,35,24,49,37),    
                      sp = c(T,F,T,T,F,T,T,T,F,T))

Tags: r, dataframe

Solution


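1) Base - reshape. Create a list, varying, with two components, each of which is a character vector of column names: the first component holds the score column names and the second holds the sp column names. Use that list with base reshape. Then order by the idvar variables (omit the two lines that perform the ordering if it is not needed) and remove rows containing NA using na.omit. reshape generates a subject column with entries such as read_1. The transform statement splits that into two columns, subject and no.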

varying <- lapply(c("\\d$", "sp$"), grep, names(input), value = TRUE)

r <- reshape(input, direction = "long", idvar = c("id", "school"),
  varying = varying, v.names = c("score", "sp"),
  times = varying[[1]], timevar = "subject")  

o <- with(r, order(id, school))
r <- r[o, ]
r <- na.omit(r)

transform(r, subject = sub("_.*", "", subject), no = as.numeric(sub(".*_", "", subject)))

giving:

           id school subject score    sp no
1.A.read_1  1      A    read    20  TRUE  1
1.A.read_2  1      A    read    45 FALSE  2
1.A.math_1  1      A    math    20  TRUE  1
3.B.read_1  3      B    read    22 FALSE  1
3.B.read_2  3      B    read    47 FALSE  2
3.B.math_1  3      B    math    22 FALSE  1
3.B.math_2  3      B    math    35 FALSE  2
6.C.read_1  6      C    read    24  TRUE  1
6.C.read_2  6      C    read    49 FALSE  2
6.C.math_2  6      C    math    37 FALSE  2
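
The two name-handling steps above (building varying, then splitting entries such as read_1) can be looked at in isolation. The following is a minimal sketch, assuming only the column names from the question; nms is an illustrative variable that is not part of the original answer.

```r
# Column names of the question's input data.frame (nms is a name
# introduced here for illustration only)
nms <- c("id", "school", "read_1", "read_1_sp", "read_2", "read_2_sp",
         "math_1", "math_1_sp", "math_2", "math_2_sp")

# One character vector per output column: score names first, then sp names
varying <- lapply(c("\\d$", "sp$"), grep, nms, value = TRUE)

# Split each score name at the underscore
times <- varying[[1]]
subject <- sub("_.*", "", times)         # text before the underscore
no <- as.numeric(sub(".*_", "", times))  # number after the underscore

print(varying)
print(data.frame(times, subject, no))
```

Because the two regular expressions select the columns in the same order, the i-th score name lines up with the i-th sp name, which is what reshape appears to rely on when varying is given as a list.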

2) data.table - melt. The question asks for a base solution, but just for comparison we also show a solution that uses melt from data.table.

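Convert input to a data.table with a key and melt it using the indicated patterns. melt has no counterpart to the times= argument of reshape; instead it provides an index number in the variable.name column, subject in this case. We use that index to subscript times. This produces elements such as read_1, which we split with fread into two columns, subject and no. Finally, remove rows containing NA using na.omit and sort by the specified key.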

library(data.table)

input2 <- as.data.table(input, key = c("id", "school"))
times <- grep("\\d$", names(input2), value = TRUE)  # score col names

melt(input2, measure.vars = patterns(sp = "sp", score = "\\d$"), variable.name = "subject")[,
  c("subject", "no") := fread(text = times[subject], sep = "_")][,
  na.omit(.SD), keyby = key(input2)]

giving:

    id school    sp score subject no
 1:  1      A  TRUE    20    read  1
 2:  1      A FALSE    45    read  2
 3:  1      A  TRUE    20    math  1
 4:  3      B FALSE    22    read  1
 5:  3      B FALSE    47    read  2
 6:  3      B FALSE    22    math  1
 7:  3      B FALSE    35    math  2
 8:  6      C  TRUE    24    read  1
 9:  6      C FALSE    49    read  2
10:  6      C FALSE    37    math  2
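
The indexing step described above, where the index column is used to look up entries of times and the result is split at the underscore, can be sketched in base R without data.table. This is only an illustration under stated assumptions: idx is a hypothetical stand-in for the factor column that melt produces, and read.table is swapped in for fread.

```r
# Score column names, as computed in the answer
times <- c("read_1", "read_2", "math_1", "math_2")

# Hypothetical stand-in for the index column: a factor with levels "1".."4"
idx <- factor(c(1, 2, 3, 4, 1, 2))

# Subscripting with a factor uses its integer codes, so this recovers the names
labels <- times[idx]

# Split each label at the underscore into two columns
# (read.table is used here in place of data.table's fread)
parts <- read.table(text = labels, sep = "_",
                    col.names = c("subject", "no"),
                    stringsAsFactors = FALSE)
print(parts)
```

The lookup works only because the factor levels are "1" to "4" in order, so the integer codes match the positions in times. With other levels, converting explicitly via as.integer(as.character(idx)) would be the safer choice.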
