How can I reshape a league table into a different format using R?

Question

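I have a league table that looks like this (produced by the netleague function from the netmeta package; the values in this example are made up to keep it simple):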

A                    0.3 (-0.4 to 0.6)    .
0.1 (-0.9 to 0.3)    B                    -0.6 (-0.9 to 0.0)
0.2 (-0.8 to 0.4)    0.3 (-0.6 to 0.1)    C

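The lower triangle holds the indirect comparisons (the ones I am interested in for this example) and reads column versus row. For example, 0.1 (-0.9 to 0.3) is A vs B.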
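To make the "column versus row" reading concrete, here is a minimal base R sketch that pulls out the cells below the diagonal of the toy table and labels each one. The object name `tab` is only illustrative, and the sketch assumes the table is square with the treatment names on the diagonal.

```r
# Toy league table from the question, as a character matrix.
tab <- matrix(
  c("A",                 "0.3 (-0.4 to 0.6)", ".",
    "0.1 (-0.9 to 0.3)", "B",                 "-0.6 (-0.9 to 0.0)",
    "0.2 (-0.8 to 0.4)", "0.3 (-0.6 to 0.1)", "C"),
  nrow = 3, byrow = TRUE
)

# Treatment names sit on the diagonal.
trt <- diag(tab)

# Row/column positions of every cell below the diagonal.
idx <- which(lower.tri(tab), arr.ind = TRUE)

# Each cell is read as "column treatment vs row treatment".
lower <- data.frame(
  char1 = trt[idx[, "col"]],
  char2 = trt[idx[, "row"]],
  value = tab[idx],
  stringsAsFactors = FALSE
)
lower
```

With the toy table this gives three rows, A vs B, A vs C and B vs C, each still carrying its "estimate (lower to upper)" string.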

These values are stored in R's environment as a data.frame element of a list.

The result I need is:

char1  char2  value1  value2  value3
A      B       0.1    -0.9     0.3
A      C       0.2    -0.8     0.4
B      A      -0.1    -0.3     0.9
B      C       0.3    -0.6     0.1
C      A      -0.2    -0.4     0.8
C      B      -0.3    -0.1     0.6

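Note that to get B vs A, I reversed the sign of the values (which also swaps the lower and upper bounds). Depending on the nature of the outcome, I might use 1/value instead.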
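The reversal described here can be sketched as a small helper. The function name `reverse_comparison` and its `scale` argument are hypothetical, not part of netmeta. The sketch assumes that difference-type outcomes are flipped by negation, that ratio-type outcomes are flipped by taking reciprocals, and that ratio values are strictly positive.

```r
# Hypothetical helper: turn a comparison "X vs Y" into "Y vs X".
# est, lo, hi are the point estimate and the interval bounds.
reverse_comparison <- function(est, lo, hi, scale = c("difference", "ratio")) {
  scale <- match.arg(scale)
  if (scale == "difference") {
    # Negate everything; the old upper bound becomes the new lower bound.
    data.frame(est = -est, lo = -hi, hi = -lo)
  } else {
    # Take reciprocals (assumes positive values); the bounds swap as well.
    data.frame(est = 1 / est, lo = 1 / hi, hi = 1 / lo)
  }
}

# A vs B from the toy table, flipped to B vs A.
reverse_comparison(0.1, -0.9, 0.3)

# Made-up ratio values, only to show the reciprocal case.
reverse_comparison(2, 0.5, 4, scale = "ratio")
```

The first call returns -0.1, -0.3 and 0.9, which matches the B vs A row in the desired output above.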

I tried looking at the code behind the netleague function, but deciphering it is too advanced for me.

Does anyone know how to automate this task in R?

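So far, my best options are to do this manually (time-consuming with more than 8,000 values, and likely to introduce typing errors) or at least to use some formulas in Excel (still very time-consuming, because I would have to adapt the formulas row by row).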

Data in a reproducible format (from dput(league[["fixed"]][1:4, ])):

structure(list(V1 = c("dalia", "0.08 (-0.9 to  0.26)", 
"-0.15 (-0.40 to  0.06)", "0.37 ( 0.00 to  0.78)"), V2 = c("-0.05 (-0.33 to  0.22)", 
"camelia", "-0.24 (-0.49 to -0.01)", "0.31 (-0.09 to  0.75)"
), V3 = c("-0.14 (-0.64 to  0.32)", "-0.37 (-0.66 to -0.05)", 
"margher", "0.54 ( 0.12 to  0.95)"), V4 = c(".", ".", ".", 
"rosa_can"), V5 = c(".", ".", ".", "."), V6 = c(".", ".", ".", 
"."), V7 = c(".", ".", ".", "."), V8 = c("0.65 ( 0.54 to  0.87)", 
"0.54 ( 0.38 to  0.78)", "0.77 ( 0.2 to  1.28)", "0.29 (-0.08 to  0.67)"
), V9 = c(".", ".", ".", "."), V10 = c(".", ".", ".", "."), V11 = c(".", 
".", ".", "."), V12 = c("0.23 (-0.52 to  0.99)", ".", "-0.05 (-0.56 to  0.47)", 
"."), V13 = c(".", "0.07 (-0.25 to  0.33)", ".", ".")), row.names = c(NA, 
4L), class = "data.frame")

Tags: r, dataframe

Answer


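Here is a first attempt at your problem, but I could not fully match your desired output. Please point out the mistakes so I can take another pass at it.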

library(data.table)
# Sample data
DT <- fread('"A"    "0.3 (-0.4 to 0.6)"     "."
"0.1 (-0.9 to 0.3)"     "B"     "-0.6 (-0.9 to 0.0)"
"0.2 (-0.8 to 0.4)"     "0.3 (-0.6 to 0.1)"     "C"', header = FALSE)

# Code
# Get names, set as row and colnames
names.v <- diag(as.matrix(DT))
setnames(DT, new = names.v)
DT[, char2 := names.v]
#                    A                 B                  C char2
# 1:                 A 0.3 (-0.4 to 0.6)                  .       A
# 2: 0.1 (-0.9 to 0.3)                 B -0.6 (-0.9 to 0.0)       B
# 3: 0.2 (-0.8 to 0.4) 0.3 (-0.6 to 0.1)                  C       C

# Melt to long
ans <- setcolorder(melt(DT, id.vars = "char2", variable.name = "char1"), c(2,1,3))
#    char1 char2              value
# 1:     A     A                  A
# 2:     A     B  0.1 (-0.9 to 0.3)
# 3:     A     C  0.2 (-0.8 to 0.4)
# 4:     B     A  0.3 (-0.4 to 0.6)
# 5:     B     B                  B
# 6:     B     C  0.3 (-0.6 to 0.1)
# 7:     C     A                  .
# 8:     C     B -0.6 (-0.9 to 0.0)
# 9:     C     C                  C
                
# keep relevant rows
ans <- ans[!char1 == char2, ]
# extract numeric values
ans[, paste0("val", 1:length(tstrsplit(ans$value, "[^0-9-\\.]+", perl = TRUE))) := 
      tstrsplit(ans$value, "[^0-9-\\.]+", perl = TRUE)][]
#    char1 char2              value val1 val2 val3
# 1:     A     B  0.1 (-0.9 to 0.3)  0.1 -0.9  0.3
# 2:     A     C  0.2 (-0.8 to 0.4)  0.2 -0.8  0.4
# 3:     B     A  0.3 (-0.4 to 0.6)  0.3 -0.4  0.6
# 4:     B     C  0.3 (-0.6 to 0.1)  0.3 -0.6  0.1
# 5:     C     A                  .    . <NA> <NA>
# 6:     C     B -0.6 (-0.9 to 0.0) -0.6 -0.9  0.0
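
One possible way to close the gap to the desired output is sketched below. It is not the answerer's code, only a continuation of the same melt-based idea under three assumptions: only cells below the diagonal are wanted, every comparison is flipped by negation, and the extracted values should be numeric. The names `trt`, `long`, `rev_long` and `res` are illustrative.

```r
library(data.table)

# Toy league table from the question, columns already named by treatment.
DT <- data.table(
  A = c("A", "0.1 (-0.9 to 0.3)", "0.2 (-0.8 to 0.4)"),
  B = c("0.3 (-0.4 to 0.6)", "B", "0.3 (-0.6 to 0.1)"),
  C = c(".", "-0.6 (-0.9 to 0.0)", "C")
)
trt <- diag(as.matrix(DT))   # treatment names, in table order
DT[, char2 := trt]

# Long format: char1 is the column treatment, char2 the row treatment.
long <- melt(DT, id.vars = "char2", variable.name = "char1",
             variable.factor = FALSE)

# Lower triangle only: the column index is smaller than the row index.
long <- long[match(char1, trt) < match(char2, trt) & value != "."]

# Split "est (lo to hi)" on anything that is not a digit, dot or minus sign.
long[, c("val1", "val2", "val3") :=
       lapply(tstrsplit(value, "[^0-9.-]+"), as.numeric)]

# Reversed comparisons: swap the labels, negate, and swap the bounds.
rev_long <- long[, .(char1 = char2, char2 = char1,
                     val1 = -val1, val2 = -val3, val3 = -val2)]

res <- rbind(long[, .(char1, char2, val1, val2, val3)], rev_long)
setorder(res, char1, char2)
res[]
```

On the toy data this should give the six rows requested in the question. For a ratio-type outcome, the negation in `rev_long` would have to be replaced by reciprocals.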
