Reshaping a long-format data frame to wide based on two columns

Question

I have a data frame that looks like this:

dat <- data.frame(QuarterYear = c("Q4 2019", "Q4 2019", "Q4 2019", 
                              "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", 
                              "Q4 2019", "Q4 2019", "Q4 2019", "Q1 2020", "Q1 2020", "Q1 2020", 
                              "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", 
                              "Q1 2020", "Q1 2020", "Q1 2020", "Q2 2020", "Q2 2020", "Q2 2020", 
                              "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", 
                              "Q2 2020", "Q2 2020", "Q2 2020", "Q3 2020", "Q3 2020", "Q3 2020", 
                              "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", 
                              "Q3 2020", "Q3 2020", "Q3 2020"), 
              Grade = c("Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11"), 
              Type = c("overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", 
                       "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", 
                       "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", 
                       "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", 
                       "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT"), 
              value = c(2.48, 2.21, 
                        0.27, 3.48, 3.03, 0.45, 4.6, 4, 0.6, 2.8, 2.4, 0.4, 2.54, 2.28, 
                        0.26, 3.45, 3, 0.45, 4.46, 3.88, 0.58, 3.56, 2.81, 0.75, 2.47, 
                        2.14, 0.33, 2.96, 2.54, 0.41, 4.1, 3.69, 0.41, 3.44, 2.61, 0.83, 
                        2, 1.81, 0.19, 2.54, 2.26, 0.28, 4.11, 3.68, 0.43, 2.67, 2.11, 
                        0.56), stringsAsFactors = FALSE)

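I am trying to reshape this data frame to wide format, where the unique values of Type become the rows and the values are filled in based on QuarterYear and Grade.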

Put simply, if the first row is overallAverage, the first 4 columns would run from Q4 2019-Grade 8 to Q3 2020-Grade 8. The next 4 columns would run from Q4 2019-Grade 9 to Q3 2020-Grade 9, and so on.

I tried using the reshape function:

widerDat <- reshape(dat, direction = "wide",idvar = "Type",timevar = "value")  

How can I combine QuarterYear and Grade to get the desired output?

Please help me find a suitable solution. Thanks in advance!!

Tags: r, dataframe, reshape

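Solution

You can paste the two time variables together into one column (called time here) and pass that column as the single timevar, like this: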

res <- reshape(transform(dat, time=paste(QuarterYear, Grade)), 
               direction="wide", idvar="Type", timevar="time",
               drop=c("QuarterYear", "Grade"))  
res
#             Type value.Q4 2019 Grade 8 value.Q4 2019 Grade 9
# 1 overallAverage                  2.48                  3.48
# 2             CT                  2.21                  3.03
# 3             RT                  0.27                  0.45
#   value.Q4 2019 Grade 10 value.Q4 2019 Grade 11 value.Q1 2020 Grade 8
# 1                    4.6                    2.8                  2.54
# 2                    4.0                    2.4                  2.28
# 3                    0.6                    0.4                  0.26
#   value.Q1 2020 Grade 9 value.Q1 2020 Grade 10 value.Q1 2020 Grade 11
# 1                  3.45                   4.46                   3.56
# 2                  3.00                   3.88                   2.81
# 3                  0.45                   0.58                   0.75
#   value.Q2 2020 Grade 8 value.Q2 2020 Grade 9 value.Q2 2020 Grade 10
# 1                  2.47                  2.96                   4.10
# 2                  2.14                  2.54                   3.69
# 3                  0.33                  0.41                   0.41
#   value.Q2 2020 Grade 11 value.Q3 2020 Grade 8 value.Q3 2020 Grade 9
# 1                   3.44                  2.00                  2.54
# 2                   2.61                  1.81                  2.26
# 3                   0.83                  0.19                  0.28
#   value.Q3 2020 Grade 10 value.Q3 2020 Grade 11
# 1                   4.11                   2.67
# 2                   3.68                   2.11
# 3                   0.43                   0.56
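
As a quick check of the idea, here is a minimal sketch on a small subset of the data (only CT and RT, Grades 8 and 9, two quarters). The names small and wide are made up for this illustration. It suggests that reshape creates the wide columns in the order in which the pasted keys first appear, which would explain why the result above is grouped by quarter rather than by grade.

```r
# Small subset of the question's data; `small` and `wide` are illustrative names
small <- data.frame(
  QuarterYear = rep(c("Q4 2019", "Q1 2020"), each = 4),
  Grade       = rep(rep(c("Grade 8", "Grade 9"), each = 2), times = 2),
  Type        = rep(c("CT", "RT"), times = 4),
  value       = c(2.21, 0.27, 3.03, 0.45, 2.28, 0.26, 3.00, 0.45),
  stringsAsFactors = FALSE
)

# Build one combined key from the two columns, then reshape on that key
small$time <- paste(small$QuarterYear, small$Grade)
wide <- reshape(small, direction = "wide", idvar = "Type", timevar = "time",
                drop = c("QuarterYear", "Grade"))

# One row per Type; columns follow the first appearance of each key
names(wide)
```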

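To put the columns in the desired order, we can use substr to pull the grade (characters 21-22), the year (characters 10-13), and the quarter (character 8) out of each column name and order by them: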

nm <- names(res)[-1]  ## store names in a vector
## generate order vector by relevant characters
o <- order(as.double(substr(nm, 21, 22)), as.double(substr(nm, 10, 13)),
           as.double(substr(nm, 8, 8))) + 1
res <- res[c(1, o)]  ## ordering
names(res)
#  [1] "Type"                   "value.Q4 2019 Grade 8"  "value.Q1 2020 Grade 8" 
#  [4] "value.Q2 2020 Grade 8"  "value.Q3 2020 Grade 8"  "value.Q4 2019 Grade 9" 
#  [7] "value.Q1 2020 Grade 9"  "value.Q2 2020 Grade 9"  "value.Q3 2020 Grade 9" 
# [10] "value.Q4 2019 Grade 10" "value.Q1 2020 Grade 10" "value.Q2 2020 Grade 10"
# [13] "value.Q3 2020 Grade 10" "value.Q4 2019 Grade 11" "value.Q1 2020 Grade 11"
# [16] "value.Q2 2020 Grade 11" "value.Q3 2020 Grade 11"
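
The substr positions above depend on every name having exactly the same layout. As a hedged alternative, the sketch below extracts the same three pieces with a regular expression, so it does not rely on fixed character positions. The vector nm_demo and the pattern pat are assumptions made for this illustration; they mimic the "value.Q<quarter> <year> Grade <grade>" names produced above.

```r
# Illustrative names in the same shape as names(res)[-1]
nm_demo <- c("value.Q4 2019 Grade 8", "value.Q4 2019 Grade 10",
             "value.Q1 2020 Grade 8", "value.Q1 2020 Grade 10")

# Capture quarter, year and grade as three groups
pat <- "^value\\.Q([1-4]) ([0-9]{4}) Grade ([0-9]+)$"
quarter <- as.integer(sub(pat, "\\1", nm_demo))
year    <- as.integer(sub(pat, "\\2", nm_demo))
grade   <- as.integer(sub(pat, "\\3", nm_demo))

# Order by grade first, then chronologically (year, then quarter)
o_demo <- order(grade, year, quarter)
nm_demo[o_demo]
```

With the real result, the same ordering could be applied as res[c(1, o_demo + 1)] after building nm_demo from names(res)[-1], assuming every name matches the pattern.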
