首页 > 解决方案 > Trying to convert nested sublist into a data frame

问题描述

I have a list that is a few levels deep:

[[12]]
[[12]][[1]]
                  estimateName                    insuredName                      priceList 
        "KING"                 "IDIF8X_MAY18" 
                      laborEff                    claimNumber                   policyNumber 
 "Restoration/Service/Remodel"                 "000000-01"               "0000000" 
                    typeOfLoss                     roofDamage                        deprMat 
                        "Hail"                            "0"                            "1" 
                    deprNonMat                      deprOandP                      deprTaxes 
                           "1"                            "1"                            "1" 
                        onsite            recipientsXNAddress                      carrierId 
                           "1"              "CO"                      "00000" 
                  estimateType 
                       "Mixed" 

[[12]][[2]]
                                       type                                     lineNum 
                                        "I"                                        "12" 
                                        cat                                         sel 
                                      "SFG"                                      "GUTA" 
                                        act                                   actPrefix 
                                        "&"                                       "R&R" 
                                   "1" 


[[13]]
[[13]][[1]]
                  eName                    iName                      priceList 
        "KING"                 "MAY18" 
                      laborEff                    claimNumber                   policyNumber 
 "Restoration/Service/Remodel"                 "00000-01"               "000000000" 
                    typeOfLoss                     roofDamage                        deprMat 
                        "Hail"                            "0"                            "1" 
                    deprNonMat                      deprOandP                      deprTaxes 
                           "1"                            "1"                            "1" 
                        onsite            recipientsXNAddress                      carrierId 
                           "1"              "MRP.BRIGHTON.CO"                      "2570112" 
                  estimateType 
                       "Mixed" 

[[13]][[2]]
                                        type                                      lineNum 
                                         "I"                                         "13" 
                                         cat                                          sel 
                                       "FEN"                                      "VNLS6" 
                                         act                                    actPrefix 
                                         "&"                                        "R&R" 

This is just a sample. I just was to create a data frame that will convert the attributes as the header and all the rows are the values for the. I tried to unlist and a few other functions but wasn't giving me the correct format.

So a desired output would be two rows of values with the column names.

list(list(structure(c("DARIAN_KING_&_CASSI1", "DARIAN KING & CASSIDY R KING", 
"IDIF8X_MAY18", "Restoration/Service/Remodel", "037262569-01", 
"H3726819012070", "Hail", "0", "1", "1", "1", "1", "1", "MRP.BRIGHTON.CO", 
"2570112", "Mixed"), .Names = c("estimateName", "insuredName", 
"priceList", "laborEff", "claimNumber", "policyNumber", "typeOfLoss", 
"roofDamage", "deprMat", "deprNonMat", "deprOandP", "deprTaxes", 
"onsite", "recipientsXNAddress", "carrierId", "estimateType")), 
    structure(c("I", "12", "SFG", "GUTA", "&", "R&R", "Gutter / downspout - aluminum - up to 5\"", 
    "12", "12", "LF", "Dwelling", "0.32", "4.23", "54.6", "36.33", 
    "2", "1", "9", "25", "18.27", "1.49", "37.28", "18.81", "56.09", 
    "1", "29.76", "10.08", "6.36", "13.32", "0.58", "24.84", 
    "1"), .Names = c("type", "lineNum", "cat", "sel", "act", 
    "actPrefix", "desc", "calc", "qty", "unit", "coverageName", 
    "remove", "replace", "total", "acv", "deprType", "recoverable", 
    "age", "lifeExp", "depr", "tax", "acvTotal", "deprTotal", 
    "rcvTotal", "isPartOfInitSettle", "laborTotal", "laborBase", 
    "laborBurden", "laborMarkup", "laborHours", "material", "containsBSCDontApply"
    ))), list(structure(c("DARIAN_KING_&_CASSI1", "DARIAN KING & CASSIDY R KING", 
"IDIF8X_MAY18", "Restoration/Service/Remodel", "037262569-01", 
"H3726819012070", "Hail", "0", "1", "1", "1", "1", "1", "MRP.BRIGHTON.CO", 
"2570112", "Mixed"), .Names = c("estimateName", "insuredName", 
"priceList", "laborEff", "claimNumber", "policyNumber", "typeOfLoss", 
"roofDamage", "deprMat", "deprNonMat", "deprOandP", "deprTaxes", 
"onsite", "recipientsXNAddress", "carrierId", "estimateType")), 
    structure(c("I", "13", "FEN", "VNLS6", "&", "R&R", "Vinyl (PVC) fence, 5'- 6' high - full slat", 
    "8*3", "24", "LF", "Other Structures", "4.01", "27.71", "761.28", 
    "721.38", "2", "9", "150", "39.9", "29.61", "749.21", "41.68", 
    "790.89", "1", "267.84", "141.36", "80.64", "45.84", "9.1", 
    "493.44", "1"), .Names = c("type", "lineNum", "cat", "sel", 
    "act", "actPrefix", "desc", "calc", "qty", "unit", "coverageName", 
    "remove", "replace", "total", "acv", "deprType", "age", "lifeExp", 
    "depr", "tax", "acvTotal", "deprTotal", "rcvTotal", "isPartOfInitSettle", 
    "laborTotal", "laborBase", "laborBurden", "laborMarkup", 
    "laborHours", "material", "containsBSCDontApply"))))

标签: rlistdataframe

解决方案


Up front:

dplyr::bind_rows(lapply(datlst, function(dl) as.data.frame(as.list(unlist(dl)), stringsAsFactors=FALSE)))
#           estimateName                  insuredName    priceList                    laborEff  claimNumber   policyNumber typeOfLoss roofDamage deprMat deprNonMat deprOandP deprTaxes onsite
# 1 DARIAN_KING_&_CASSI1 DARIAN KING & CASSIDY R KING IDIF8X_MAY18 Restoration/Service/Remodel 037262569-01 H3726819012070       Hail          0       1          1         1         1      1
# 2 DARIAN_KING_&_CASSI1 DARIAN KING & CASSIDY R KING IDIF8X_MAY18 Restoration/Service/Remodel 037262569-01 H3726819012070       Hail          0       1          1         1         1      1
#   recipientsXNAddress carrierId estimateType type lineNum cat   sel act actPrefix                                       desc calc qty unit     coverageName remove replace  total    acv deprType
# 1     MRP.BRIGHTON.CO   2570112        Mixed    I      12 SFG  GUTA   &       R&R   Gutter / downspout - aluminum - up to 5"   12  12   LF         Dwelling   0.32    4.23   54.6  36.33        2
# 2     MRP.BRIGHTON.CO   2570112        Mixed    I      13 FEN VNLS6   &       R&R Vinyl (PVC) fence, 5'- 6' high - full slat  8*3  24   LF Other Structures   4.01   27.71 761.28 721.38        2
#   recoverable age lifeExp  depr   tax acvTotal deprTotal rcvTotal isPartOfInitSettle laborTotal laborBase laborBurden laborMarkup laborHours material containsBSCDontApply
# 1           1   9      25 18.27  1.49    37.28     18.81    56.09                  1      29.76     10.08        6.36       13.32       0.58    24.84                    1
# 2        <NA>   9     150  39.9 29.61   749.21     41.68   790.89                  1     267.84    141.36       80.64       45.84        9.1   493.44                    1

Let's decompose this. Within each top-level are two very-different lists, so I'm guessing that they need to be on the same row. We can combine them simply with unlist(...). (I'll use str and truncation where needed to aid in presentation.)

str(unlist(datlst[[1]]))
#  Named chr [1:48] "DARIAN_KING_&_CASSI1" "DARIAN KING & CASSIDY R KING" "IDIF8X_MAY18" "Restoration/Service/Remodel" "037262569-01" "H3726819012070" "Hail" "0" "1" "1" "1" "1" "1" ...
#  - attr(*, "names")= chr [1:48] "estimateName" "insuredName" "priceList" "laborEff" ...

Knowing that we can convert a named-list into a data.frame easily, let's convert this named-vector into a named-list:

str(as.list(unlist(datlst[[1]])))
# List of 48
#  $ estimateName        : chr "DARIAN_KING_&_CASSI1"
#  $ insuredName         : chr "DARIAN KING & CASSIDY R KING"
#  $ priceList           : chr "IDIF8X_MAY18"
#  ...snip...
#  $ laborHours          : chr "0.58"
#  $ material            : chr "24.84"
#  $ containsBSCDontApply: chr "1"

So we can apply that to each element of datlst with lapply. From there, we need to rbind them all. In base R this is usually done with do.call(rbind, lapply(...)), but I noticed that there is an element that is in one and not in the other (recoverable), so this highlights a constraint of rbind: the column names must be the same and in the same order. However, two (non-base) tools can work around this:

  • dplyr::bind_rows(...)
  • data.table::rbindlist(..., fill=TRUE)

I demonstrated the first, but the second works just as well ... use whichever one you may already have installed.


推荐阅读