R: parsing a data frame column of JSON arrays and converting it to one-hot encoding

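Problem description

I have a data frame in which one column holds JSON arrays stored as strings. My goal is to parse that column and convert it to one-hot encoding, but I get an error when parsing the JSON.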

> library(jsonlite)
> library(dplyr)  # provides data_frame(), mutate() and %>%
> df <- data_frame(Amenities=c("[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]", "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]", "[\"Parking\", \"Lawn\", \"Garage\"]"))
> df
# A tibble: 3 x 1
  Amenities                                           
  <chr>                                               
1 "[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]"
2 "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]" 
3 "[\"Parking\", \"Lawn\", \"Garage\"]"               
> df <- df %>% mutate(Amenities=fromJSON(Amenities))
Error: parse error: trailing garbage
          awn", "Garage", "Frontyard"] ["Parking", "Lawn", "Garage", "
                     (right here) ------^
> 

Expected output:

Parking  Lawn  Garage  Frontyard  Backyard
      1     1       1          1         0
      1     1       1          0         1
      1     1       1          0         0

Solution that also keeps the existing data frame:

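library(qdapTools)
library(stringr)
# Extract every amenity name (spaces inside a name are allowed), tabulate per row,
# and bind the indicator columns to the original data frame.
df <- cbind(df, +(mtabulate(str_extract_all(df$Amenities, "\\w+( +\\w+)*"))))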
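
The pattern "\\w+( +\\w+)*" used above differs from a plain "\\w+" only when an amenity name contains a space. The sample data has no such name, so the input below ("Swimming Pool") is a hypothetical value added purely for illustration. A minimal sketch of the difference:

```r
library(stringr)

# Hypothetical input: one amenity name contains a space.
x <- '["Parking", "Swimming Pool"]'

# "\\w+" stops at the space, so the name is split into two tokens.
words_only <- str_extract_all(x, "\\w+")[[1]]

# "\\w+( +\\w+)*" lets a match continue across spaces between words.
with_spaces <- str_extract_all(x, "\\w+( +\\w+)*")[[1]]

print(words_only)   # "Parking" "Swimming" "Pool"
print(with_spaces)  # "Parking" "Swimming Pool"
```

For the data in the question both patterns give the same columns.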

Tags: r, json

Solution

We can do this in one line with mtabulate:

library(qdapTools)
library(stringr)
mtabulate(str_extract_all(df$Amenities, "\\w+"))

Output:

#  Backyard Frontyard Garage Lawn Parking
#1        0         1      1    1       1
#2        1         0      1    1       1
#3        0         0      1    1       1
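
As for the error in the question: fromJSON() expects a single JSON document, and the error text shows the three strings run together (`"Frontyard"] ["Parking"`), which suggests the whole column was handed to the parser at once. An alternative that parses the JSON itself instead of using a regex is to call fromJSON() on each element separately. The following is only a sketch under that assumption; the names `parsed`, `levels_all`, `onehot` and `result` are illustrative and not part of the original answer.

```r
library(jsonlite)

# Same sample data as in the question, as a plain data frame.
df <- data.frame(
  Amenities = c('["Parking", "Lawn", "Garage", "Frontyard"]',
                '["Parking", "Lawn", "Garage", "Backyard"]',
                '["Parking", "Lawn", "Garage"]'),
  stringsAsFactors = FALSE
)

# Parse each row's JSON array on its own, one document per call.
parsed <- lapply(df$Amenities, fromJSON)

# All distinct amenities, in order of first appearance.
levels_all <- unique(unlist(parsed))

# One row per record: 1 if the amenity is present, 0 otherwise.
onehot <- do.call(rbind, lapply(parsed, function(x) as.integer(levels_all %in% x)))
colnames(onehot) <- levels_all

# Keep the original column next to the indicator columns.
result <- cbind(df, onehot)
print(result[, levels_all])
```

Unlike mtabulate, which returns the columns in alphabetical order, this keeps them in order of first appearance, matching the expected output in the question.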
