R data frame column with JSON strings: need to make columns from the JSON objects

Problem description

The data frame has one column (category) that contains JSON. Here is an example:

{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}

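I am having trouble turning some of the JSON fields into columns (features) of the data frame.

Example: I would like dataframe['parent_cat'] to hold the "parent_name" value from the JSON in dataframe['category'].

Below is my attempt with apply, but as you can see, one of the results (parent_cat) comes back as a list.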

json <- function(r){
  return(data.frame(jsonlite::fromJSON(txt=r['category']),stringsAsFactors=F)$name)
}

json2 <- function(df){
  data.frame(jsonlite::fromJSON(df$category),stringsAsFactors=F)$parent_name
}

df$child_cat <- apply(df, 1,json)

df$parent_cat <- apply(df,1,json2)

head(df[c("child_cat","parent_cat","category")])

  child_cat     parent_cat
  <chr>         <list>
1 Performances  <chr [1]>
2 Hardware      <chr [1]>
3 Software      <chr [1]>
4 Anthologies   <chr [1]>
5 Experimental  <chr [1]>
6 Software      <chr [1]>

I also tried dplyr, but I get the same result for every record... at least it is not a list! Maybe the dplyr version only needs a small tweak?

json <- function(r){
  return(data.frame(fromJSON(r),stringsAsFactors=F)$name)
}

json2 <- function(r){
  return(data.frame(fromJSON(r),stringsAsFactors=F)$parent_name)
}

df2 <- 
    df %>% 
    mutate(p = json(category),
           c = json2(category))

head(df2[c("p","c")])

  p             c
  <chr>         <chr>
1 Performances  Dance
2 Performances  Dance
3 Performances  Dance
4 Performances  Dance
5 Performances  Dance
6 Performances  Dance

Thanks in advance for your help.

Tags: r, json, dataframe

Solution


The tidyjson package may be what you are looking for:

library(dplyr)
library(tidyjson)

df <- tibble(category = '{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}')

df <- df %>% 
  mutate(cat_parent = category %>% 
           spread_all() %>%
           pull(parent_name),
         cat_child = category %>% 
           spread_all() %>%
           pull(name))
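
If adding tidyjson is not an option, a similar result can probably be reached with jsonlite alone. The mutate() attempt in the question handed the whole category column to fromJSON() in a single call, so it was not parsing the strings row by row. The sketch below calls fromJSON() once per element with vapply() instead. The helper name extract_field is hypothetical, the JSON strings are shortened, and the second record is invented purely for illustration.

```r
library(jsonlite)

# Hypothetical helper: parse each JSON string on its own and return one field.
# Assumes every string is a valid JSON object and the field is a single string.
extract_field <- function(json_strings, field) {
  vapply(
    json_strings,
    function(txt) fromJSON(txt)[[field]],
    character(1),
    USE.NAMES = FALSE
  )
}

# Illustrative data: the first record is a shortened copy of the example in
# the question, the second one is made up.
df <- data.frame(
  category = c(
    '{"id":254,"name":"Performances","parent_id":6,"parent_name":"Dance"}',
    '{"id":52,"name":"Hardware","parent_id":16,"parent_name":"Technology"}'
  ),
  stringsAsFactors = FALSE
)

df$child_cat  <- extract_field(df$category, "name")
df$parent_cat <- extract_field(df$category, "parent_name")
```

Because vapply() is told to expect character(1), the new columns are plain character vectors rather than list columns. The call stops with an error if a record lacks the requested field, which may or may not be what you want.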

It may also help to check whether the strings in the column are valid JSON before parsing them:

library(dplyr)
library(tidyjson)
library(jsonlite)

df <- tibble(category = c('{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}', 
                          'not_json'))

df <- df %>% 
  rowwise() %>%
  mutate(check_json = validate(category),
         cat_parent = ifelse(check_json, 
                              category %>% 
                                spread_all() %>%
                                pull(parent_name), 
                             NA))
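
The same validity check can be sketched without rowwise() or tidyjson, using only jsonlite. The helper name safe_field is hypothetical, and the sample strings are shortened or invented for illustration. It assumes the requested field, when present, holds a single scalar value.

```r
library(jsonlite)

# Hypothetical helper: extract one field per JSON string, returning NA when
# the string is missing, is not valid JSON, or does not contain the field.
safe_field <- function(json_strings, field) {
  vapply(
    json_strings,
    function(txt) {
      if (is.na(txt) || !validate(txt)) return(NA_character_)
      parsed <- fromJSON(txt)
      # Only list-like results (JSON objects) can be indexed by field name.
      value <- if (is.list(parsed)) parsed[[field]] else NULL
      if (length(value) != 1 || is.list(value)) NA_character_ else as.character(value)
    },
    character(1),
    USE.NAMES = FALSE
  )
}

# Illustrative input: a valid record, a non-JSON string, and a record
# without a parent_name field.
categories <- c(
  '{"id":254,"name":"Performances","parent_id":6,"parent_name":"Dance"}',
  'not_json',
  '{"id":6,"name":"Dance"}'
)

parent_cat <- safe_field(categories, "parent_name")
```

Here parent_cat has one entry per input string, with NA for the two records that cannot supply a parent_name.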
