Ensure data frames become tibbles when reading MongoDB data with {mongolite}

Question

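I have to deal with JSON documents that contain nested documents and, at some level, an array that in turn contains individual documents. Conceptually, those individual documents map back to "data frame rows" when the JSON is read/parsed in R.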

How can I ensure that all data frames are turned into tibbles when retrieving data from the database?

Desired result for the example data below

query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>% 
  tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>% 
  purrr::map(tibble::as_tibble)

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

If I try to do this via dplyr::mutate() or purrr::map*_df(), I get the error message Error: Column is of unsupported class data.frame.
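The manual conversion above touches each level by name. A minimal sketch of a recursive alternative follows, assuming the {tibble} package is installed. The helper name `as_tibble_recursive()` and the hand-built `demo` object are hypothetical and only for illustration; they are not part of {tibble} or {mongolite}. The helper walks the columns as a plain list and rebuilds each data frame with `tibble::new_tibble()`.

```r
library(tibble)

# Hypothetical helper: convert a data frame and every data frame nested
# inside it (data frame columns, list columns of data frames) to tibbles.
as_tibble_recursive <- function(x) {
  if (is.data.frame(x)) {
    # Work on the columns as a plain list, then rebuild as a tibble
    cols <- lapply(unclass(x), as_tibble_recursive)
    return(tibble::new_tibble(cols, nrow = nrow(x)))
  }
  if (is.list(x)) {
    # Plain lists (e.g. list columns): recurse into each element
    return(lapply(x, as_tibble_recursive))
  }
  x  # atomic vectors are returned unchanged
}

# Hand-built stand-in for the nested structure of the query result
inner_a <- data.frame(x = c("A", "B"), y = c(1L, 2L), z = c(TRUE, FALSE),
                      stringsAsFactors = FALSE)
inner_b <- data.frame(x = c("A", "B"), y = c(10L, 20L), z = c(FALSE, TRUE),
                      stringsAsFactors = FALSE)
level_two <- data.frame(row.names = 1:2)
level_two$levelThree <- list(inner_a, inner_b)
level_one <- data.frame(row.names = 1:2)
level_one$levelTwo <- level_two
demo <- data.frame(schema = c("0.0.1", "0.0.1"), stringsAsFactors = FALSE)
demo$levelOne <- level_one

demo_tbl <- as_tibble_recursive(demo)
```

With a real query result, the call would presumably be `as_tibble_recursive(con$find())`; that has not been verified against a live database here.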

Related post

Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON


Example

JSON data to put into the file dump.json

{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}
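For readers without a running MongoDB instance, the nested structure can probably be reproduced locally by parsing the same two lines with {jsonlite}. This is a sketch under the assumption that `jsonlite::stream_in()` simplifies the documents in a way comparable to `$find()`; the object names `ndjson` and `local_res` are made up for this illustration.

```r
library(jsonlite)

# The two documents from dump.json, one JSON document per line
ndjson <- c(
  '{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}',
  '{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}'
)

# Parse the newline-delimited JSON into a (nested) data frame
local_res <- jsonlite::stream_in(textConnection(ndjson), verbose = FALSE)

str(local_res)
```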

Import the JSON into MongoDB

library(magrittr)  # provides the %>% pipe used throughout

con <- mongolite::mongo(
  db = "stackoverflow",
  collection = "nested_json"
)

con$import(file("dump.json"))

This is what you should see in MongoDB

[screenshot of the imported documents]

Query via $find()

query_res <- con$find() %>%
  tibble::as_tibble()

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

Query via $iterate()

it <- con$iterate()

# Ensure array columns stay individual list columns when casting to tibble:
# (As opposed to multiple array items being turned into one tibble row)
# Predicate: unnamed lists correspond to JSON arrays
p <- function(x) {
  is.list(x) &&
    is.null(names(x))
}
# Flatten the array and wrap it in a list so it becomes one list-column entry
f <- function(x) {
  list(x %>% unlist())
}

iter_res <- list()
while(!is.null(x <- it$one())) {
  x <- x %>% purrr::map_if(p, f)

  # Necessary to get the `simplifyVector = TRUE` effect:
  iter_res_current <- x %>%
    jsonlite:::simplify() %>%
    tibble::as_tibble()

  # Combine with previous iteration results:
  iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
  dplyr::bind_rows()

iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:List of 2
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  1 2
#   .. .. ..$ z: logi  TRUE FALSE
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  10 20
#   .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"
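The loop above relies on `jsonlite:::simplify()`, which is an unexported function and may change without notice. A rough alternative, sketched here using only exported {jsonlite} functions, is to round-trip each document through JSON text. This is a different technique from the one in the question. The `doc` object is a hand-written stand-in for one value returned by `it$one()`, and `simplify_via_json()` is a hypothetical name. MongoDB-specific types such as dates or ObjectIds may not survive the round trip unchanged.

```r
library(jsonlite)

# Stand-in for one document as returned by `it$one()`:
# JSON arrays arrive as unnamed lists, sub-documents as named lists.
doc <- list(
  labels = list("label-a", "label-b"),
  levelOne = list(levelTwo = list(levelThree = list(
    list(x = "A", y = 1L, z = TRUE),
    list(x = "B", y = 2L, z = FALSE)
  ))),
  schema = "0.0.1"
)

# Hypothetical helper: serialize to JSON and parse again with
# simplification switched on, using exported functions only.
simplify_via_json <- function(x) {
  json <- jsonlite::toJSON(x, auto_unbox = TRUE)
  jsonlite::fromJSON(json, simplifyVector = TRUE)
}

simplified <- simplify_via_json(doc)
str(simplified)
```

In this sketch the array of sub-documents under `levelThree` comes back as a data frame and `labels` as a character vector. The nested named lists (`levelOne`, `levelTwo`) stay plain lists.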

Tags: r, json, nested, jsonlite
