How to flatten a JSON object into a data frame in R

Problem description

I am trying to flatten the complex JSON object json$suggestions$events$ticket_availability into the data frame "Tickets".

I have tried various approaches, including:

fromJSON((tmp[[1]][,2]), flatten=TRUE)
tmp %>%
  map(~ fromJSON(.x)) %>%
  bind_rows()

through to the approaches in stackoverflow.com/questions/11553592/…. None of them seem to work for me.


library(httr)
library(rvest)
library(dplyr)
library(magrittr)
library(stringr)
library(lubridate)
library(purrr)
library(jsonlite)

getYear = "2019"
getWeek = "31"


base_url = "https://www.eventbrite.com/d/poland--pozna%C5%84/conference/"
query_params = list(yr=getYear, wk=getWeek)

resp <- GET(url=base_url, query=query_params)

resp

body_tags <- read_html(resp) %>% 
  html_nodes('body') %>% 
  html_text() %>% 
  toString() # collapse the extracted text into a single character string

# str_match_all() extracts matched groups from a string and returns a list of
# character matrices (column 1 = full match, column 2 = first capture group).
# Capture the JSON literal assigned to window.__SERVER_DATA__
tmp <- str_match_all(body_tags,'window.__SERVER_DATA__ = (.*?);')  

# Parse the captured JSON text into R objects (a nested list of data frames)
json <- jsonlite::fromJSON(tmp[[1]][,2])
str(json)

Tickets <- json$suggestions$events$ticket_availability
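To make the extraction step above concrete, here is a minimal sketch run on a made-up string instead of the live Eventbrite page. The variable `body_text` and its JSON content are hypothetical stand-ins; base R's `regexec()` is swapped in for `str_match_all()` so the example only needs jsonlite, and unlike `str_match_all()` it returns only the first match. It shows that the capture group holds the raw JSON text and that `fromJSON()` turns the nested objects into data frames stored inside data frame columns, which is exactly the structure of `Tickets`.

```r
library(jsonlite)

# Hypothetical stand-in for the page text; the real page embeds far more data.
body_text <- 'var x = 1; window.__SERVER_DATA__ = {"suggestions":{"events":[{"name":"A","ticket_availability":{"is_free":false,"minimum_ticket_price":{"currency":"PLN","value":4900}}}]}}; var y = 2;'

# Element 1 is the full match, element 2 is the capture group (the JSON text).
# The lazy (.*?) stops at the first ";" after the assignment.
m <- regmatches(body_text,
                regexec("window\\.__SERVER_DATA__ = (.*?);", body_text, perl = TRUE))[[1]]
json_text <- m[2]

# Nested JSON objects become data frames nested inside data frame columns.
json <- fromJSON(json_text)
str(json$suggestions$events$ticket_availability)
```

Because the pattern stops at the first semicolon, a `;` inside any JSON string value on the real page would cut the capture short, so it is worth checking that the captured text parses before going further.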

Here is the data frame "Tickets" (dput output):

structure(list(is_sold_out = c(FALSE, TRUE, FALSE, FALSE, FALSE, 
FALSE, FALSE, TRUE, FALSE, FALSE), has_available_tickets = c(TRUE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE), minimum_ticket_price = structure(list(
    currency = c("EUR", "USD", "PLN", "PLN", "USD", "USD", "USD", 
    "USD", "CAD", "USD"), value = c(25946L, 28880L, 25530L, 4900L, 
    4406L, 0L, 28000L, 0L, 1000L, 28000L), major_value = c("259.46", 
    "288.80", "255.30", "49.00", "44.06", "0.00", "280.00", "0.00", 
    "10.00", "280.00"), display = c("259.46 EUR", "288.80 USD", 
    "255.30 PLN", "49.00 PLN", "44.06 USD", "0.00 USD", "280.00 USD", 
    "0.00 USD", "10.00 CAD", "280.00 USD")), class = "data.frame", row.names = c(NA, 
10L)), is_free = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE), maximum_ticket_price = structure(list(
    currency = c("EUR", "USD", "PLN", "PLN", "USD", "USD", "USD", 
    "USD", "CAD", "USD"), value = c(49077L, 36702L, 25530L, 9911L, 
    4406L, 15684L, 28000L, 0L, 1000L, 28000L), major_value = c("490.77", 
    "367.02", "255.30", "99.11", "44.06", "156.84", "280.00", 
    "0.00", "10.00", "280.00"), display = c("490.77 EUR", "367.02 USD", 
    "255.30 PLN", "99.11 PLN", "44.06 USD", "156.84 USD", "280.00 USD", 
    "0.00 USD", "10.00 CAD", "280.00 USD")), class = "data.frame", row.names = c(NA, 
10L))), class = "data.frame", row.names = c(NA, 10L))

I want to flatten minimum_ticket_price and maximum_ticket_price to create the data frame Tickets.

Tags: r, json

Solution


Try flattening on import, but note that the new column names will all begin with "ticket_availability." rather than being nested under a single column.

json <- jsonlite::fromJSON(tmp[[1]][,2], flatten = TRUE)

Tickets_flat <- json$suggestions$events %>% 
  select(starts_with("ticket_availability"))
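A small self-contained sketch of the import-time approach, assuming a hypothetical two-event miniature of `json$suggestions$events` (the JSON below is made up for illustration). It shows the dot-separated column names that `flatten = TRUE` produces; the column selection uses base R's `startsWith()` as a stand-in for dplyr's `select(starts_with(...))` so the example only depends on jsonlite.

```r
library(jsonlite)

# Hypothetical miniature of json$suggestions$events: two events with nested prices.
events_json <- '[
  {"name": "A", "ticket_availability": {"is_free": false,
    "minimum_ticket_price": {"currency": "PLN", "value": 4900},
    "maximum_ticket_price": {"currency": "PLN", "value": 9911}}},
  {"name": "B", "ticket_availability": {"is_free": true,
    "minimum_ticket_price": {"currency": "USD", "value": 0},
    "maximum_ticket_price": {"currency": "USD", "value": 0}}}
]'

# flatten = TRUE turns nested objects into dot-separated top-level columns,
# e.g. "ticket_availability.minimum_ticket_price.currency".
events <- fromJSON(events_json, flatten = TRUE)
print(names(events))

# Base-R equivalent of select(starts_with("ticket_availability")).
tickets_flat <- events[, startsWith(names(events), "ticket_availability"), drop = FALSE]
print(tickets_flat)
```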

Or flatten afterwards with the jsonlite::flatten function (note that purrr also has a flatten function, which does not work the same way):

Tickets_flat <- jsonlite::flatten(Tickets)
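As a sketch of what this does to the nested price columns, the example below rebuilds a hypothetical two-row miniature of `Tickets` in the same shape as the dput output in the question (data frames stored as columns) and flattens it. The values are made up for illustration; only `jsonlite::flatten()` itself comes from the answer.

```r
library(jsonlite)

# Hypothetical two-row miniature of `Tickets`, built like the dput() output:
# minimum_ticket_price and maximum_ticket_price are data frames stored as columns.
Tickets <- structure(list(
  is_sold_out = c(FALSE, TRUE),
  minimum_ticket_price = data.frame(currency = c("PLN", "USD"), value = c(4900L, 0L)),
  is_free = c(FALSE, TRUE),
  maximum_ticket_price = data.frame(currency = c("PLN", "USD"), value = c(9911L, 0L))
), class = "data.frame", row.names = c(NA, 2L))

# Each nested column is replaced by ordinary columns named "<parent>.<child>",
# e.g. "minimum_ticket_price.currency" and "maximum_ticket_price.value".
Tickets_flat <- jsonlite::flatten(Tickets)
print(names(Tickets_flat))
print(Tickets_flat)
```

Calling it with the `jsonlite::` prefix, as the answer does, avoids depending on whether purrr or jsonlite was attached last.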

See also https://stackoverflow.com/a/35497845/4241780 for more information.

