r - How to flatten a JSON object into a data frame in R
Problem description
I am trying to flatten the complex JSON object json$suggestions$events$ticket_availability
into the data frame "Tickets".
I have tried various approaches, including:
fromJSON((tmp[[1]][,2]), flatten=TRUE)
tmp %>%
map(~ fromJSON(.x)) %>%
bind_rows()
and the answers to stackoverflow.com/questions/11553592/…. None of them seem to work for me.
library(httr)
library(rvest)
library(dplyr)
library(magrittr)
library(stringr)
library(lubridate)
library(purrr)
library(jsonlite)
getYear = "2019"
getWeek = "31"
base_url = "https://www.eventbrite.com/d/poland--pozna%C5%84/conference/"
query_params = list(yr=getYear, wk=getWeek)
resp <- GET(url=base_url, query=query_params)
resp
body_tags <- read_html(resp) %>%
html_nodes('body') %>%
html_text() %>%
toString() # collapse the extracted text into a single character string
# str_match_all - Extract matched groups from a string.
# output - a list of character matrices
# search the page text for the window.__SERVER_DATA__ = ...; assignment
tmp <- str_match_all(body_tags,'window.__SERVER_DATA__ = (.*?);')
# Convert the captured JSON string to R objects - output - list
json <- jsonlite::fromJSON(tmp[[1]][,2])
str(json)
Tickets <- json$suggestions$events$ticket_availability
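The extraction step above can be tried offline on a tiny string. This is a minimal sketch that assumes a made-up page_text standing in for the real Eventbrite page; it uses base R regmatches()/regexec() instead of stringr, with the same non-greedy pattern (dots escaped). Note that the capture stops at the first ";", so a semicolon inside the JSON itself would truncate it.

```r
library(jsonlite)

# Hypothetical stand-in for the page text; the real page is far larger.
page_text <- paste0(
  'var a = 1; window.__SERVER_DATA__ = ',
  '{"suggestions":{"events":[{"name":"Demo",',
  '"ticket_availability":{"is_free":true,"is_sold_out":false}}]}}; var b = 2;'
)

# Same non-greedy capture as the question, with the literal dots escaped.
# The group ends at the first ";" after the assignment.
m <- regmatches(page_text,
                regexec("window\\.__SERVER_DATA__ = (.*?);", page_text, perl = TRUE))
json_txt <- m[[1]][2]

# fromJSON() simplifies the events array into a data frame whose
# ticket_availability column is itself a (nested) data frame.
json <- fromJSON(json_txt)
str(json$suggestions$events$ticket_availability)
```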
Here is the dput() output for the data frame "Tickets":
structure(list(is_sold_out = c(FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE), has_available_tickets = c(TRUE,
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE), minimum_ticket_price = structure(list(
currency = c("EUR", "USD", "PLN", "PLN", "USD", "USD", "USD",
"USD", "CAD", "USD"), value = c(25946L, 28880L, 25530L, 4900L,
4406L, 0L, 28000L, 0L, 1000L, 28000L), major_value = c("259.46",
"288.80", "255.30", "49.00", "44.06", "0.00", "280.00", "0.00",
"10.00", "280.00"), display = c("259.46 EUR", "288.80 USD",
"255.30 PLN", "49.00 PLN", "44.06 USD", "0.00 USD", "280.00 USD",
"0.00 USD", "10.00 CAD", "280.00 USD")), class = "data.frame", row.names = c(NA,
10L)), is_free = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE), maximum_ticket_price = structure(list(
currency = c("EUR", "USD", "PLN", "PLN", "USD", "USD", "USD",
"USD", "CAD", "USD"), value = c(49077L, 36702L, 25530L, 9911L,
4406L, 15684L, 28000L, 0L, 1000L, 28000L), major_value = c("490.77",
"367.02", "255.30", "99.11", "44.06", "156.84", "280.00",
"0.00", "10.00", "280.00"), display = c("490.77 EUR", "367.02 USD",
"255.30 PLN", "99.11 PLN", "44.06 USD", "156.84 USD", "280.00 USD",
"0.00 USD", "10.00 CAD", "280.00 USD")), class = "data.frame", row.names = c(NA,
10L))), class = "data.frame", row.names = c(NA, 10L))
I want to flatten minimum_ticket_price and maximum_ticket_price to create the data frame Tickets.
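To make the target shape concrete: each nested data-frame column should become ordinary columns such as minimum_ticket_price.currency and minimum_ticket_price.value. A minimal base-R sketch of that, assuming a two-row stand-in for Tickets (values copied from the dput() above), uses do.call(data.frame, ...), which expands data-frame arguments into prefixed columns. This is a swapped-in technique, not one from the question or answer, and it removes only one level of nesting.

```r
# Two-row stand-in for Tickets with one nested data-frame column,
# mirroring the shape jsonlite produces.
tickets <- data.frame(is_sold_out = c(FALSE, TRUE), is_free = c(FALSE, FALSE))
tickets$minimum_ticket_price <- data.frame(
  currency = c("EUR", "USD"),
  value = c(25946L, 28880L),
  stringsAsFactors = FALSE
)

# data.frame() turns a named data-frame argument into "<name>.<column>" columns.
# Only one level is expanded; deeper nesting would need repeated passes.
tickets_flat <- do.call(data.frame, c(tickets, stringsAsFactors = FALSE))
names(tickets_flat)
```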
Solution
Try flattening on import, but note that the new column names will all start with "ticket_availability." instead of being nested under a single column.
json <- jsonlite::fromJSON(tmp[[1]][,2], flatten = TRUE)
Tickets_flat <- json$suggestions$events %>%
select(starts_with("ticket_availability"))
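The effect of flatten = TRUE on the column names can be checked on a small JSON string. This sketch assumes a hypothetical two-event array shaped like json$suggestions$events, and uses base R startsWith() in place of dplyr's select(starts_with(...)) so it runs with jsonlite alone.

```r
library(jsonlite)

# Hypothetical two-event JSON mimicking json$suggestions$events.
txt <- '[
  {"name": "A", "ticket_availability": {"is_free": false,
     "minimum_ticket_price": {"currency": "EUR", "value": 25946}}},
  {"name": "B", "ticket_availability": {"is_free": true,
     "minimum_ticket_price": {"currency": "USD", "value": 0}}}
]'

# flatten = TRUE replaces nested objects with dot-separated column names.
events <- fromJSON(txt, flatten = TRUE)

# Base-R equivalent of select(starts_with("ticket_availability")).
tickets_flat <- events[, startsWith(names(events), "ticket_availability"), drop = FALSE]
names(tickets_flat)
```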
Or flatten with the jsonlite::flatten command (note that purrr also has a flatten command, which does not work the same way):
Tickets_flat <- jsonlite::flatten(Tickets)
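A minimal sketch of jsonlite::flatten() on a small stand-in for Tickets, assuming only the first two rows and a subset of columns from the dput() above. Calling it with the jsonlite:: prefix avoids picking up purrr::flatten by accident when both packages are attached.

```r
# Two-row stand-in for Tickets with both nested price columns.
tickets <- data.frame(is_free = c(FALSE, FALSE))
tickets$minimum_ticket_price <- data.frame(currency = c("EUR", "USD"), value = c(25946L, 28880L))
tickets$maximum_ticket_price <- data.frame(currency = c("EUR", "USD"), value = c(49077L, 36702L))

# jsonlite::flatten() replaces each data-frame column with plain columns
# named "<parent>.<child>"; recursive = TRUE (the default) handles deeper levels.
tickets_flat <- jsonlite::flatten(tickets)
names(tickets_flat)
sapply(tickets_flat, is.data.frame)  # no nested columns remain
```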
See also https://stackoverflow.com/a/35497845/4241780 for more information.