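r - Scraping from a Java chart and drop-down menu
Problem description
I am trying to scrape data from https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/ using R. Beyond the default view, I want to use the drop-down menu to scrape the lake levels for different years. At the moment I am struggling with where to start: I have searched online for various code, but I cannot find a starting point for getting the yearly values for the different lakes.
I tried to use SelectorGadget here, but it does not work, I think because the chart is Java-based.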
library('rvest')
url <- 'https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/'
webpage <- read_html(url)
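I am looking for tabular results of the daily storage levels for all lakes.
Solution
I was able to find a better URL to request the data from: https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php
The JSON response to that request is not laid out cleanly as a table, but I think the functions here should do the conversion for you: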
library(httr)
library(jsonlite)
# This function is called from within the other to convert each day
# to its own data.frame, creating extra columns for the year, month, and day
entry.to.row <- function(entry) {
  date = entry[["-date"]]
  # One row per lake: flatten the nested list and fold it back into a matrix
  entry.df = data.frame(
    matrix(unlist(entry$lake), nrow = length(entry$lake), byrow = TRUE),
    stringsAsFactors = FALSE
  )
  colnames(entry.df) = c("LakeName", "Date", "Measurement")
  # The second column is overwritten with the date of the entry
  entry.df$Date = date
  date.split = strsplit(date, split = "-")[[1]]
  entry.df$Year = date.split[1]
  entry.df$Month = date.split[2]
  entry.df$Day = date.split[3]
  entry.df
}
# Fetch the data for two years and convert them into two data.frames which
# we will then merge into a single data.frame
fetch.data <- function(
  base.url = "https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php",
  current,
  past
) {
  fetched = httr::POST(
    url = base.url,
    body = list("year_current" = current, "year_pass" = past)
  )
  # Keep the nested list structure so each day can be converted separately
  datJSON = fromJSON(content(fetched, as = "text"), simplifyVector = FALSE)
  pastJSON = datJSON$year_pass$snowyhydro$level
  pastEntries = do.call("rbind", lapply(pastJSON, entry.to.row))
  currentJSON = datJSON$year_current$snowyhydro$level
  currentEntries = do.call("rbind", lapply(currentJSON, entry.to.row))
  rbind(pastEntries, currentEntries)
}
# Fetch the data for 2019 and 2018
dat = fetch.data(current=2019, past=2018)
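To see what the per-day conversion does without calling the server, here is a minimal offline sketch that repeats the core steps of `entry.to.row` on a hand-built entry. Only the `-date` and `lake` keys come from the code above; the keys inside each lake (`-name`, `-dataType`, `#text`) and the `level` value are hypothetical placeholders, since the conversion relies only on each lake having three fields in a fixed order.

```r
# Hand-built stand-in for one element of ...$snowyhydro$level, shaped like
# the nested list that fromJSON(..., simplifyVector = FALSE) would return.
# The keys inside each lake are ASSUMED names, not confirmed by the source.
entry <- list(
  "-date" = "2018-01-01",
  lake = list(
    list("-name" = "Lake Eucumbene", "-dataType" = "level", "#text" = "46.40"),
    list("-name" = "Lake Jindabyne", "-dataType" = "level", "#text" = "85.80")
  )
)

# unlist() flattens the lakes into one character vector of length
# 3 * (number of lakes); byrow = TRUE folds it back into one row per lake.
flat <- unlist(entry$lake)
rows <- matrix(flat, nrow = length(entry$lake), byrow = TRUE)

entry.df <- data.frame(rows, stringsAsFactors = FALSE)
colnames(entry.df) <- c("LakeName", "Date", "Measurement")

# The middle field is replaced by the date of the entry, as in entry.to.row
entry.df$Date <- entry[["-date"]]
print(entry.df)
```

Because everything passes through `unlist()`, all columns come out as character strings, including the measurements.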
> head(dat)
              LakeName       Date Measurement Year Month Day
1       Lake Eucumbene 2018-01-01       46.40 2018    01  01
2       Lake Jindabyne 2018-01-01       85.80 2018    01  01
3 Tantangara Reservoir 2018-01-01       42.94 2018    01  01
4       Lake Eucumbene 2018-01-02       46.41 2018    01  02
5       Lake Jindabyne 2018-01-02       85.72 2018    01  02
6 Tantangara Reservoir 2018-01-02       42.98 2018    01  02
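Since the question asks for a table of daily levels for all lakes, one possible follow-up is to convert the measurements to numbers and reshape the result so that each lake has its own column. The sketch below uses base R only and works on a small sample copied from the output above; `sample.dat` and `wide` are illustrative names, and with real data you would start from `dat` instead.

```r
# Sample rows copied from the head(dat) output above
sample.dat <- data.frame(
  LakeName = rep(
    c("Lake Eucumbene", "Lake Jindabyne", "Tantangara Reservoir"),
    times = 2
  ),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 3),
  Measurement = c("46.40", "85.80", "42.94", "46.41", "85.72", "42.98"),
  stringsAsFactors = FALSE
)

# The parsed values are character strings, so convert them first
sample.dat$Measurement <- as.numeric(sample.dat$Measurement)
sample.dat$Date <- as.Date(sample.dat$Date)

# Reshape from long (one row per lake per day) to wide (one row per day)
wide <- reshape(
  sample.dat,
  idvar = "Date",
  timevar = "LakeName",
  direction = "wide"
)

# reshape() prefixes the new columns with "Measurement."; strip that prefix
colnames(wide) <- sub("^Measurement\\.", "", colnames(wide))
rownames(wide) <- NULL
print(wide)
```

If you apply this to the full `dat`, drop the `Year`, `Month`, and `Day` columns first (or keep only `LakeName`, `Date`, and `Measurement`), otherwise `reshape()` will also spread those columns across the lakes.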