Scraping from a Java chart and drop-down menu

Question

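I am trying to scrape data from https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/ using R. I would like to go further and use the drop-down menu to scrape the lake levels for different years. At the moment I am struggling with where to start: I have searched online for various code samples, but I cannot find a starting point for getting the yearly values for the different lakes.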

I tried using SelectorGadget here, but it does not work, I think because the chart is Java-based.

library('rvest')

url <- 'https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/'
webpage <- read_html(url)

I am looking for a tabular result of the daily storage levels for all lakes.

Tags: r, parsing, web-scraping

Answer


I was able to find a better URL to request the data from: https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php

The JSON response to that request does not map cleanly onto a table, but I think the functions here should do it for you:

library(httr)
library(jsonlite)

# This function is called from within the other to convert each day 
# to its own dataframe, creating extra columns for the year, month, and day
entry.to.row <- function(entry) {
  date = entry[["-date"]]
  entry.df = data.frame(
    matrix(unlist(entry$lake), nrow=length(entry$lake), byrow = TRUE), 
    stringsAsFactors = FALSE
  )
  colnames(entry.df) = c("LakeName", "Date","Measurement")
  entry.df$Date = date

  date.split = strsplit(date, split = "-")[[1]]
  entry.df$Year = date.split[1]
  entry.df$Month = date.split[2]
  entry.df$Day = date.split[3]
  entry.df
}

# Fetch the data for two years and convert them into two data.frames which 
# we will then merge into a single data.frame
fetch.data <- function(
  base.url = "https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php",
  current,
  past
) {
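  fetched = httr::POST(
    url = base.url, 
    body = list("year_current"=current, "year_pass"=past)
  )
  # Stop with an informative error if the server returned an HTTP error status
  httr::stop_for_status(fetched)

  datJSON = fromJSON(content(fetched, as = "text", encoding = "UTF-8"), simplifyVector = FALSE)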

  pastJSON = datJSON$year_pass$snowyhydro$level
  pastEntries = do.call("rbind", lapply(pastJSON, entry.to.row))

  currentJSON = datJSON$year_current$snowyhydro$level
  currentEntries = do.call("rbind", lapply(currentJSON, entry.to.row))

  rbind(pastEntries, currentEntries)
}

# Fetch the data for 2019 and 2018
dat = fetch.data(current=2019, past=2018)

> head(dat)
              LakeName       Date Measurement Year Month Day
1       Lake Eucumbene 2018-01-01       46.40 2018    01  01
2       Lake Jindabyne 2018-01-01       85.80 2018    01  01
3 Tantangara Reservoir 2018-01-01       42.94 2018    01  01
4       Lake Eucumbene 2018-01-02       46.41 2018    01  02
5       Lake Jindabyne 2018-01-02       85.72 2018    01  02
6 Tantangara Reservoir 2018-01-02       42.98 2018    01  02
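
Two possible follow-ups, as a rough sketch rather than something tested against the live site. Because the rows are built with matrix(unlist(...)), every column in the output above is a character vector, so the hypothetical helper tidy.levels converts them to Date, numeric and integer. The second hypothetical helper, fetch.years, covers a longer range by calling a fetcher once per (current, past) pair of years and stacking the results. It takes the fetcher as an argument (anything with the same current/past arguments as fetch.data), and it assumes the endpoint actually serves every year you ask for, which I have not checked. The mock fetcher and its measurement value are made up purely so the example runs offline.

```r
# Convert the all-character columns into proper types.
# Column names follow the head(dat) output shown above.
tidy.levels <- function(dat) {
  dat$Date <- as.Date(dat$Date, format = "%Y-%m-%d")
  dat$Measurement <- as.numeric(dat$Measurement)
  dat$Year <- as.integer(dat$Year)
  dat$Month <- as.integer(dat$Month)
  dat$Day <- as.integer(dat$Day)
  dat
}

# Fetch every year from `first` to `last` (inclusive), two years per call.
# `fetcher` must accept `current` and `past`, like fetch.data does.
# Assumption: the server returns data for every requested year.
fetch.years <- function(first, last, fetcher) {
  stopifnot(first <= last)
  # Step backwards two years at a time: (last, last - 1), (last - 2, last - 3), ...
  currents <- seq(from = last, to = first, by = -2)
  pieces <- lapply(currents, function(y) fetcher(current = y, past = y - 1))
  all <- do.call("rbind", pieces)
  # An odd-length range pulls in one extra year before `first`; drop it
  all <- all[as.integer(all$Year) >= first, ]
  all <- unique(all)
  # ISO-formatted date strings sort correctly even as character
  all <- all[order(all$Date, all$LakeName), ]
  rownames(all) <- NULL
  all
}

# Offline stand-in for fetch.data: one made-up row per year, all character,
# mimicking the shape of the real result.
mock.fetcher <- function(current, past) {
  years <- as.integer(c(past, current))
  data.frame(
    LakeName = "Lake Eucumbene",
    Date = sprintf("%d-01-01", years),
    Measurement = "50.00",  # placeholder value, not real data
    Year = as.character(years),
    Month = "01",
    Day = "01",
    stringsAsFactors = FALSE
  )
}

levels.df <- tidy.levels(fetch.years(2016, 2019, mock.fetcher))
str(levels.df)
```

With the real functions loaded, the call would look like tidy.levels(fetch.years(2016, 2019, fetch.data)).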
