Scraping dynamic data from Fangraphs with RSelenium

Question

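I am trying to automate downloading dynamic data from Fangraphs. The goal is to simulate clicking the "Export Data" link, which downloads a CSV. Alternatively, simply scraping the data would also work; either approach is fine with me. I tried the following RSelenium code without success: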

library(RSelenium)
remDr <- remoteDriver(port = 4445L,
                      browserName = 'firefox')
remDr$open()
remDr$navigate("https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=1,7&splitArrPitch=&position=B&autoPt=false&splitTeams=false&statType=player&statgroup=1&startDate=2017-03-01&endDate=2021-11-01&players=&filter=&groupBy=career&sort=-1,1")
remDr$findElement(using = "class", value = "data-export")

The data at this URL is rendered with JavaScript. Although you can inspect the page and see the element with class = "data-export", remDr$findElement returns an error:

Selenium message:Unable to locate element: {"method":"class name","selector":"data-export"}
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: '7d65bd0674cb', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-80-generic', java.version: '1.8.0_91'
Driver info: driver.version: unknown

Error:   Summary: NoSuchElement
     Detail: An element could not be located on the page using the given search parameters.
     class: org.openqa.selenium.NoSuchElementException
     Further Details: run errorDetails method

Tags: r, web-scraping, rselenium

Answer

Instead of class, try xpath:

library(RSelenium)
driver <- rsDriver(port = 4941L, browser = "firefox")
remDr <- driver[["client"]]
remDr$navigate("https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=1,7&splitArrPitch=&position=B&autoPt=false&splitTeams=false&statType=player&statgroup=1&startDate=2017-03-01&endDate=2021-11-01&players=&filter=&groupBy=career&sort=-1,1")

# Click "Export Data"
downloaddata <- remDr$findElement(using = "xpath", value = '//*[@id="react-drop-test"]/div[2]/a')
downloaddata$clickElement()
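
Because the page is rendered with JavaScript, one possible reason for a NoSuchElement error (the post does not confirm this) is that the lookup runs before the element exists. The following is a minimal sketch of polling for the element before clicking it. `wait_for` is a hypothetical helper written here for illustration, not part of RSelenium, and the timeout values are arbitrary assumptions.

```r
# Hypothetical helper (not part of RSelenium): call `find_fn` repeatedly
# until it returns a non-NULL value without raising an error, or stop
# with an error once `timeout` seconds have passed.
wait_for <- function(find_fn, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat any error (e.g. NoSuchElement) as "not ready yet".
    result <- tryCatch(find_fn(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    if (Sys.time() >= deadline) {
      stop("element not found within ", timeout, " seconds")
    }
    Sys.sleep(interval)
  }
}

# Assumed usage with the `remDr` session from the answer (needs a live browser):
# downloaddata <- wait_for(function() {
#   remDr$findElement(using = "xpath",
#                     value = '//*[@id="react-drop-test"]/div[2]/a')
# })
# downloaddata$clickElement()
```

The helper takes a function rather than a selector so the same loop works with any locator strategy, including the "class name" lookup from the question.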
