php - How to scrape a table from a PHP website using R?

Problem description

I would like to import the data from the table on this page into R:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

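I have tried several approaches using XML and httr, but none of them worked. I have already looked at earlier posts, including:

Reading data from a php website using R

and

Scraping html tables into R data frames using the XML package

I am wondering whether I am not using the correct table ID from the page source, or whether the table is formatted in a way that does not work with the tools I am currently using.

Any and all help is greatly appreciated. Thanks in advance!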

Tags: php, r, function, web-scraping

This will not give you exactly what you want, but it may help you get started:

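library(XML)
# Save the page to a local file first, then parse the saved copy
fname <- "standings20190910.html"
download.file("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10", destfile=fname)
doc0 <- htmlParse(file=fname, encoding="UTF-8")
doc1 <- xmlRoot(doc0)
# Select every <table> whose id attribute is "content"
doc2 <- getNodeSet(doc1, "//table[@id='content']")
# Convert the first match to a data frame, skipping its first row
standings <- readHTMLTable(doc2[[1]], header=TRUE, skip.rows=1, stringsAsFactors=FALSE)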

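You can look at the HTML source of the table you are trying to scrape and then work out how to build a useful R object from it. Take a close look at the documentation for getNodeSet and readHTMLTable in the XML package manual ( https://cran.r-project.org/web/packages/XML/XML.pdf ).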
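As a rough illustration of that inspection step, here is a minimal sketch that lists the id of every table in a document and then reads one of them. The inline HTML, the team names, and the id "nav" are made up for the example and only stand in for the downloaded page; the real page's markup may well differ, so treat the XPath as something to adapt rather than a known answer.

```r
library(XML)

# Hypothetical stand-in for the downloaded page (not the real markup).
html <- '
<html><body>
  <table id="nav"><tr><td>Home</td></tr></table>
  <table id="content">
    <tr><th>Team</th><th>W</th><th>L</th></tr>
    <tr><td>AAA</td><td>90</td><td>55</td></tr>
    <tr><td>BBB</td><td>70</td><td>75</td></tr>
  </table>
</body></html>'

# Parse the string directly instead of reading a file.
doc <- htmlParse(html, asText = TRUE)

# Collect the id attribute of every <table> to see which one to target.
table_ids <- xpathSApply(doc, "//table", xmlGetAttr, "id", default = NA_character_)
print(table_ids)

# Select the table by id and convert it to a data frame.
nodes <- getNodeSet(doc, "//table[@id='content']")
standings <- readHTMLTable(nodes[[1]], header = TRUE, stringsAsFactors = FALSE)
str(standings)
```

With stringsAsFactors = FALSE the columns come back as character vectors, so numeric columns would still need something like as.numeric() afterwards. If getNodeSet returns an empty list on the real page, the id in the XPath probably does not match any table, which is one way to check the "wrong table ID" suspicion from the question.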
