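r - Scraping script returns errors such as "subscript out of bounds" and "object not found" when used on other pages of the same site
Problem description
I am having trouble adapting this working script, which scrapes data from FanGraphs, to a different page on the same site in order to get minor league data. I changed the URL and removed the part that strips the percent signs, because those are not an issue on my particular page...
Here is the original script: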
suppressMessages(library(dplyr))
suppressMessages(library(rvest))
### Load data from webpage
url <- "https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=30&type=2&season=2018&month=0&season1=2018&ind=0&team=0&rost=0&age=0&filter=&players=0&page=1_1000"
l1 <- read_html(url)
l1 <- html_nodes(l1, 'table')
### Extract table from html and remove 'bad' rows
fangraphs <- html_table(l1, fill = TRUE)[[12]]
fangraphs <- fangraphs[-c(1,3),]
# Extract column names
columnNames <- as.list(fangraphs[1,])
# Take care of symbols in column names
columnNames <- gsub("%", ".p", columnNames)
columnNames <- gsub("/", "per", columnNames)
# Rename data frame and remove row with column names
colnames(fangraphs) <- columnNames
fangraphs <- fangraphs[-1,]
fangraphs[] <- sapply(fangraphs, function(x) gsub(" %","",x))
fangraphs[4:19] <- sapply(fangraphs[4:19],as.numeric)
Here is my modified script:
suppressMessages(library(dplyr))
suppressMessages(library(rvest))
### Load data from webpage
url <- "https://www.fangraphs.com/minorleaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=c,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,30,46,45,44,32,23&season=2018&team=0&players=&page=1_3000"
l1 <- read_html(url)
l1 <- html_nodes(l1, 'table')
### Extract table from html and remove 'bad' rows
fangraphs <- html_table(l1, fill = TRUE)[[12]]
fangraphs <- fangraphs[-c(1,3),]
# Extract column names
columnNames <- as.list(fangraphs[1,])
# Rename data frame and remove row with column names
colnames(fangraphs) <- columnNames
fangraphs <- fangraphs[-1,]
fangraphs[3:26] <- sapply(fangraphs[3:26],as.numeric)
This is the error output I get:
> suppressMessages(library(dplyr))
> suppressMessages(library(rvest))
>
> ### Load data from webpage
>
> url <- "https://www.fangraphs.com/minorleaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=c,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,30,46,45,44,32,23&season=2018&team=0&players=&page=1_3000"
>
> l1 <- read_html(url)
> l1 <- html_nodes(l1, 'table')
>
> ### Extract table from html and remove 'bad' rows
> fangraphs <- html_table(l1, fill = TRUE)[[12]]
Error in html_table(l1, fill = TRUE)[[12]] : subscript out of bounds
> fangraphs <- fangraphs[-c(1,3),]
Error: object 'fangraphs' not found
>
> # Extract column names
> columnNames <- as.list(fangraphs[1,])
Error in as.list(fangraphs[1, ]) : object 'fangraphs' not found
>
> # Rename data frame and remove row with column names
> colnames(fangraphs) <- columnNames
Error in colnames(fangraphs) <- columnNames :
object 'fangraphs' not found
> fangraphs <- fangraphs[-1,]
Error: object 'fangraphs' not found
>
> fangraphs[3:26] <- sapply(fangraphs[3:26],as.numeric)
Error in lapply(X = X, FUN = FUN, ...) : object 'fangraphs' not found
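It did not get any better when I changed the selector in html_nodes from 'table' to '#MinorBoard1_dg1_ctl00 .rgHeader , .grid_line_regular', which I obtained with SelectorGadget (although that selector also picks up the name of each column).
A final, separate question is whether I need some code to fix the columns whose values contain a "." before they are converted to numeric columns (I am talking about the ISO, BABIP and AVG stats here).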
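On the question about values containing a ".": as far as base R is concerned, as.numeric() already reads a leading "." as a decimal point, so a string such as ".300" converts to 0.3 without any gsub() step. The sketch below illustrates this with a small made-up data frame; the player names and values are hypothetical and are not taken from the FanGraphs page.

```r
# Hypothetical sample values in the style of the AVG / ISO / BABIP columns
stats <- data.frame(
  Name  = c("Player A", "Player B"),
  AVG   = c(".300", ".275"),
  ISO   = c(".150", ".098"),
  BABIP = c(".321", ".290"),
  stringsAsFactors = FALSE
)

# as.numeric() parses a leading "." as a decimal point, so no gsub() is needed
num_cols <- c("AVG", "ISO", "BABIP")
stats[num_cols] <- lapply(stats[num_cols], as.numeric)

# Inspect the resulting column types
str(stats)
```

Only characters that are not part of a number, such as the " %" suffix handled in the original script, need to be removed before conversion; otherwise as.numeric() returns NA with a warning.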
Solution
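Only the first error in the output is a real one. "subscript out of bounds" on html_table(l1, fill = TRUE)[[12]] means the list of parsed tables has fewer than 12 elements, and every "object 'fangraphs' not found" message follows from that failed assignment. The index 12 was specific to the layout of the major league leaderboard page, so a reasonable first step is to check how many tables the new page yields and what their dimensions are before picking one. The sketch below shows that check on a small inline HTML document, a hypothetical stand-in for the real page (nothing is fetched here); the fallback of choosing the table with the most rows is an assumption for illustration, not a rule of the site.

```r
library(rvest)

# Hypothetical stand-in page with two tables; the real page is not fetched here
page <- read_html('<html><body>
  <table><tr><td>layout</td></tr></table>
  <table id="stats">
    <tr><th>Name</th><th>AVG</th></tr>
    <tr><td>Player A</td><td>.300</td></tr>
    <tr><td>Player B</td><td>.275</td></tr>
  </table>
</body></html>')

tables <- html_table(html_nodes(page, "table"))

# How many tables were parsed? Indexing past this number gives
# "subscript out of bounds"
n_tables <- length(tables)
print(n_tables)

# The dimensions of each table help identify the one holding the data
print(lapply(tables, dim))

# Guarded access instead of a hard-coded index such as [[12]]
wanted <- 12
if (wanted <= n_tables) {
  stats <- tables[[wanted]]
} else {
  # Assumption for this sketch: fall back to the table with the most rows
  stats <- tables[[which.max(vapply(tables, nrow, integer(1)))]]
}
print(stats)
```

Running the same length() and dim() checks on the l1 object from the modified script should show which index, if any, holds the minor league table. If no table contains the data, the page may be building it with JavaScript after load, in which case read_html() alone would not see it.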