Web scraping: the site uses an odd date format that needs converting. "Error in charToDate(x) : character string is not in a standard unambiguous format"

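Problem description

I am trying to scrape this website every day: https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020 As you can see, the date in the URL is not in a standard format, and I don't know how to get R to recognize it as a date.

Here is my code so far: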

url <- "https://www.basketball-reference.com/boxscores/"

timevalues <- seq(as.Date("month=10&day=2&year=2020"), as.Date("month=10&day=2&year=2020"), by = "day")
head(timevalues)

Error in charToDate(x) : character string is not in a standard unambiguous format

Tags: r, web-scraping, as.date

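Solution

You cannot generate a date (or a sequence of dates) this way. When called without a format argument, as.Date() only tries the formats "%Y-%m-%d" and "%Y/%m/%d", so a query string such as "month=10&day=2&year=2020" raises this error. Instead, build the sequence from standard dates and then assemble the query string from each date's components. Here is a solution using the lubridate package: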

library(lubridate)

url <- "https://www.basketball-reference.com/boxscores/"

my_dates <- seq(as.Date("2020-09-25"), as.Date("2020-10-05"), by = "day")

urls <- paste0(url, 
               "?month=", month(my_dates),
               "&day=", day(my_dates), 
               "&year=", year(my_dates))

urls
#>  [1] "https://www.basketball-reference.com/boxscores/?month=9&day=25&year=2020"
#>  [2] "https://www.basketball-reference.com/boxscores/?month=9&day=26&year=2020"
#>  [3] "https://www.basketball-reference.com/boxscores/?month=9&day=27&year=2020"
#>  [4] "https://www.basketball-reference.com/boxscores/?month=9&day=28&year=2020"
#>  [5] "https://www.basketball-reference.com/boxscores/?month=9&day=29&year=2020"
#>  [6] "https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020"
#>  [7] "https://www.basketball-reference.com/boxscores/?month=10&day=1&year=2020"
#>  [8] "https://www.basketball-reference.com/boxscores/?month=10&day=2&year=2020"
#>  [9] "https://www.basketball-reference.com/boxscores/?month=10&day=3&year=2020"
#> [10] "https://www.basketball-reference.com/boxscores/?month=10&day=4&year=2020"
#> [11] "https://www.basketball-reference.com/boxscores/?month=10&day=5&year=2020"
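
If you would rather not depend on lubridate, the same URLs can probably be built with base R alone. The following is only a sketch under that assumption: the names base_url and parsed are illustrative and not part of the original answer. It also shows the reverse direction, parsing the site's query string into a Date by spelling out its layout in the format argument.

```r
# Base R only: build the same URLs without lubridate.
base_url <- "https://www.basketball-reference.com/boxscores/"

my_dates <- seq(as.Date("2020-09-25"), as.Date("2020-10-05"), by = "day")

# format() returns zero-padded strings ("09", "01"); as.integer() drops the
# padding so the query matches the site's month=9&day=1 style.
urls <- sprintf("%s?month=%d&day=%d&year=%d",
                base_url,
                as.integer(format(my_dates, "%m")),
                as.integer(format(my_dates, "%d")),
                as.integer(format(my_dates, "%Y")))

# Reverse direction: as.Date() can read the query string once the layout
# is given explicitly, with the literal text written out around the
# %m, %d and %Y conversion specifications.
parsed <- as.Date("month=10&day=2&year=2020",
                  format = "month=%m&day=%d&year=%Y")
```

Each element of urls can then be passed to whatever scraping function you use, one day at a time.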
