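r - Web scraping a site with an odd date format that needs converting. "Error in charToDate(x) : character string is not in a standard unambiguous format"
Question
I am trying to scrape this website for every day: https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020 As you can see, the date is not in a standard format, and I don't know how to get R to recognize it as a date.
Here is my code so far: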
url <- "https://www.basketball-reference.com/boxscores/"
timevalues <- seq(as.Date("month=10&day=2&year=2020"), as.Date("month=10&day=2&year=2020"), by = "day")
head(timevalues)
Error in charToDate(x) : character string is not in a standard unambiguous format
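Solution
You cannot generate a sequence of dates like this: as.Date() expects a date string such as "2020-09-25", not a URL query string. Build the sequence from real dates first, then paste each date's month, day, and year into the URL. Here is a solution using the lubridate package: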
library(lubridate)
url <- "https://www.basketball-reference.com/boxscores/"
my_dates <- seq(as.Date("2020-09-25"), as.Date("2020-10-05"), by = "day")
urls <- paste0(url,
"?month=", month(my_dates),
"&day=", day(my_dates),
"&year=", year(my_dates))
urls
#> [1] "https://www.basketball-reference.com/boxscores/?month=9&day=25&year=2020"
#> [2] "https://www.basketball-reference.com/boxscores/?month=9&day=26&year=2020"
#> [3] "https://www.basketball-reference.com/boxscores/?month=9&day=27&year=2020"
#> [4] "https://www.basketball-reference.com/boxscores/?month=9&day=28&year=2020"
#> [5] "https://www.basketball-reference.com/boxscores/?month=9&day=29&year=2020"
#> [6] "https://www.basketball-reference.com/boxscores/?month=9&day=30&year=2020"
#> [7] "https://www.basketball-reference.com/boxscores/?month=10&day=1&year=2020"
#> [8] "https://www.basketball-reference.com/boxscores/?month=10&day=2&year=2020"
#> [9] "https://www.basketball-reference.com/boxscores/?month=10&day=3&year=2020"
#> [10] "https://www.basketball-reference.com/boxscores/?month=10&day=4&year=2020"
#> [11] "https://www.basketball-reference.com/boxscores/?month=10&day=5&year=2020"
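If adding a package dependency is undesirable, the same URLs can be built with base R alone. The following is a minimal sketch, not part of the original answer: `build_url()` and `query_to_date()` are hypothetical helper names introduced here for illustration, and the second helper assumes the query string always has the exact form `month=..&day=..&year=..`.

```r
# Base R sketch: build the daily box-score URLs without lubridate.
base_url <- "https://www.basketball-reference.com/boxscores/"
my_dates <- seq(as.Date("2020-09-25"), as.Date("2020-10-05"), by = "day")

# Hypothetical helper: format() returns zero-padded strings such as "09",
# so as.integer() is used to drop the padding (month=9 rather than month=09).
build_url <- function(d) {
  paste0(base_url,
         "?month=", as.integer(format(d, "%m")),
         "&day=",   as.integer(format(d, "%d")),
         "&year=",  format(d, "%Y"))
}
urls <- build_url(my_dates)

# Hypothetical helper: turn a query string back into a Date by giving
# as.Date() an explicit format. The literal text in the format must match
# the input exactly; this is the assumption noted above.
query_to_date <- function(q) {
  as.Date(q, format = "month=%m&day=%d&year=%Y")
}
start_date <- query_to_date("month=10&day=2&year=2020")
```

Passing an explicit `format` is what avoids the "not in a standard unambiguous format" error: without it, as.Date() only tries the "%Y-%m-%d" and "%Y/%m/%d" layouts on character input.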