r - checking website status of multiple urls using lapply - error handling
Question
I want to check the status of multiple websites (approx. 8,000).
I found this great example: checking validity of a list of urls using GET
This solution uses this code:
websites <- read.table(header = TRUE, text = "website
1 www.msn.com
2 www.wazl.com
3 www.amazon.com
4 www.rifapro.com")
library(httr)
# Prepend "http://" to any address that does not already start with a scheme
urls <- paste0(ifelse(grepl("^https?://", websites$website, ignore.case = TRUE), "", "http://"),
               websites$website)
# Send a HEAD request to each unique URL; try() keeps a single failure from aborting the loop
lst <- lapply(unique(tolower(urls)), function(url) try(HEAD(url), silent = TRUE))
names(lst) <- unique(tolower(urls))
# Report each HTTP status code, or -999 where the request errored out
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))
However, for some websites this takes ages, e.g. if I use http://www.chewy.com instead of www.wazl.com.
Here are some example links which work fine as long as I exclude the last one (chewy.com):
501 http://www.bedrockdata.com
502 http://www.beecroftortho.com
503 http://secure.beenevaluated.com
504 http://beercitizen.com
951 http://www.chewy.com
Why is that? My first idea is to use a timeout function, but I don't know where to implement it. I do not get any error message; the code simply never finishes and I have to restart the R session.
Solution
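The request to www.chewy.com most likely hangs because the server accepts the connection but never sends a response, and HEAD() has no time limit by default, so the lapply() loop blocks indefinitely. httr lets you cap each request with the timeout() config object, passed as an extra argument to HEAD() or GET(). Below is a minimal sketch of how to wire it in; the 5-second limit and the check_url() helper name are illustrative choices, not part of the original question:

library(httr)

check_url <- function(url, seconds = 5) {
  # timeout(seconds) aborts the request after the given number of seconds;
  # try() turns the resulting error into an object we can test for afterwards
  try(HEAD(url, timeout(seconds)), silent = TRUE)
}

lst <- lapply(unique(tolower(urls)), check_url)
names(lst) <- unique(tolower(urls))

# -999 now covers both unreachable hosts and hosts that exceeded the time limit
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))

With a timeout in place, a stalled site should produce a "try-error" entry instead of freezing the whole session, so the slow chewy.com row would simply come back as -999 once the limit expires.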