Checking website status of multiple URLs using lapply - error handling

Problem description

I want to check the status of multiple websites (approx. 8,000).

I found this great example: checking validity of a list of urls using GET

This solution uses this code:

library(httr)

# Small example table of websites (the first column is read as row names)
websites <- read.table(header = TRUE, text = "website
1   www.msn.com
2   www.wazl.com
3  www.amazon.com
4 www.rifapro.com")

# Prepend http:// to entries that do not already start with a scheme
urls <- paste0(ifelse(grepl("^https?://", websites$website, ignore.case = TRUE), "", "http://"),
               websites$website)

# Send a HEAD request to each unique URL; try() keeps a single failure from aborting the loop
lst <- lapply(unique(tolower(urls)), function(url) try(HEAD(url), silent = TRUE))
names(lst) <- unique(tolower(urls))

# Report the HTTP status code, or -999 if the request failed
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))

However, for some websites this takes ages, e.g. if I use http://www.chewy.com instead of www.wazl.com.

Here are some example links that work fine as long as I exclude the last one (the chewy.com link):

501 http://www.bedrockdata.com
502 http://www.beecroftortho.com
503 http://secure.beenevaluated.com
504 http://beercitizen.com
951 http://www.chewy.com

Why is that? My first idea is to add a timeout, but I don't know where to implement it. I do not get any error message; the code simply never finishes and I have to restart the R session.

Tags: r, url, lapply

Solution
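
One way to keep a single unresponsive host from hanging the whole run is to pass a per-request timeout into HEAD(). The following is a minimal sketch, assuming httr's timeout() config; the 5-second limit and the check_url() helper name are illustrative choices, not part of the original post:

library(httr)

# Sketch: same pattern as above, but timeout() is passed as a config argument
# so a request gives up after a few seconds instead of hanging indefinitely.
# The 5-second limit and the helper name check_url() are illustrative choices.
check_url <- function(url, seconds = 5) {
  res <- try(HEAD(url, timeout(seconds)), silent = TRUE)
  if (inherits(res, "try-error")) -999 else status_code(res)
}

# A timed-out request raises an error inside try(), so it is reported as -999
# just like any other failed request.
status <- sapply(unique(tolower(urls)), check_url)

With a timeout in place, a host such as www.chewy.com that never responds should come back as -999 after the limit instead of blocking the session.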

