Applying mutate to a data frame in R: an extra step is repeated and causes an error

Problem description

I would like to know why my get_http_status function appears to run one extra time and raise an exception.

I have a data frame like this:

> str(df5)
'data.frame':   10 obs. of  3 variables:
 $ text : chr  "\n" "\n" "\n" "\n" ...
 $ enlace: chr  "//www.blogger.com| __truncated__ ...
 $ Freq  : int  1 1 1 1 1 1 1 1 1 ...

I am trying to get the HTTP status code of each "enlace" with this function:

get_http_status <- function(url){
  if (!is.null(url)){
    Sys.sleep(3)
    print(url)
    ret <- HEAD(url)
    return(ret$status_code)
  }
  return("")
}


df44 <- mutate(df5, status = get_http_status(enlace))

But it keeps throwing this error:

Error in parse_url(url) : length(url) == 1 is not TRUE

I can wrap the function body in tryCatch and it then runs, but I don't understand why the error occurs in the first place.

get_http_status_2 <- function(url){
  tryCatch(
    expr = {
      Sys.sleep(3)
      print(url)
      ret <- HEAD(url)
      return(ret$status_code)
    },
    error = function(e){ 
      return("")
    }
  )
}

The contents of df5$enlace are:

> df5$enlace
 [1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"       
 [2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
 [3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"  
 [4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"    
 [5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"             
 [6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"              
 [7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"              
 [8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"                                                                               
 [9] "http://advancedwindowsdebugging.com/ch06.pdf"                                                                                                   
[10] "http://beej.us/guide/

I think it iterates one more time, because this is what the function prints:

> df44 <- mutate(df5, status = get_http_status(enlace))
 [1] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Attribution&widgetId=Attribution1&action=editWidget&sectionId=footer-3"       
 [2] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogArchive&widgetId=BlogArchive1&action=editWidget&sectionId=sidebar-right-1"
 [3] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=BlogSearch&widgetId=BlogSearch1&action=editWidget&sectionId=sidebar-right-1"  
 [4] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Followers&widgetId=Followers1&action=editWidget&sectionId=sidebar-right-1"    
 [5] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=PageList&widgetId=PageList1&action=editWidget&sectionId=crosscol"             
 [6] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text1&action=editWidget&sectionId=sidebar-right-1"              
 [7] "//www.blogger.com/rearrange?blogID=4514563088285989046&widgetType=Text&widgetId=Text2&action=editWidget&sectionId=sidebar-right-1"              
 [8] "http://5d4a.wordpress.com/2010/08/02/smashing-the-stack-in-2010/"                                                                               
 [9] "http://advancedwindowsdebugging.com/ch06.pdf"                                                                                                   
[10] "http://beej.us/guide/bgc/"                                                                                                                      
 Error in parse_url(url) : length(url) == 1 is not TRUE 

Tags: r, dataframe, dplyr

Solution


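mutate does not call your function once per row: it calls it once with the whole enlace column as a single vector. That is why print(url) shows all ten URLs in one go, followed by the error, rather than an extra iteration. HEAD is not vectorized and accepts only a single URL, so use the apply family of higher-order functions to iterate over the vector yourself.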
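To make the point above concrete, here is a minimal sketch in base R that runs without network access. check_one_url is a hypothetical stand-in for HEAD (it is not part of httr); it only reproduces the same length-one check, and the example.com URLs are placeholders.

```r
# check_one_url() is a hypothetical stand-in for httr::HEAD(): like
# parse_url(), it insists on receiving exactly one URL.
check_one_url <- function(url) {
  stopifnot(length(url) == 1)
  200L  # pretend every request succeeds
}

# Placeholder URLs, assumed for illustration only.
urls <- c("http://example.com/a", "http://example.com/b", "http://example.com/c")

# What mutate(df5, status = get_http_status(enlace)) effectively does:
# one call with the whole column, so the length check fails.
whole_column <- tryCatch(check_one_url(urls),
                         error = function(e) conditionMessage(e))
print(whole_column)

# What the tryCatch version does: the same error is swallowed and a single ""
# is returned for the entire column.
swallowed <- tryCatch(check_one_url(urls), error = function(e) "")
print(length(swallowed))
```

This also suggests why get_http_status_2 seems to work: it still receives the whole column, the error is caught, and mutate recycles the single "" into every row, so no request actually succeeds.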

Below, get_http_status is called on each element of df5$enlace.

The third argument tells vapply what each call must return, here a character vector of length one, character(1):

vapply(df5$enlace, get_http_status, character(1)) 
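One caveat, offered as an assumption rather than something the original answer states: as far as I know, httr stores status_code as an integer, and vapply rejects a result whose type differs from the template. The sketch below shows two ways to keep the types consistent. fake_status is a hypothetical stand-in for get_http_status so that the code runs offline, and the URLs are placeholders.

```r
# fake_status() is a hypothetical stand-in for get_http_status(); it accepts
# exactly one URL and returns an integer code without touching the network.
fake_status <- function(url) {
  stopifnot(length(url) == 1)
  if (grepl("^http", url)) 200L else 400L
}

# Placeholder URLs, assumed for illustration only.
urls <- c("//example.com/x", "http://example.com/y")

# Option 1: declare an integer result, one value per element.
status_int <- vapply(urls, fake_status, integer(1), USE.NAMES = FALSE)

# Option 2: keep character(1) as the template and convert inside the call.
status_chr <- vapply(urls, function(u) as.character(fake_status(u)),
                     character(1), USE.NAMES = FALSE)

print(status_int)
print(status_chr)
```

The same call can go inside mutate, for example mutate(df5, status = vapply(enlace, get_http_status, integer(1))), since vapply returns one value per row.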
