RSelenium and purrr errors when web scraping

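Problem description

I am using RSelenium to scrape carsales.com.au. The map_df part of this code has worked very well in the past and handles empty fields without trouble. On this website, however, it throws the following error: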

Error: Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.

I have done some research on this error, but it is beyond me.

Here is the code:

library(tidyverse)
library(rvest)
library(RSelenium)


# Start a Selenium-driven Firefox session and grab the client
rD <- RSelenium::rsDriver(browser="firefox", port= 4845L)
remDr <- rD[["client"]]

# - Manually Calculate pages (website uses an offset of 12 per page)
pages <- seq(from = 0, to = 770, by = 12)
cars <- tibble()

for (i in pages) {
  # Create URLs to loop over
  url <- 'https://www.carsales.com.au/cars/ford/territory/'
  url <- print(paste0(url,'?offset=', i))

  #Navigate to URLs
  remDr$navigate(url)
  soup <- remDr$getPageSource()
  soup <- xml2::read_html(soup[[1]])
  
  data <- soup %>% 
    html_nodes('div.listing-wrapper') %>% 
    map_df(~list(Model = html_nodes(.x, 'div.col > h3') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 Price = html_nodes(.x, '.price > a') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 Deets = html_nodes(.x, '.key-details') %>%
                   html_text() %>% 
                   {if(length(.) == 0) NA  else .},
                 sellerType = html_nodes(.x, '.seller-type' ) %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 sellerLoc = html_nodes(.x, '.seller-location') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .}
                )
          )
  cars <- as_tibble(rbind(cars, data))
  }

I would appreciate any help with this project.

Cheers

Tags: rselenium, purrr, rvest

Solution
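One plausible cause, offered as an assumption because the page's HTML is not shown in the question: within a single listing, `html_nodes()` returns a different number of matches for different selectors (for example, several `.key-details` elements but only one `h3`). The `length(.) == 0` guard only covers the zero-match case, so the list handed to `map_df()` can end up with fields of unequal length that cannot be recycled into one row. A minimal sketch of a workaround is to use `html_node()` (singular), which yields exactly one result per listing and `NA` when nothing matches, and to collapse fields that may legitimately match several times. The markup below and the helper `collapse_text()` are hypothetical stand-ins, not taken from the real site.

```r
library(rvest)
library(tibble)

# Hypothetical markup standing in for the real page: the second listing
# has two '.key-details' matches and no '.price > a' match.
html <- '
<div class="listing-wrapper">
  <div class="col"><h3>Ford Territory TX</h3></div>
  <div class="price"><a>$12,000</a></div>
  <ul class="key-details"><li>150,000 km</li></ul>
</div>
<div class="listing-wrapper">
  <div class="col"><h3>Ford Territory TS</h3></div>
  <ul class="key-details"><li>90,000 km</li></ul>
  <ul class="key-details"><li>Automatic</li></ul>
</div>'

listings <- html_nodes(read_html(html), "div.listing-wrapper")

# Hypothetical helper: join every match inside one listing into a single
# string, so the field always has length 1 (NA when nothing matches).
collapse_text <- function(node, css) {
  txt <- html_text(html_nodes(node, css), trim = TRUE)
  if (length(txt) == 0) NA_character_ else paste(txt, collapse = " | ")
}

cars <- tibble(
  # html_node() returns one (possibly missing) node per listing,
  # and html_text() turns a missing node into NA.
  Model = html_text(html_node(listings, "div.col > h3"), trim = TRUE),
  Price = html_text(html_node(listings, ".price > a"), trim = TRUE),
  # Collapse fields that may match more than once per listing.
  Deets = vapply(
    seq_along(listings),
    function(i) collapse_text(listings[[i]], ".key-details"),
    character(1)
  )
)

print(cars)
```

If this assumption holds for carsales.com.au, the same pattern could replace the `map_df()` call inside the loop, with `soup` in place of `read_html(html)`. Every column then has exactly one value per listing, so no recycling is needed.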

