r - Web scraping: R rvest html_nodes returns character(0)
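Problem description
Here is the code. Strangely, it ran without any problems yesterday, but today it always returns character(0). After some checking, I found that the html_nodes line is the cause.
I tried replacing ".photo-cards li article" with other nodes, and it still does not work.
Has anyone run into the same problem and solved it? Thanks in advance for your help!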
library(tidyverse)
library(rvest)

links <- sprintf("https://www.zillow.com/sacramento-ca/%d_p", 1:11)

results <- map(links, ~ {
  # http://selectorgadget.com/
  # <body
  # class="photo-cards
  houses <- read_html(.x) %>%
    html_nodes(".photo-cards li article")

  z_id <- houses %>%
    html_attr("id")

  address <- houses %>%
    html_node(".list-card-addr") %>%
    html_text()

  price <- houses %>%
    html_node(".list-card-price") %>%
    html_text() %>%
    readr::parse_number()

  params <- houses %>%
    html_node(".list-card-info") %>%
    html_text()

  # number of bedrooms
  beds <- params %>%
    str_extract("\\d+(?=\\s*bds)") %>%
    as.numeric()

  # number of bathrooms
  baths <- params %>%
    str_extract("\\d+(?=\\s*ba)") %>%
    as.numeric()

  # total square footage
  house_a <- params %>%
    str_extract("[0-9,]+(?=\\s*sqft)") %>%
    str_replace_all(",", "") %>%  # remove every thousands separator, not just the first
    as.numeric()

  tibble(price = price, beds = beds, baths = baths, house_area = house_a)
}) %>%
  bind_rows(.id = "page_no")
Solution
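No answer is recorded here. As a diagnostic sketch rather than a confirmed fix: html_nodes returns an empty node set whenever the selector matches nothing in the HTML that was actually fetched, and html_attr on an empty node set gives character(0). One way to narrow the problem down is to check what the server sent back. The inline HTML below is a hypothetical stand-in for such a response (an assumption, not Zillow's real markup), and count_matches is a hypothetical helper name.

```r
library(rvest)

# Hypothetical stand-in for whatever HTML the server actually returned;
# it deliberately contains no listing markup.
page <- read_html(
  "<html><head><title>Access check</title></head>
   <body><div id='notice'>Please verify you are human</div></body></html>"
)

# Hypothetical helper: count how many nodes a CSS selector matches.
count_matches <- function(doc, css) {
  length(html_nodes(doc, css))
}

houses <- html_nodes(page, ".photo-cards li article")
ids <- html_attr(houses, "id")

count_matches(page, ".photo-cards li article")  # 0: the selector matches nothing
ids                                             # character(0)

# The title and the start of the body text show what was actually fetched.
page_title <- html_text(html_node(page, "title"))
body_start <- substr(html_text(html_node(page, "body")), 1, 200)
```

Running the same checks on read_html(links[1]) would show whether the listing markup is present at all. If it is missing, changing the selector cannot help, which would be consistent with the other selectors failing too.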