Rvest returns empty values

Problem description

I am trying to piece together how to use rvest. I thought I had it figured out, but every result I get back is empty.

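I am using @RonakShah's example (a loop with rvest) as my starting point, and thought I would try extending it to collect the name, phone number, and opening hours for each day: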

site = "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  webpage <- site %>% read_html()
  name <- webpage %>% html_nodes('p.name') %>% html_text() %>% trimws()
  telephone <- webpage %>% html_nodes('p.telephone') %>% html_text() %>% trimws()
  monday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  tuesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  wednesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  thursday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  friday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  saturday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  sunday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  data.frame(telephone, monday, tuesday, wednesday, thursday, friday, saturday, sunday)
}

get_phone(site)

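But I cannot get any of these to work, even individually. I cannot even get it to read a single day or the phone number correctly. Can anyone help point out why?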

Tags: r, rvest

Solution


Right-click the web page, select Inspect, and examine the page's HTML. Find the element you want to extract and select it with a CSS selector.
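The empty results in the question are what rvest gives back when a selector matches nothing: html_nodes() returns an empty node set rather than raising an error. A minimal sketch of how to check a selector before building on it, using a made-up HTML fragment (the markup below is an assumption for illustration, not copied from the site):

```r
library(rvest)

# Hypothetical fragment; the real page's markup may differ.
page <- read_html('<html><body>
  <span itemprop="telephone">+64 800 888 386</span>
</body></html>')

# Selector from the question: there is no <p class="telephone"> here,
# so the node set is empty and html_text() would return character(0).
no_match <- page %>% html_nodes("p.telephone")
length(no_match)

# Attribute selector: matches the <span itemprop="telephone"> element.
phone <- page %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
phone
```

Checking length() on the node set for each selector is a quick way to tell a wrong selector apart from a page that failed to load.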

library(rvest)
site <- "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  # Parse the page at the URL passed to the function
  webpage <- url %>% read_html()
  # The phone number sits in a <span> marked with itemprop="telephone"
  phone <- webpage %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
  # The opening hours are JSON stored in the data-times attribute
  opening_hours <- webpage %>% 
                    html_nodes('div.open-hours') %>% 
                    html_attr('data-times') %>% jsonlite::fromJSON()
  list(phone_number = phone, opening_hours = opening_hours)
}

get_phone(site)


#$phone_number
#[1] "+64 800 888 386"

#$opening_hours
#  weekday time_from time_to
#1       1     12:00   00:00
#2       2     12:00   00:00
#3       3     12:00   00:00
#4       4     12:00   00:00
#5       5     12:00   00:00
#6       6     10:00   00:00
#7       0     10:00   00:00

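The opening hours are stored as JSON in the data-times attribute, which is helpful: we do not have to scrape each day separately and bind the results together.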
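Since the question asked for hours per named day, the weekday codes can be mapped to names afterwards. A rough sketch in base R, with the data frame re-entered by hand from the output above; it assumes that weekday 0 means Sunday and 1 means Monday, which the page does not document:

```r
# Opening hours as printed above, re-entered by hand for illustration.
opening_hours <- data.frame(
  weekday   = c(1, 2, 3, 4, 5, 6, 0),
  time_from = c("12:00", "12:00", "12:00", "12:00", "12:00", "10:00", "10:00"),
  time_to   = rep("00:00", 7),
  stringsAsFactors = FALSE
)

# Assumption: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# R vectors are 1-indexed, so shift the 0-based weekday code by one.
opening_hours$day <- day_names[opening_hours$weekday + 1]
opening_hours[, c("day", "time_from", "time_to")]
```

If the site numbers its days differently, only the order of day_names needs to change.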

