Navigating to a new link - R rvest

Problem description

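I am trying to follow a link on one webpage to reach the next one. I want to extract information about all of the Planned Parenthood health centers in TN, starting from the Tennessee listing page (https://www.plannedparenthood.org/health-center/tn).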


I would like to know how to start from this page and navigate to the Knoxville Health Center's page. I tried the rvest package with the following...

library(rvest)
library(dplyr)
URL <- paste0("https://www.plannedparenthood.org/health-center/tn")
Webpage <- read_html(URL)
Webpage %>% html_nodes("p")  

This gives me...

{xml_nodeset (6)}
[1] <p itemprop="name" data-facility-id="2610" data-affiliate-name="Planned Parenthood of Tennessee ...
[2] <p itemprop="name" data-facility-id="3348" data-affiliate-name="Planned Parenthood of Tennessee ...
[3] <p itemprop="name" data-facility-id="4247" data-affiliate-name="Planned Parenthood of Tennessee ...
[4] <p itemprop="name" data-facility-id="2716" data-affiliate-name="Planned Parenthood of Tennessee ...
[5] <p>Planned Parenthood delivers vital reproductive health care, sex education, and information t ...
[6] <p class="site-footer-legal">\n            <small>\n              © 2020 Planned Parenthood Fed ...

I'm not sure where to go from here. Any help would be appreciated.

Tags: r, web-scraping, rvest

Solution


You can get the links on the page like this:

library(rvest)
URL <- "https://www.plannedparenthood.org/health-center/tn"
Webpage <- read_html(URL)

all_links <- Webpage %>% 
               html_nodes("p a") %>%
               html_attr('href') %>%
               paste0('https://www.plannedparenthood.org', .)
all_links
#[1] "https://www.plannedparenthood.org/health-center/tennessee/knoxville/37914/knoxville-health-center-2610-91550"                 
#[2] "https://www.plannedparenthood.org/health-center/tennessee/memphis/38112/memphis-health-center-midtown-3348-91550"             
#[3] "https://www.plannedparenthood.org/health-center/tennessee/memphis/38122/memphis-health-center-near-summer-and-i240-4247-91550"
#[4] "https://www.plannedparenthood.org/health-center/tennessee/nashville/37203/nashville-health-center-2716-91550" 
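A slightly more defensive variant is sketched below. It is only a sketch under a few assumptions: it needs rvest 1.0 or later (for `html_elements()`, `html_element()` and `html_text2()`), and it runs offline against a small hand-written HTML string that imitates the `<p itemprop="name">` nodes from the question's output; the link texts and the unrelated extra link are invented. It restricts the selector to `p[itemprop='name']` so links in other paragraphs are ignored, takes one `<a>` per paragraph so facility IDs, names and URLs stay aligned, and resolves each `href` with `xml2::url_absolute()` instead of pasting a domain in front of it, which also copes with hrefs that are already absolute.

```r
library(rvest)  # read_html(), html_elements(), html_element(), html_attr(), html_text2()
library(xml2)   # url_absolute()

# Invented markup that mimics the <p itemprop="name"> nodes shown in the question
html <- '
<html><body>
  <p itemprop="name" data-facility-id="2610">
    <a href="/health-center/tennessee/knoxville/37914/knoxville-health-center-2610-91550">Knoxville Health Center</a>
  </p>
  <p itemprop="name" data-facility-id="2716">
    <a href="/health-center/tennessee/nashville/37203/nashville-health-center-2716-91550">Nashville Health Center</a>
  </p>
  <p>An unrelated paragraph with <a href="/about">another link</a>.</p>
</body></html>'

page <- read_html(html)  # on the real site this would be read_html(URL)
base_url <- "https://www.plannedparenthood.org/health-center/tn"

# One node per health-center paragraph, then the first <a> inside each of them
name_nodes <- html_elements(page, "p[itemprop='name']")
link_nodes <- html_element(name_nodes, "a")

centers <- data.frame(
  facility_id = html_attr(name_nodes, "data-facility-id"),
  name = html_text2(link_nodes),
  # Resolve each href against the page URL rather than pasting a prefix on
  url = url_absolute(html_attr(link_nodes, "href"), base_url),
  stringsAsFactors = FALSE
)

# Pick out the Knoxville page the question asks about;
# read_html(knoxville_url) would then load it.
knoxville_url <- centers$url[grepl("Knoxville", centers$name)]
```

Keeping the facility ID next to each URL makes it easy to join the per-center details back to the listing later.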

You can now use these individual links to navigate further.
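One way to go that next step is to pass each URL back to `read_html()`. The sketch below assumes rvest 1.0 or later; `scrape_center()` and `scrape_all()` are hypothetical helper names, and the `h1` and `address` selectors are placeholders rather than the site's confirmed markup, so inspect a real health-center page in the browser's developer tools and adjust them. It pauses between requests and skips pages that fail to load instead of aborting the whole run.

```r
library(rvest)  # read_html(), html_element(), html_text2()

# Hypothetical helper: load one health-center page and return a one-row data frame.
# The "h1" and "address" selectors are placeholders; check them against the real page.
scrape_center <- function(url, delay = 1) {
  Sys.sleep(delay)  # pause before each request to go easy on the server
  page <- read_html(url)
  data.frame(
    url = url,
    name = html_text2(html_element(page, "h1")),
    address = html_text2(html_element(page, "address")),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: visit every URL and stack the rows into one data frame.
# Pages that raise an error are reported and skipped rather than stopping the loop.
scrape_all <- function(urls, delay = 1) {
  rows <- lapply(urls, function(u) {
    tryCatch(
      scrape_center(u, delay = delay),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL  # rbind() drops NULL entries
      }
    )
  })
  do.call(rbind, rows)
}
```

With the vector from the answer, usage would be `center_info <- scrape_all(all_links)`. Because `read_html()` also accepts a local file path, you can save one page and test the selectors on it before running the full loop.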

