How do you scrape multiple pages from the same website in RStudio?

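Question

I want to use RStudio to download data from several pages of the same website: https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2

The only difference between page 2 and page 3 is the number at the end of the URL: 3 instead of 2. I can already get what I need for the 25 jobs on one page, but I want 100 jobs from 4 pages. I am using the SelectorGadget Chrome extension.

I tried a for loop: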

for (page_result in seq(from = 1, to = 101, by = 25)) {
  # page_result is not used yet: the URL below still ends in Page=2
  link = paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2")
  page = read_html(link)
}

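I don't know what to do next.

I think I need to put page_result into the link, but I don't know where. I have the rvest and dplyr packages, and I want the for loop to go through each page. Any ideas on how best to do this are welcome. Thanks.

Tags: r, rvest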

Answer



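The 4 links can easily be put into a for loop. Copy the CSS selector from the DOM and iterate the nth-child index from 5 to 30 to get all 25 jobs on each page. Any position in that range that holds no job title comes back as NA.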

library(rvest)  # provides read_html(), html_node(), html_text() and the %>% pipe

base_url <- "https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page="
AllJOBS <- character()
for (i in 1:4) {
  # Download each results page once, then query it for every position k
  page <- read_html(paste0(base_url, i))
  for (k in 5:30) {
    job <- page %>%
      html_node(css = paste0("#page > div.container > div.column-wrap.order-one-two > div.two-thirds > div:nth-child(", k, ") > div > div.job-result-logo-title > div.job-result-title > h2 > a")) %>%
      html_text()
    AllJOBS <- append(AllJOBS, job)  # NA when position k holds no job title
  }
  print(paste0("Page", i))
  Sys.sleep(runif(1, 1, 2))  # pause 1 to 2 seconds between page requests
}
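
As an alternative to stepping through the nth-child positions one at a time, here is a minimal sketch that selects every title link on a page in a single call. It assumes that the tail of the long selector above, div.job-result-title > h2 > a, matches only job-title links on the results page; that has not been checked against the live site. The helper names extract_titles and scrape_titles are hypothetical.

```r
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

base_url <- "https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page="

# Hypothetical helper: return all job titles found in one parsed page.
# Assumption: this shortened selector matches only job-title links.
extract_titles <- function(page) {
  page %>%
    html_nodes("div.job-result-title > h2 > a") %>%
    html_text(trim = TRUE)
}

# Hypothetical helper: download each requested page once and
# combine the titles into one character vector.
scrape_titles <- function(pages = 1:4) {
  all_jobs <- character()
  for (i in pages) {
    page <- read_html(paste0(base_url, i))
    all_jobs <- c(all_jobs, extract_titles(page))
    Sys.sleep(runif(1, 1, 2))  # pause 1 to 2 seconds between requests
  }
  all_jobs
}

# Uncomment to run against the live site:
# all_jobs <- scrape_titles(1:4)
```

Because html_nodes() returns only the elements that actually match, this version should not produce NA placeholders for positions without a job title, provided the selector assumption holds.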
  

Output

> AllJOBS
 [1] "Senior Consultant - Fund Static Data"                                                                
 [2] "Data Warehouse Engineer"                                                                             
 [3] "Senior Software Engineer - Big Data DevOps"                                                          
 [4] "HR Data Analyst"                                                                                     
 [5] "Data Insights Engineer - Dublin - Permanent/Contract - SQL Server"                                   
 [6] NA                                                                                                    
 [7] "Data Engineer - Master Data Services - SQL Server - Permanent/Contract"                              
 [8] "Senior Data Protection Officer (DPO) - Contract"                                                     
 [9] "QC Data Analyst (Trending)"                                                                          
[10] "Senior Data Warehouse Developer"                                                                     
[11] "Senior Data Analyst FTC"                                                                             
[12] "Compliance Advisory and Data Protection Relationship Manager"                                        
[13] "Contracts Manager-Data Center"                                                                       
[14] "Payments Product Data Analyst"                                                                       
[15] "Data Center Product Hardware Platform Engineer"                                                      
[16] "People Data Privacy Program Lead"                                                                    
[17] "Head of Data Science"                                                                                
[18] "Data Protection Counsel (Product or Compliance)"                                                     
[19] "Data Engineer, GMS"                                                                                  
[20] "Data Protection Associate General Counsel"                                                           
[21] "Senior Data Engineer"                                                                                
[22] "Geospatial Data Scientist"                                                                           
[23] "Data Solutions Manager"                                                                              
[24] "Data Protection Solicitor"                                                                           
[25] "Junior Data Scientist"                                                                               
[26] "Master Data Specialist"                                                                              
[27] "Temp QC Electronic Data Management Analyst"                                                          
[28] "20725 -Data Scientist - Limerick"                                                                    
[29] "Technical Support Specialist - Data Centre"                                                          
[30] "Lead QC Micro Analyst (data review and compliance)"                                                  
[31] "Temp QC Data  Analyst"                                                                               
[32] "#Abbvie Compliance Engineer (Data Integrity)"                                                        
[33] "People Data Analyst"                                                                                 
[34] "Senior Electrical Design Engineer - Data Centre Ex"                                                  
[35] "Laboratory Data Entry Assistant, UCD NVRL"                                                           
[36] "Data Migrations Specialist"                                                                          
[37] "Data Protection Officer"                                                                             
[38] "Data Center Operations Engineer (Linux)"                                                             
[39] "Senior Electrical Engineer | Data Centre LV Design"                                                  
[40] "Data Scientist - (Process Sciences)"                                                                 
[41] "Mgr Supply Logistics Global Materials Data"                                                          
[42] "Data Protection / Privacy Delivery Consultant"                                                       
[43] "Global Supply Chain Data Analyst"                                                                    
[44] "QC Data Analyst"                                                                                     
[45] "0582GradeVIIFOIOLOL1120 - Grade VII Data Protection / Freedom of Information & Compliance Officer"   
[46] "DPO001 - Deputy Data Protection Officer (General Manager) Office of the Head of Data Protection, HSE"
[47] "Senior Campaign Data Analyst"                                                                        
[48] "Data & Reporting Analyst II"                                                                         
[49] "Azure Data Analytics Solution Architect"                                                             
[50] "Head of Risk Assurance for IT, Data, Projects and Outsourcing"                                       
[51] "Trainee Data Technician, Ireland"                                                                    
[52] NA 

You can handle the NA values separately. Does this answer your question, or did I misunderstand it?
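
One simple way to handle the NA entries is to drop them after scraping. The sketch below uses a small made-up vector, all_jobs, as a stand-in for the scraped result so that it runs without touching the website.

```r
# Toy stand-in for the scraped vector (two titles taken from the output above)
all_jobs <- c("Data Warehouse Engineer", NA, "HR Data Analyst", NA)

# Keep only the entries that are not NA
clean_jobs <- all_jobs[!is.na(all_jobs)]

print(clean_jobs)
```

The same subsetting expression can be applied to AllJOBS once the loop has finished.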

