Scraping multiple pages of a website with Python

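Problem description

I am scraping Indeed.nl for "Junior UX Designer" vacancies in "Nederland". The search returns 6 pages of results, so at about 15 vacancies per page I should end up with roughly 90 vacancies in total. When I write the results to a JSON file I do get 90 rows, but many of them are duplicates, and many of the vacancies listed on the site do not appear in the file at all.

This is the code I am using: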

import requests
from bs4 import BeautifulSoup
import json

jobs_NL = []
for i in range(1,7):
  url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start="+str(i)
  
  print("Getting page",i)
  
  page = requests.get(url)

  html = BeautifulSoup(page.content, "html.parser")

  job_title = html.find_all("table", class_="jobCard_mainContent")

  for item in job_title:
      title = item.find("h2").get_text() 
      company = item.find("span", class_="companyName").get_text()
      location = item.find("div", class_="companyLocation").get_text()

      if item.find("div", class_="salary-snippet") != None:
        salary = item.find("div", class_="heading6 tapItem-gutter metadataContainer").get_text()
      else:
        salary = "No salary found"

      vacancy = {
          "title": title,
          "company": company,
          "location": location,
          "salary": salary
          }
      jobs_NL.append(vacancy)

Tags: python, html, web-scraping

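Solution

You need to multiply the start value by 10 to get the correct page. In the original loop start only went from 1 to 6, so every request landed on nearly the same set of results, which would explain both the duplicates and the missing vacancies: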

import requests
import pandas as pd
from bs4 import BeautifulSoup

jobs_NL = []
for i in range(7):
    # start must advance in steps of 10 to reach each results page
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start={}".format(
        10 * i
    )

    print("Getting page", i)

    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")

    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()

        if item.find("div", class_="salary-snippet") is not None:
            salary = item.find(
                "div", class_="heading6 tapItem-gutter metadataContainer"
            ).get_text()
        else:
            salary = "No salary found"

        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary,
        }
        jobs_NL.append(vacancy)

df = pd.DataFrame(jobs_NL)
print(df)

Prints:

...
90                                           UX Designer | SaaS Platform                                       StarApple                                     Amersfoort  €3.000 - €4.500 per maand
91                                                    Frontend Developer                                      JustBetter                                        Alkmaar            No salary found
92                                                     Software Engineer                              Infinitas Learning                                    Thuiswerken            No salary found
93                                                         UX Researcher                  Cognizant Technology Solutions                                      Amsterdam            No salary found
94                                            Junior Front End developer                                       StarApple                                 Zeist+1 plaats  €2.500 - €3.000 per maand
95                                  nieuwSenior User Experience Designer                                         Trimble                                     Bodegraven            No salary found
96                                  Senior UX Designer - Research Agency                        Found Professionals B.V.                             Amsterdam+1 plaats            No salary found
97                                                HubSpot marketing lead                                          Comaxx                                         Waalre            No salary found
98                                  nieuwJunior Technisch CRO Specialist                                   Finest People                                 Amsterdam West           €50.000 per jaar
99                                                         iOS developer                                       Infoplaza                                         Houten            No salary found
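
Since the question also mentions duplicates, the following is a minimal sketch of how the URL construction and a de-duplication step could be factored out. The helper names page_url and dedupe are hypothetical, the step of 10 is taken from the answer above, and treating (title, company, location) as the identity of a vacancy is an assumption that may merge postings that are actually distinct.

```python
from urllib.parse import urlencode

BASE_URL = "https://nl.indeed.com/vacatures"
PAGE_STEP = 10  # assumption from the answer above: start advances by 10 per page


def page_url(page_index, query="junior ux designer", location="Nederland"):
    """Build the search URL for a zero-based page index (hypothetical helper)."""
    params = {"q": query, "l": location, "start": PAGE_STEP * page_index}
    # urlencode escapes spaces as "+", matching the URL used in the answer
    return BASE_URL + "?" + urlencode(params)


def dedupe(vacancies):
    """Drop repeated vacancies, keeping the first occurrence of each.

    Assumption: two rows with the same title, company and location
    describe the same vacancy.
    """
    seen = set()
    unique = []
    for vacancy in vacancies:
        key = (vacancy["title"], vacancy["company"], vacancy["location"])
        if key not in seen:
            seen.add(key)
            unique.append(vacancy)
    return unique
```

In the loop above, requests.get(page_url(i)) would replace the formatted string, and pd.DataFrame(dedupe(jobs_NL)) would build the table from unique rows only.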
