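python - Scraping multiple pages of a website with Python
Problem description
I'm scraping Indeed.nl for "Junior UX Designer" in "Nederland". The site has 6 pages of vacancies for this search term. If each page contains 15 vacancies, I should get about 90 vacancies in total. When I write them to a JSON file, I do get 90 rows. However, many of them are duplicates, and many vacancies don't appear in the file at all.
This is the code I'm using: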
import requests
from bs4 import BeautifulSoup
import json

jobs_NL = []

for i in range(1, 7):
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start=" + str(i)
    print("Getting page", i)
    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")

    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()
        if item.find("div", class_="salary-snippet") != None:
            salary = item.find("div", class_="heading6 tapItem-gutter metadataContainer").get_text()
        else:
            salary = "No salary found"

        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary
        }
        jobs_NL.append(vacancy)
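The snippet above fills jobs_NL but stops before the JSON step the question mentions. Below is a minimal sketch of that step. It assumes the list is written with the json module already imported. The names save_jobs, load_jobs and dedupe_jobs and the file name jobs_NL.json are hypothetical. Dropping repeated (title, company, location) entries only hides the symptom: vacancies on pages that were never requested still won't appear.

```python
import json


def dedupe_jobs(jobs, keys=("title", "company", "location")):
    """Drop repeated vacancies, keeping the first occurrence (hypothetical helper).

    Two vacancies count as the same if they match on every field in keys.
    """
    seen = set()
    unique = []
    for job in jobs:
        key = tuple(job.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def save_jobs(jobs, path="jobs_NL.json"):
    """Write a list of vacancy dicts to a UTF-8 JSON file (hypothetical helper)."""
    # ensure_ascii=False writes the euro sign as-is instead of an escape sequence
    with open(path, "w", encoding="utf-8") as f:
        json.dump(jobs, f, ensure_ascii=False, indent=2)


def load_jobs(path="jobs_NL.json"):
    """Read the vacancies back, e.g. to count rows (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, call save_jobs(dedupe_jobs(jobs_NL)) after the loop finishes.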
Solution
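The start parameter is a result offset, not a page number. As a result, start=1 through start=6 all return almost the same first page of results, which explains both the duplicates and the missing vacancies. Multiply the page index by 10 to get the correct page: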
import requests
import pandas as pd
from bs4 import BeautifulSoup

jobs_NL = []

for i in range(7):
    # start is a result offset, so page i begins at result 10 * i
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start={}".format(
        10 * i
    )
    print("Getting page", i)
    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")

    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()
        # Only read the metadata container when a salary snippet is present
        if item.find("div", class_="salary-snippet") is not None:
            salary = item.find(
                "div", class_="heading6 tapItem-gutter metadataContainer"
            ).get_text()
        else:
            salary = "No salary found"

        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary,
        }
        jobs_NL.append(vacancy)

# Show the collected vacancies as a table
df = pd.DataFrame(jobs_NL)
print(df)
Prints:
...
90 UX Designer | SaaS Platform StarApple Amersfoort €3.000 - €4.500 per maand
91 Frontend Developer JustBetter Alkmaar No salary found
92 Software Engineer Infinitas Learning Thuiswerken No salary found
93 UX Researcher Cognizant Technology Solutions Amsterdam No salary found
94 Junior Front End developer StarApple Zeist+1 plaats €2.500 - €3.000 per maand
95 nieuwSenior User Experience Designer Trimble Bodegraven No salary found
96 Senior UX Designer - Research Agency Found Professionals B.V. Amsterdam+1 plaats No salary found
97 HubSpot marketing lead Comaxx Waalre No salary found
98 nieuwJunior Technisch CRO Specialist Finest People Amsterdam West €50.000 per jaar
99 iOS developer Infoplaza Houten No salary found
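Here is an offline simulation that checks the offset explanation without contacting the site. It assumes a hypothetical ordered list of 100 results and a server that returns page_size consecutive results starting at index start. fake_search is a made-up stand-in for requests.get plus parsing. It does not model Indeed's real page size or ranking.

```python
def fake_search(start, page_size=10, total=100):
    """Hypothetical results page: the items at indexes start .. start + page_size - 1."""
    results = [f"job-{n}" for n in range(total)]  # fixed, ordered result list
    return results[start:start + page_size]


def scrape(offsets, page_size=10):
    """Collect every item returned for the given start offsets."""
    collected = []
    for start in offsets:
        collected.extend(fake_search(start, page_size))
    return collected


# Original loop: start = 1..6 treats start as a page number
buggy = scrape(range(1, 7))
# Fixed loop: start = 10 * i treats start as a result offset
fixed = scrape([10 * i for i in range(7)])

print(len(buggy), len(set(buggy)))  # 60 15
print(len(fixed), len(set(fixed)))  # 70 70
```

With the page-number offsets, the loop collects 60 items but only 15 distinct ones. The result-offset version collects 70 distinct items. This matches the pattern described in the question: many rows, many duplicates, and most vacancies missing.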