python - Python - web crawling / different result from same code? / requests, bs4 / M1
Problem description
I'm learning Python for web crawling, but I'm totally stuck.
Each time I run this code, the results change.
Very rarely it works, but it almost always returns an empty list.
Why does this happen? Please let me know.
from indeed import extract_indeed_pages, extract_indeed_jobs
last_indeed_page = extract_indeed_pages()
print(last_indeed_page)
indeed_jobs = extract_indeed_jobs(last_indeed_page)
print(indeed_jobs)
# indeed.py
import requests
from bs4 import BeautifulSoup

LIMIT = 50
URL = f"https://kr.indeed.com/jobs?q=React&l=%EC%84%9C%EC%9A%B8&radius=100&jt=fulltime&limit={LIMIT}"

def extract_indeed_pages():
    result = requests.get(URL)
    soup = BeautifulSoup(result.text, "html.parser")
    pagination = soup.find("div", {"class": "pagination"})
    # Raises AttributeError if the response has no pagination div
    links = pagination.find_all('a')
    pages = []
    for link in links[:-1]:
        pages.append(int(link.string))
    max_page = pages[-1]
    return max_page

def extract_indeed_jobs(last_page):
    # last_page is not used yet; only the first results page is fetched
    jobs = []
    result = requests.get(f"{URL}&start={0*LIMIT}")
    soup = BeautifulSoup(result.text, "html.parser")
    results = soup.find_all("h2", {"class": "jobTitle"})
    # Appends the whole result set as one element, so jobs is a list containing one list
    jobs.append(results)
    return jobs
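Solution
This happens because of the JavaScript in the page source. Press Ctrl + U in a browser on a PC to view the page source, and check whether the elements you are searching for are actually present in it.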
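The same check can be done from Python instead of the browser. The following is only a rough sketch, not a fix: it counts the two elements the question's code looks for (`div.pagination` and `h2.jobTitle`) plus the `script` tags in a piece of HTML text. The names `MarkerCounter` and `inspect_html` and both sample pages are made up for illustration, and it uses only the standard library so it runs without extra installs.

```python
from html.parser import HTMLParser


class MarkerCounter(HTMLParser):
    """Counts the elements the scraper in the question depends on."""

    def __init__(self):
        super().__init__()
        self.pagination_divs = 0
        self.job_titles = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "pagination" in classes:
            self.pagination_divs += 1
        elif tag == "h2" and "jobTitle" in classes:
            self.job_titles += 1
        elif tag == "script":
            self.scripts += 1


def inspect_html(html):
    """Return counts of the relevant elements found in raw HTML text."""
    counter = MarkerCounter()
    counter.feed(html)
    counter.close()
    return {
        "pagination_divs": counter.pagination_divs,
        "job_titles": counter.job_titles,
        "scripts": counter.scripts,
    }


# Made-up sample pages, for illustration only.
full_page = (
    '<div class="pagination"><a>1</a><a>2</a></div>'
    '<h2 class="jobTitle">React developer</h2>'
)
script_only_page = "<html><body><script>render()</script></body></html>"

print(inspect_html(full_page))
print(inspect_html(script_only_page))
```

To try it on the real response, pass `requests.get(URL).text` to `inspect_html`. If the pagination and job title counts are zero on the runs that fail, the HTML that requests received did not contain those elements, so BeautifulSoup had nothing to find.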