Python - web crawling / different result from same code? / requests, bs4 / M1

Problem description

I'm learning Python for web crawling, but I'm totally stuck.

Each time I run this code, the results change.

Very rarely it works, but almost always it returns an empty list.

Why does this happen? Please let me know.

from indeed import extract_indeed_pages, extract_indeed_jobs


last_indeed_page = extract_indeed_pages()

print(last_indeed_page)

indeed_jobs = extract_indeed_jobs(last_indeed_page)

print(indeed_jobs)

# indeed.py
import requests
from bs4 import BeautifulSoup

LIMIT = 50
URL = f"https://kr.indeed.com/jobs?q=React&l=%EC%84%9C%EC%9A%B8&radius=100&jt=fulltime&limit={LIMIT}"


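def extract_indeed_pages():
    result = requests.get(URL)
    soup = BeautifulSoup(result.text, "html.parser")
    pagination = soup.find("div", {"class": "pagination"})
    # find() returns None when this div is not in the HTML that requests
    # received; fail with a clear message instead of an AttributeError
    if pagination is None:
        raise ValueError("pagination block not found in the downloaded HTML")

    links = pagination.find_all('a')
    pages = []
    # the last link is the "next page" arrow, so it is skipped
    for link in links[:-1]:
        pages.append(int(link.string))

    max_page = pages[-1]
    return max_page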


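def extract_indeed_jobs(last_page):
    jobs = []
    # request every result page, LIMIT results at a time
    for page in range(last_page):
        result = requests.get(f"{URL}&start={page*LIMIT}")
        soup = BeautifulSoup(result.text, "html.parser")
        results = soup.find_all("h2", {"class": "jobTitle"})
        # collect the title text instead of appending the whole list of tags
        jobs.extend(title.get_text(strip=True) for title in results)

    return jobs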
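Both functions above download a page and parse it in the same step, so when the output changes between runs it is hard to tell which part is responsible. One way to separate them is sketched below, assuming you move the parsing into two hypothetical helpers, `parse_max_page` and `parse_job_titles`, that take an HTML string. Given the same HTML, they always return the same result, so any run-to-run difference has to come from the HTML the server sent. The sample markup is invented to match the selectors in the question and is not real Indeed HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that only mimics the selectors used in the
# question; it is NOT a real Indeed page.
SAMPLE_HTML = """
<div class="pagination">
  <a href="?start=50">2</a>
  <a href="?start=100">3</a>
  <a href="?start=50">Next</a>
</div>
<h2 class="jobTitle">Frontend Developer (React)</h2>
<h2 class="jobTitle">React Engineer</h2>
"""


def parse_max_page(html):
    """Return the highest page number in the pagination block, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    pagination = soup.find("div", {"class": "pagination"})
    if pagination is None:
        return None
    # Skip the last link (the "Next" arrow), as the original code does.
    pages = [int(link.string) for link in pagination.find_all("a")[:-1]]
    return pages[-1] if pages else None


def parse_job_titles(html):
    """Return the text of every <h2 class="jobTitle"> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", {"class": "jobTitle"})]


if __name__ == "__main__":
    # The same input always gives the same output.
    print(parse_max_page(SAMPLE_HTML))        # 3
    print(parse_job_titles(SAMPLE_HTML))      # ['Frontend Developer (React)', 'React Engineer']
    # HTML without the expected markup gives "nothing found", not a crash.
    print(parse_max_page("<html></html>"))    # None
    print(parse_job_titles("<html></html>"))  # []
```

In practice you would save result.text from a run that works and from one that returns an empty list, then pass both to these helpers (or open them side by side) to see what actually changed.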

Tags: python, beautifulsoup, python-requests, web-crawler

Solution


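This happens because of the JavaScript in the page. requests only downloads the HTML that the server sends and never runs the page's JavaScript, so elements that the page adds or changes with scripts can be missing from result.text, and BeautifulSoup cannot find them. You can view the page source (the HTML before any scripts run) by pressing Ctrl+U in a desktop browser.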
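The point about JavaScript can be checked offline. requests only downloads the HTML, and BeautifulSoup only parses that text; neither executes `<script>` code. The minimal sketch below uses two made-up HTML snippets (not Indeed's real markup): in the first the job title is already in the HTML, in the second it would only appear after a browser runs the script. `job_titles` is a hypothetical helper using the same selector as the question.

```python
from bs4 import BeautifulSoup

# Made-up example: the title is part of the HTML the server sends.
STATIC_HTML = '<div id="jobs"><h2 class="jobTitle">React Developer</h2></div>'

# Made-up example: the title only exists after a browser executes the script.
SCRIPT_HTML = """
<div id="jobs"></div>
<script>
  document.getElementById("jobs").innerHTML =
    '<h2 class="jobTitle">React Developer</h2>';
</script>
"""


def job_titles(html):
    """Return the text of every <h2 class="jobTitle"> found in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", {"class": "jobTitle"})]


print(job_titles(STATIC_HTML))  # ['React Developer']
# Script contents are kept as plain text and never run, so no <h2> is found.
print(job_titles(SCRIPT_HTML))  # []
```

Comparing result.text with what Ctrl+U and the rendered page show is a quick way to tell which case a given element falls into.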

