Python Crawler Demo: Scraping Job Listings from Lagou

lures 2020-01-14 23:15

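Trust me, this is one of the better sites to practice scraping on. It relies on redirects, so it is not easy to crawl, which is exactly why it is worth the practice!
The code is as follows: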

import csv
import time

import requests

url = "https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false"

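def get_cookie():
    # Request the listing page first so the site issues session cookies.
    # allow_redirects=False stops requests from following the redirect,
    # so the cookies are taken from this first response.
    response = requests.get(
        "https://www.lagou.com/jobs/list_web%E5%89%8D%E7%AB%AF?labelWords=&fromSearch=true&suginput=",
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'},
        allow_redirects=False)
    return response.cookies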

headers = {
    "Host": "www.lagou.com",
    "Origin": "https://www.lagou.com",
    "Referer": "https://www.lagou.com/jobs/list_web%E5%89%8D%E7%AB%AF?labelWords=&fromSearch=true&suginput=",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
}

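# newline="" keeps the csv module from writing blank lines between rows on Windows
f = open('拉勾招聘信息.csv', mode="a", encoding="gb18030", newline="")
csv_write = csv.writer(f)
# Header: company name, city, job title, salary, experience, company size, other info
csv_write.writerow(['公司名称','城市','职位名称','薪资','经验','公司规模','其他信息'])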

# Fetch pages 1 to 30
for j in range(1, 31):
    # Named payload rather than json so it cannot shadow the json module
    payload = {
        "first": "true",
        # pn sets the page number; kd sets the search keyword
        "pn": str(j),
        "kd": "web前端"
    }
    # Fetch fresh cookies for every request
    response = requests.post(url=url, headers=headers, data=payload, cookies=get_cookie())
    html = response.json()['content']['positionResult']['result']
    # pprint.pprint(html)
    time.sleep(3)
    for item in html:
        csv_write.writerow([str(item['companyFullName']), str(item['city']), str(item['positionName']), str(item['salary']), str(item['workYear']), str(item['companySize']), str(item['companyLabelList'])])
    print('Page ' + str(j) + ' written successfully!')
f.close()
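
The script indexes straight into content, positionResult and result, so a response without those keys would raise an exception and stop the loop. Below is a minimal sketch of a more defensive extraction step, assuming the JSON has the same shape as the script expects. The helper name extract_rows and the sample payload are hypothetical and only serve as illustration.

```python
# Field names taken from the script above, in CSV column order.
FIELDS = ['companyFullName', 'city', 'positionName', 'salary',
          'workYear', 'companySize', 'companyLabelList']


def extract_rows(payload):
    """Turn one decoded JSON response into a list of CSV rows.

    Returns an empty list when the expected keys are missing.
    """
    try:
        results = payload['content']['positionResult']['result']
    except (KeyError, TypeError):
        return []
    # Missing fields become empty strings instead of raising KeyError.
    return [[str(item.get(field, '')) for field in FIELDS]
            for item in (results or [])]


# Hypothetical sample data, shaped like the response the script expects.
sample = {
    'content': {
        'positionResult': {
            'result': [{
                'companyFullName': 'Example Co',
                'city': 'Beijing',
                'positionName': 'web frontend',
                'salary': '10k-20k',
                'workYear': '1-3 years',
                'companySize': '50-150',
                'companyLabelList': ['flexible hours'],
            }]
        }
    }
}

rows = extract_rows(sample)
print(rows)
```

In the loop, each row returned by extract_rows(response.json()) could then be passed to csv_write.writerow, and an empty list would simply write nothing for that page.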

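Output when run in PyCharm:
[Screenshot: PyCharm console output]
Finally, the generated CSV file looks like this:
[Screenshot: the generated CSV file]
Doesn't it look a bit nicer than the data from the earlier job sites?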
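
The CSV file can also be checked without a spreadsheet program by reading it back with the same encoding. The following is a minimal sketch, assuming the file is encoded as gb18030 like in the script. The helper names write_rows and read_rows, the temporary file and the sample row are hypothetical.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.writer; it avoids blank lines between rows on Windows.
    with open(path, mode="w", encoding="gb18030", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    # Decode with the same encoding that was used for writing.
    with open(path, encoding="gb18030", newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample data; the real script writes seven columns.
header = ['公司名称', '城市', '职位名称']
rows = [['Example Co', '北京', 'web前端']]

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_rows(path, header, rows)
loaded = read_rows(path)
print(loaded)
```

Opening the file in a with block also closes it automatically, so no explicit f.close() is needed.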
