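Trust me, this is one of the more rewarding sites I've scraped. It involves a redirect, which makes it trickier to crawl than it looks, so it's well worth practicing on! The job data comes from an Ajax POST to `positionAjax.json`, and the script first requests the search-results page with `allow_redirects=False` so it can pass that page's cookies along with the POST.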
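Why does `allow_redirects=False` matter? When requests follows a redirect, the `Response` it returns is the final page, and its `.cookies` only holds cookies set by that last response. Cookies set by the redirecting response end up in `resp.history` instead. The sketch below reproduces this locally. The handler, the `/list` and `/login` paths and the `demo_session` cookie are all hypothetical, and the sketch assumes (without confirming) that Lagou's list page behaves similarly.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the list page: sets a cookie, then redirects."""

    def do_GET(self):
        if self.path == "/list":
            self.send_response(302)
            self.send_header("Set-Cookie", "demo_session=abc123; Path=/")
            self.send_header("Location", "/login")
            self.end_headers()
        else:
            body = b"login page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Not following the redirect: we get the 302 response itself, with its Set-Cookie
resp = requests.get(base + "/list", allow_redirects=False, timeout=5)
status_no_follow = resp.status_code
cookies_no_follow = resp.cookies.get_dict()

# Following the redirect: we get the final 200 page; the cookie sits in resp.history
resp = requests.get(base + "/list", timeout=5)
status_follow = resp.status_code
cookies_follow = resp.cookies.get_dict()
history_cookies = resp.history[0].cookies.get_dict()

server.shutdown()
print(status_no_follow, cookies_no_follow)
print(status_follow, cookies_follow, history_cookies)
```

With redirects disabled, `.cookies` on the returned response is exactly what the scraper needs. With redirects followed, you would have to dig it out of `resp.history` (or use a `requests.Session`, which keeps cookies across requests).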
Here is the code:
```python
import csv
import time

import requests

# Ajax endpoint that returns the job list as JSON
url = "https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false"
# Search-results page for the keyword "web前端"; its cookies are sent with the POST
list_url = "https://www.lagou.com/jobs/list_web%E5%89%8D%E7%AB%AF?labelWords=&fromSearch=true&suginput="
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"


def get_cookie():
    # Don't follow the redirect, so we keep the cookies set by the first response
    cookie = requests.get(list_url, headers={"User-Agent": user_agent},
                          allow_redirects=False).cookies
    return cookie


headers = {
    "Host": "www.lagou.com",
    "Origin": "https://www.lagou.com",
    "Referer": list_url,
    "User-Agent": user_agent,
}

# newline="" stops csv from inserting blank lines between rows on Windows.
# "w" instead of "a" so the header row isn't repeated on every run.
with open("拉勾招聘信息.csv", mode="w", encoding="gb18030", newline="") as f:
    csv_write = csv.writer(f)
    # Columns: company name, city, job title, salary, experience, company size, other info
    csv_write.writerow(["公司名称", "城市", "职位名称", "薪资", "经验", "公司规模", "其他信息"])
    for j in range(1, 31):  # Lagou page numbers start at 1
        form_data = {
            "first": "true",
            # pn sets the page number, kd sets the search keyword
            "pn": str(j),
            "kd": "web前端",
        }
        # Fetch fresh cookies before each page request
        response = requests.post(url=url, headers=headers, data=form_data, cookies=get_cookie())
        data = response.json()
        if not data.get("content"):
            # When the request is blocked, the JSON carries a message instead of results
            print(f"Page {j} failed:", data.get("msg"))
            break
        results = data["content"]["positionResult"]["result"]
        time.sleep(3)  # throttle requests to reduce the chance of being blocked
        for job in results:
            csv_write.writerow([
                job["companyFullName"], job["city"], job["positionName"],
                job["salary"], job["workYear"], job["companySize"],
                ",".join(job.get("companyLabelList") or []),
            ])
        print(f"Page {j} saved!")
```
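Because the network part and the parsing part are mixed together in one loop, the parsing is hard to check without hitting the site. A minimal sketch of pulling it out into a pure function follows. `extract_rows` is a hypothetical helper name, the field names are the ones the script above reads, and the sample payload is made up to match the shape the script expects.

```python
import csv
import io

# Plain-text fields read by the scraper, in CSV column order
TEXT_FIELDS = ["companyFullName", "city", "positionName",
               "salary", "workYear", "companySize"]


def extract_rows(payload):
    """Turn one parsed positionAjax.json response into CSV rows.

    Raises ValueError when the response has no 'content', e.g. when the
    site answers with an error message instead of results.
    """
    content = payload.get("content")
    if not content:
        raise ValueError("no results: %s" % payload.get("msg"))
    rows = []
    for job in content["positionResult"]["result"]:
        row = [str(job.get(key, "")) for key in TEXT_FIELDS]
        # companyLabelList is a list of tags; join it instead of writing its repr
        row.append(",".join(job.get("companyLabelList") or []))
        rows.append(row)
    return rows


# Hypothetical sample shaped like the response the scraper reads
sample = {
    "content": {"positionResult": {"result": [
        {"companyFullName": "示例科技有限公司", "city": "北京",
         "positionName": "web前端开发", "salary": "15k-25k",
         "workYear": "3-5年", "companySize": "150-500人",
         "companyLabelList": ["五险一金", "弹性工作"]},
    ]}}
}

buf = io.StringIO()
csv.writer(buf).writerows(extract_rows(sample))
print(buf.getvalue())
```

In the main loop, `csv_write.writerows(extract_rows(response.json()))` would then replace the inner `for` loop, and a `ValueError` signals a blocked request.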
Output when run in PyCharm:
Finally, the generated CSV file looks like this:
Doesn't that look tidier than the job-site data from my earlier post?
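The script writes the CSV with `encoding="gb18030"`, presumably so that Excel on a Chinese-locale Windows machine shows the Chinese text without garbling. Passing `newline=""` when opening the file, as the Python `csv` docs recommend, keeps rows from being separated by blank lines on Windows. A small round-trip check, using a made-up row and a temporary file, might look like this:

```python
import csv
import os
import tempfile

header = ["公司名称", "城市", "职位名称", "薪资", "经验", "公司规模", "其他信息"]
# Hypothetical row; the last field contains a comma, so csv will quote it
row = ["示例科技有限公司", "北京", "web前端开发", "15k-25k", "3-5年", "150-500人", "五险一金,弹性工作"]

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", encoding="gb18030", newline="") as f:
    csv.writer(f).writerows([header, row])

# Read it back with the same encoding to confirm nothing was lost
with open(path, encoding="gb18030", newline="") as f:
    rows = list(csv.reader(f))

# Inspect the raw bytes: each row should end in exactly one \r\n
with open(path, "rb") as f:
    raw = f.read()

print(rows[1])
```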