python - Use Python requests to load more items
Problem description
I'm a beginner Python programmer. I want to crawl all the sports news on the Sky Sports website. The page loads additional items through an XHR request, so I used Chrome's Inspect Element to look at the XHR details and wrote a Python request that tries to reproduce it.
My Python code is shown below:
import requests
import json

session = requests.Session()
session.trust_env = False

url = 'https://zagent891.h-cdn.com/cmd/get_links_info?customer=sky_uk&zone=gen&ver=1.113.763&url=https%3A%2F%2Fwww.skysports.com%2Fnews-wire'
headers = {
    'Origin': 'https://www.skysports.com',
    'Referer': 'https://www.skysports.com/news-wire',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}
params = {
    'customer': 'sky_uk',
    'zone': 'gen',
    'ver': '1.113.763',
    'url': 'https://www.skysports.com/news-wire'
}

response = session.get(url, headers=headers, params=params)
print(response.json())
When I run this code, I get this error:
{'error': 'wrong zone'}
How can I send this request to load more items and then crawl the news?
Solution
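One detail in the question's code may be worth checking before anything else: the URL string already carries the full query string, and the same values are passed again through params. requests appends params to whatever query the URL already has, so each parameter ends up in the request twice. Whether that is what triggers the 'wrong zone' response is not known; the sketch below only shows the duplication, using the standard library so that no network call is made. BASE_URL is simply the question's URL with the query string removed.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The question's endpoint without its query string.
BASE_URL = "https://zagent891.h-cdn.com/cmd/get_links_info"

params = {
    "customer": "sky_uk",
    "zone": "gen",
    "ver": "1.113.763",
    "url": "https://www.skysports.com/news-wire",
}

# Query string built once: this is what passing a bare URL plus params produces.
clean_url = BASE_URL + "?" + urlencode(params)

# Query string built twice: this mimics a URL that already contains the query
# being combined with the same params a second time.
duplicated_url = clean_url + "&" + urlencode(params)

clean_query = parse_qs(urlsplit(clean_url).query)
duplicated_query = parse_qs(urlsplit(duplicated_url).query)

print(clean_query["zone"])       # ['gen']
print(duplicated_query["zone"])  # ['gen', 'gen']
```

If the duplication turns out to matter, passing BASE_URL together with params (and dropping the query string from the URL) sends each parameter once.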
You can use the BeautifulSoup module in Python to scrape the web page; it is designed for exactly this kind of work. Sample code is available here: https://github.com/Hemil96/Brainyquote-API/blob/master/scrap.py
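As a minimal sketch of the BeautifulSoup approach, the snippet below parses an HTML string and collects headline links. The sample markup and the class name news-list__headline-link are assumptions made for illustration; the real Sky Sports page may use different tags and classes, so inspect the page and adjust them. extract_headlines is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the live page may differ.
SAMPLE_HTML = """
<div class="news-list">
  <a class="news-list__headline-link" href="/football/news/1">First story</a>
  <a class="news-list__headline-link" href="/cricket/news/2">Second story</a>
</div>
"""


def extract_headlines(html, css_class="news-list__headline-link"):
    """Return (title, href) pairs for every <a> tag carrying css_class."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", class_=css_class)
    return [(link.get_text(strip=True), link.get("href")) for link in links]


headlines = extract_headlines(SAMPLE_HTML)
for title, href in headlines:
    print(title, href)
```

To run this against the live site, the HTML string would come from something like requests.get('https://www.skysports.com/news-wire').text. Note that items loaded later through XHR are not part of that initial HTML, so this only covers what the server returns on the first page load.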