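python - Unable to scrape tabular content from a webpage using the requests module
Problem description
I am trying to scrape the tabular content from a webpage using the requests module. The page's content is highly dynamic, but according to the browser's dev tools it can be fetched through an API. I am trying to mimic the same POST request with the appropriate parameters, but I always get status 403.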
import requests
from pprint import pprint
start_url = 'https://opensea.io/rankings'
link = 'https://api.opensea.io/graphql/'
payload = {"id":"rankingsQuery","query":"query rankingsQuery(\n $chain: [ChainScalar!]\n $count: Int!\n $cursor: String\n $sortBy: CollectionSort\n $parents: [CollectionSlug!]\n $createdAfter: DateTime\n) {\n ...rankings_collections\n}\n\nfragment rankings_collections on Query {\n collections(after: $cursor, chains: $chain, first: $count, sortBy: $sortBy, parents: $parents, createdAfter: $createdAfter, sortAscending: false, includeHidden: true, excludeZeroVolume: true) {\n edges {\n node {\n createdDate\n name\n slug\n logo\n stats {\n floorPrice\n marketCap\n numOwners\n totalSupply\n sevenDayChange\n sevenDayVolume\n oneDayChange\n oneDayVolume\n thirtyDayChange\n thirtyDayVolume\n totalVolume\n id\n }\n id\n __typename\n }\n cursor\n }\n pageInfo {\n endCursor\n hasNextPage\n }\n }\n}\n","variables":{"chain":None,"count":100,"cursor":"YXJyYXljb25uZWN0aW9uOjk5","sortBy":"SEVEN_DAY_VOLUME","parents":None,"createdAfter":None}}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
    s.headers['x-api-key'] = '2f6f419a083c46de9d83ce3dbe7db601'
    s.headers['x-build-id'] = 'cplNDIqD8Uy8MvANX90r9'
    s.headers['referer'] = 'https://opensea.io/'
    res = s.post(link,json=payload)
    pprint(res.status_code)
    print(res.json())
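How can I scrape the tabular content from that webpage using the requests module?
Solution
You can extract the data from a script tag with a regular expression and then rebuild the table. Some column formatting remains to be done.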
import requests, re, json
import pandas as pd
r = requests.get('https://opensea.io/rankings')
data = json.loads(re.search(r'window\.__wired__=([^<]*)', r.text).group(1))
items = [v for v in data['records'].values() if v['__typename'] in ['CollectionType', 'CollectionStatsType']]
d = {i['name']:j for i, j in zip(items[::2], items[1::2])}
df = pd.DataFrame.from_dict(d, orient='index')
print(df)
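The pairing step in the code above depends on the filtered records alternating strictly between a collection and its stats. As a minimal sketch of that step alone, the snippet below uses a small made-up `records` dict (the keys, names, and values are hypothetical and only mimic the shape the code expects) and builds the same name-to-stats mapping without any network access.

```python
# Hypothetical sample that mimics the shape of data['records'];
# the real keys, fields, and values on the page will differ.
records = {
    "client:root": {"__typename": "Query"},
    "c1": {"__typename": "CollectionType", "name": "Alpha", "slug": "alpha"},
    "s1": {"__typename": "CollectionStatsType", "floorPrice": 1.5, "totalVolume": 100.0},
    "c2": {"__typename": "CollectionType", "name": "Beta", "slug": "beta"},
    "s2": {"__typename": "CollectionStatsType", "floorPrice": 0.2, "totalVolume": 40.0},
}

wanted = ("CollectionType", "CollectionStatsType")

# Keep only collection and stats records, preserving their original order.
items = [v for v in records.values() if v.get("__typename") in wanted]

# Even positions are collections, odd positions are their stats.
# This assumes the two kinds of record alternate strictly.
rows = {coll["name"]: stats for coll, stats in zip(items[::2], items[1::2])}

for name, stats in rows.items():
    print(name, stats["floorPrice"], stats["totalVolume"])
```

If a collection ever appears without a stats record right after it, the even/odd slicing pairs the wrong records, so it is worth checking the `__typename` of each pair before relying on the result.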
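Regex: window\.__wired__=([^<]*) matches the literal text window.__wired__= and captures every following character up to, but not including, the next <, which is where the closing script tag begins.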
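To see what the pattern captures, here is a small sketch that runs it against a made-up HTML string instead of the live page. The `html` value and its contents are hypothetical; only the pattern itself comes from the answer. The sketch also checks for a missing match, since `re.search` returns `None` when the page no longer contains the assignment.

```python
import json
import re

# Hypothetical HTML snippet; the real page is far larger and may change.
html = (
    '<html><body><script>window.__wired__='
    '{"records": {"a": {"__typename": "CollectionType", "name": "Alpha"}}}'
    '</script></body></html>'
)

# Capture everything after "window.__wired__=" up to the next "<".
match = re.search(r'window\.__wired__=([^<]*)', html)
if match is None:
    raise ValueError("window.__wired__ not found; the page layout may have changed")

# The captured group is expected to be a JSON document.
data = json.loads(match.group(1))
print(data["records"]["a"]["name"])
```

Because the capture stops at the first `<`, this approach would truncate the JSON if any embedded string contained a literal `<`; in that case `json.loads` raises an error rather than returning partial data.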