Consuming an API behind Cloudflare with HTTP POST and the Requests library

Problem description

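I am fairly new to Python, but I am an expert in the Mathematica (Wolfram) language, so I am not a novice programmer. I have done a lot of website scraping successfully. Recently, one of the sites I had been scraping changed, and I can no longer use its API.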

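In general, when consuming (scraping) an API, my approach is to open Chrome and use the inspector to find the XHR calls. I then copy the call as cURL and use this site to convert it into a Python requests call. This has worked well in the past, even for sites behind Cloudflare: for those, I log in manually and copy the cookies into my code. Until this site changed its code, that always worked.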
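As a rough illustration of the manual cookie step described above, the sketch below turns a cookie string copied from the browser's request headers into a dict that could be passed to requests via its `cookies=` argument. The helper name `parse_cookie_header` and the cookie values are hypothetical placeholders; `cf_clearance` is used only as an example of a cookie name that Cloudflare-protected sites commonly set.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in raw.split(";"):
        # partition keeps any '=' inside the value intact
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder string, standing in for the Cookie header copied from Chrome
raw_cookie = "cf_clearance=PLACEHOLDER_VALUE; session_id=PLACEHOLDER_SESSION"
cookies = parse_cookie_header(raw_cookie)
print(cookies)
```

The resulting dict would then be supplied alongside the copied headers, for example `requests.post(url, headers=headers, cookies=cookies, data=data)`.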

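I followed that process, and my Python requests code is below. I am trying to scrape the https://sportsbet.io/sports website. This particular call is meant to fetch all of the basketball leagues.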

import requests

headers = {
    'authority': 'sportsbet.io',
    'accept': '*/*',
    'authorization': '',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'content-type': 'application/json',
    'origin': 'https://sportsbet.io',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://sportsbet.io/sports/basketball/inplay',
    'accept-language': 'en-US,en;q=0.9'
    }

# GraphQL request body, sent as a raw JSON string
data = '{"operationName":"SportEventListQuery","variables":{"language":"en","site":"sportsbet","slug":"basketball","timePeriod":"LIVE","leagueTournaments":"LIVE","featuredLeagueTournaments":"LIVE","tournamentEventCount":"LIVE"},"query":"query SportEventListQuery($language: String!, $slug: String!, $timePeriod: SportsbetNewGraphqlSportLeagues!, $leagueTournaments: SportsbetNewGraphqlLeagueTournaments!, $featuredLeagueTournaments: SportsbetNewGraphqlFeaturedLeagueTournaments!, $tournamentEventCount: SportsbetNewGraphqlTournamentEventCount!) {\\n sportsbetNewGraphql {\\n id\\n getSportBySlug(slug: $slug) {\\n id\\n featuredLeague {\\n id\\n name(language: $language)\\n tournaments(childType: $featuredLeagueTournaments) {\\n id\\n name(language: $language)\\n eventCount(childType: $tournamentEventCount)\\n league {\\n id\\n name(language: $language)\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n name(language: $language)\\n leagues(childType: $timePeriod) {\\n id\\n name(language: $language)\\n slug\\n tournaments(childType: $leagueTournaments) {\\n id\\n name(language: $language)\\n eventCount(childType: $tournamentEventCount)\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n __typename\\n }\\n}\\n"}'

response = requests.post('https://sportsbet.io/graphql', headers=headers, data=data)
print(response)

I get a 403 error. I am looking for some guidance on how to consume (scrape) this API.
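Before changing the request, it may help to look at what the 403 response actually contains. The sketch below is only a rough heuristic: the function name `summarize_block` is hypothetical, and it assumes that responses passing through Cloudflare typically carry a `Server: cloudflare` or `CF-RAY` header, and that a challenge page may carry `cf-mitigated: challenge`. It cannot tell for certain whether the 403 came from Cloudflare or from the origin server behind it.

```python
def summarize_block(status_code: int, headers) -> dict:
    """Collect a few hints about where a blocked response came from."""
    # Header names are case-insensitive, so normalise before comparing
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}
    return {
        "status": status_code,
        # Typical markers of a response that passed through Cloudflare (assumption)
        "via_cloudflare": lowered.get("server") == "cloudflare" or "cf-ray" in lowered,
        # A JSON API answering with HTML often means a block or challenge page
        "html_body": "text/html" in lowered.get("content-type", ""),
        # Header Cloudflare may set on challenge pages (assumption)
        "challenge_header": lowered.get("cf-mitigated") == "challenge",
    }


# Example with made-up header values
example = summarize_block(
    403,
    {"Server": "cloudflare", "CF-RAY": "0000000000000000-XXX",
     "Content-Type": "text/html; charset=UTF-8"},
)
print(example)
```

With the code from the question, this would be called as `summarize_block(response.status_code, response.headers)`.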

Tags: python, web-scraping, cookies, python-requests, cloudflare

Solution

