HTML request to Biwenger in Python

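Question

I'm trying to scrape data from Biwenger with a plain HTTP request, but the response contains different data from what I see when I open the URL in Chrome.

Here is my code: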

import requests

shots_url = "https://biwenger.as.com/user/naranjas-4537694"

response = requests.get(shots_url)
response.raise_for_status() # raise exception if invalid response

print(response.text)

I don't get any errors, but the response does not contain the data shown at the URL. Instead it contains this message:

<!doctype html><meta charset=utf-8><title>Biwenger</title><base href=/ ><meta...<div class=body><p>Looks like the browser you're using is not compatible with Biwenger :(<p>We recommend using <a href=http://www.google.com/chrome/ target=_blank>Google Chrome</a>...</script>

Any idea what code I can use to get the correct data?

If you need more information, please let me know. Thanks, everyone.

Tags: python, html, web-scraping, request

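Answer

The data is loaded dynamically via JavaScript/JSON, so the HTML returned by requests.get() is only a placeholder page. If you open the Network tab of the Firefox/Chrome developer tools, you can see which URLs the page sends its requests to.

This example fetches information about the user's players: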

import re
import json
import requests
from pprint import pprint


user_data_url = 'https://biwenger.as.com/api/v2/user/4537694?fields=*,account(id),players(id,owner),lineups(round,points,count,position),league(id,name,competition,mode,scoreID),market,seasons,offers,lastPositions'
all_data_url = 'https://cf.biwenger.com/api/v2/competitions/la-liga/data?lang=en&score=1&callback=jsonp_xxx' # <--- check @αԋɱҽԃ αмєяιcαη answer, it's possible to do it without callback= parameter

response = requests.get(all_data_url)
data = json.loads( re.findall(r'jsonp_xxx\((.*)\)', response.text)[0] )

user_data = requests.get(user_data_url).json()

# pprint(user_data)  # <-- uncomment this to see user data
# pprint(data)       # <-- uncomment this to see data about all players

for p in user_data['data']['players']:
    pprint(data['data']['players'][str(p['id'])])
    print('-' * 80)

Prints:

    {'fantasyPrice': 22000000,
     'fitness': [10, 2, 2, 2, -2],
     'id': 599,
     'name': 'Pedro León',
     'playedAway': 8,
     'playedHome': 8,
     'points': 38,
     'pointsAway': 16,
     'pointsHome': 22,
     'pointsLastSeason': 16,
     'position': 3,
     'price': 1400000,
     'priceIncrement': 60000,
     'slug': 'pedro-leon',
     'status': 'ok',
     'teamID': 76}
    --------------------------------------------------------------------------------
    {'fantasyPrice': 9000000,
     'fitness': [None, 'injured', 'doubt', None, 2],
     'id': 1093,
     'name': 'Javi López',
     'playedAway': 4,
     'playedHome': 2,
     'points': 10,
     'pointsAway': 6,
     'pointsHome': 4,
     'pointsLastSeason': 77,
     'position': 2,
     'price': 210000,
     'priceIncrement': 0,
     'slug': 'javier-lopez',
     'status': 'ok',
     'teamID': 7}
    --------------------------------------------------------------------------------

... and so on.
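
The JSONP unwrapping step in the example above can also be pulled out into a small helper, which makes it easier to test without hitting the network. This is only a minimal sketch: the function name `strip_jsonp` is hypothetical, and it assumes the response body has the form `callback_name(<json>)`, optionally followed by a semicolon. The sample string is made up for illustration and only mimics the shape of the real response.

```python
import json
import re


def strip_jsonp(text, callback="jsonp_xxx"):
    """Parse the JSON payload out of a JSONP body (hypothetical helper)."""
    # Expect: optional whitespace, callback(, payload, ), optional ';'.
    # re.DOTALL lets the payload span multiple lines.
    pattern = r'^\s*' + re.escape(callback) + r'\((.*)\)\s*;?\s*$'
    match = re.match(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected JSONP callback")
    return json.loads(match.group(1))


# Made-up sample that mimics the shape of the real response.
sample = 'jsonp_xxx({"data": {"players": {"599": {"name": "Pedro León"}}}})'
parsed = strip_jsonp(sample)
print(parsed["data"]["players"]["599"]["name"])
```

Anchoring the pattern at both ends and raising on a mismatch means an unexpected body (for example the "browser not compatible" HTML page) fails loudly instead of producing an IndexError from an empty findall() result.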
