Unable to scrape data from https://www.gbgb.org.uk/meeting/?meetingId=355352&raceId=576255

Problem description

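Hi, for some reason I can't seem to scrape any results data from https://www.gbgb.org.uk/ with BeautifulSoup (BS). I can print the results page I want with prettify, but as soon as I call find_all, for example, I get 0 results back. Could anyone check whether I'm doing something wrong? The same code works fine on other sites. Below is a quick example of what I mean. Many thanks.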

from bs4 import BeautifulSoup
from requests import get

url = 'https://www.gbgb.org.uk/meeting/?meetingId=355490&raceId=577749'
# Download the results page (the unused urllib request and headers were dropped)
response = get(url)
#print(response.text[:500])

html_soup = BeautifulSoup(response.text, 'html.parser')
#print(html_soup.prettify())

# Look for the per-trap result containers; this is the call that finds nothing
info_container = html_soup.find_all('div', class_='MeetingRaceTrap')
print(type(info_container))
print(len(info_container))

Tags: python-3.x, web-scraping, beautifulsoup

Solution


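If you open the Network tab in your browser's developer tools, you will find the following API, which returns the results in JSON format. The page appears to load its results from this API with JavaScript, which would explain why find_all finds nothing in the HTML that requests downloads.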

https://api.gbgb.org.uk/api/results/meeting/355490?meeting=355490

You don't need BeautifulSoup here.

import requests
import json

# Send a browser-like User-Agent with the request
headers = {'User-Agent':
       'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
url = 'https://api.gbgb.org.uk/api/results/meeting/355490?meeting=355490'
response = requests.get(url, headers=headers)
# Decode the JSON body into Python lists and dicts
data = json.loads(response.text)
print(data)
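
The API URL above contains the same meeting ID as the results page URL in the question. As a minimal sketch, assuming the API URL follows the same pattern for other meetings (the answer only shows it for meeting 355490), the API URL could be derived from the page URL like this. The names API_TEMPLATE and api_url_from_page_url are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Assumed pattern, generalised from the single API URL shown in the answer
API_TEMPLATE = "https://api.gbgb.org.uk/api/results/meeting/{meeting_id}?meeting={meeting_id}"


def api_url_from_page_url(page_url):
    """Build the JSON API URL for the meeting named in a results page URL."""
    query = parse_qs(urlparse(page_url).query)
    meeting_ids = query.get("meetingId")
    if not meeting_ids:
        raise ValueError("no meetingId parameter in " + page_url)
    return API_TEMPLATE.format(meeting_id=meeting_ids[0])


page_url = "https://www.gbgb.org.uk/meeting/?meetingId=355490&raceId=577749"
print(api_url_from_page_url(page_url))
```

For the page URL from the question, this prints the API URL used in the answer.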

Now, say you want to get the races. Just print:

print(data[0]['races'])

Or say you want the prizes for each race.

# Each item in 'races' is one race; racePrizes holds its prize summary
for race in data[0]['races']:
    print(race['racePrizes'])

Your output will be:

1st £95 | Others £40 | Race Total £95
1st £95 | Others £40 | Race Total £295
1st £105 | Others £40 | Race Total £305
1st £100 | Others £40 | Race Total £300
1st £120 | Others £40 | Race Total £320
1st £110 | Others £40 | Race Total £310
1st £110 | Others £40 | Race Total £310
1st £115 | Others £40 | Race Total £315
1st £120 | Others £40 | Race Total £320
1st £105 | Others £40 | Race Total £305
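
Each racePrizes value is a single string, so comparing amounts needs a little parsing. The sketch below is one possible way to do it, assuming every value follows the "label £amount | label £amount" layout seen in the output above and uses whole pounds. The helper name parse_prizes is made up for illustration.

```python
import re


def parse_prizes(text):
    """Split a string like '1st £95 | Others £40' into {label: pounds}."""
    prizes = {}
    for part in text.split("|"):
        # Label first, then a pound sign and a whole number of pounds
        match = re.fullmatch(r"\s*(.+?)\s*£(\d+)\s*", part)
        if match:
            prizes[match.group(1)] = int(match.group(2))
    return prizes


print(parse_prizes("1st £95 | Others £40 | Race Total £295"))
```

Parts that do not match the assumed layout are skipped silently, which keeps the loop over races from failing on an unexpected value.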

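To get all the dogs' names, you need to iterate over the parent elements: each race contains a list of traps, and each trap holds a dog's name.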

import requests
import json

headers = {'User-Agent':
       'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
url = 'https://api.gbgb.org.uk/api/results/meeting/355490?meeting=355490'
response = requests.get(url, headers=headers)
data = json.loads(response.text)
# Outer loop: races in the meeting; inner loop: traps (dogs) in each race
for race in data[0]['races']:
    for trap in race['traps']:
        print(trap['dogName'])

This prints all 60 names.

Talking Lulu
Demolition Dolly
Holycross Jo Jo
Fieldview Gramps
Fieldview Darcie
Blackrose Frog
Kilbreedy Gaga
Yorkstreet Milly
Blackrose Angus
Greencroft Snowy
Marcos Veggera
Ramors Flash
Dan The Tail
Killinan Fairy
Knockalton Bella
Howl At The Moon
Westmead Boss
Rockhill Romeo
Fieldview Gem
Only One Ding
Fieldview Jet
Leazes Samuel
Glassmoss Sally
Fieldview Franky
Talamh Dochais
Greencroft Spot
Greencroft Jed
Footfield Bee
Hather Pixie
Makeit My Dog
Makeit Mos Bro
Droopys Cristina
Puckane Panda
Hollywood Coco
Fieldview Dolly
Ballyphilip Bill
Bees Charm
Crossfield Hal
Savana Jody
Savana Hottie
Greencroft Briny
Savana Dan Dan
Savana Diamond
Savana Schnappes
Savana Pegasus
Millroad Captian
Savana Pimms
Ballyhoe Vouga
Fieldview Myles
Hollander
Savana Tequila
Ballygibba Chip
Rockburst Tess
All About Will
Clockwork Girl
Roma Lady
Fieldview Pancho
Harry Boy
Rahyvira Lady
Cobblers Girl
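
The flat list above loses track of which dog ran in which race. As a rough sketch, the same nested loop can keep the names grouped per race. The sample below is a hand-made, shortened stand-in for the API response: only the keys races, traps and dogName come from the answer, and the grouping of the four names is illustrative. The function name names_by_race is made up.

```python
# Hand-made sample with the same nesting as the API response (assumption:
# real responses carry more fields and more traps per race)
sample = [
    {
        "races": [
            {"traps": [{"dogName": "Talking Lulu"}, {"dogName": "Demolition Dolly"}]},
            {"traps": [{"dogName": "Kilbreedy Gaga"}, {"dogName": "Yorkstreet Milly"}]},
        ]
    }
]


def names_by_race(data):
    """Return one list of dog names per race, in the order the races appear."""
    return [
        [trap["dogName"] for trap in race["traps"]]
        for race in data[0]["races"]
    ]


# Number the races by position, since no race identifier field is assumed
for number, names in enumerate(names_by_race(sample), start=1):
    print(number, ", ".join(names))
```

With real data, passing the decoded response in place of sample should give one list per race.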
