
Problem description

from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
site = 'https://racing.hkjc.com/racing/information/English/racing/LocalResults.aspx/'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class":"f_tac table_bd draggable"})
print (table)

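This works perfectly and prints the table, until I change the URL to the next page, at which point there is no table output (it prints None):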

'https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2020/03/14&Racecourse=ST&RaceNo=2'

Please help: what is wrong with the URL or the code?

Tags: python-2.7, screen-scraping

Solution


You have to append the query string to the end of the URL you were already using:

Example: fetching the table from page 2:

site = 'https://racing.hkjc.com/racing/information/English/racing/LocalResults.aspx/?RaceDate=2020/03/14&Racecourse=ST&RaceNo=2'
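
If several pages need to be fetched, building the URL by hand gets error-prone. The following is a minimal sketch of one way to assemble it, assuming Python 3 (the question uses Python 2's urllib2, where `urlencode` lives in `urllib` and has no `safe` argument). The helper name `build_results_url` is hypothetical and not part of the original answer; the base URL and parameter names are copied from the example above.

```python
from urllib.parse import urlencode

# Base path copied from the answer above (note the trailing slash).
BASE_URL = 'https://racing.hkjc.com/racing/information/English/racing/LocalResults.aspx/'


def build_results_url(race_date, racecourse, race_no):
    """Hypothetical helper: append the query string to the base URL."""
    params = [
        ('RaceDate', race_date),
        ('Racecourse', racecourse),
        ('RaceNo', race_no),
    ]
    # safe='/' leaves the slashes in the date unescaped, matching the answer's URL.
    return BASE_URL + '?' + urlencode(params, safe='/')


site = build_results_url('2020/03/14', 'ST', 2)
print(site)
```

The resulting `site` value can be passed to `Request` exactly as in the question's code; only the URL changes.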
