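python - Parsing links from a specific table with BeautifulSoup
Question
This is part of a script that parses HTML with BeautifulSoup. I am trying to collect links from a page so I can use them later. Everything seems to work, but I only want some of the links rather than all of them: specifically, I am only interested in the links in the first table on the page. I realize I could trim the list by hand, but that does not suit me.
This is the URL of the page: https://www.spotrac.com/nba/atlanta-hawks/cap/
Is there a way to do this?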
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import re

req = Request("https://www.spotrac.com/nba/atlanta-hawks/cap/")
html_page = urlopen(req)
soup = BeautifulSoup(html_page, features="html.parser")

# Collect the href of every <a> tag on the whole page
links = []
for link in soup.find_all('a'):
    links.append(link.get('href'))

# Keep only the links that point to player pages
players = []
i = 0
while i < len(links):
    if "redirect/player" in links[i]:
        players.append(links[i])
    i += 1
print(players)
Answer
You can adapt the code below.
import requests
from bs4 import BeautifulSoup

url = 'https://www.spotrac.com/nba/atlanta-hawks/cap/'
# Browser-like headers sent with the request
headers = {'Host': 'www.spotrac.com',
           'Referer': 'https://www.spotrac.com/nba/atlanta-hawks/cap/',
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
pageTree = requests.get(url, headers=headers)
soup = BeautifulSoup(pageTree.content, 'lxml')

# find() returns only the first match, so this is the body of the first table
table = soup.find('tbody')
links = table.find_all('a')
for item in links:
    print(str(item.text), str(item['href']))
Result:
Kent Bazemore https://www.spotrac.com/redirect/player/11079/
Miles Plumlee https://www.spotrac.com/redirect/player/10851/
Dewayne Dedmon https://www.spotrac.com/redirect/player/13536/
Trae Young https://www.spotrac.com/redirect/player/26971/
Alex Len https://www.spotrac.com/redirect/player/13318/
Taurean Prince https://www.spotrac.com/redirect/player/20217/
Justin Anderson https://www.spotrac.com/redirect/player/17849/
John Collins https://www.spotrac.com/redirect/player/23614/
Kevin Huerter https://www.spotrac.com/redirect/player/26985/
DeAndre' Bembry https://www.spotrac.com/redirect/player/20226/
Omari Spellman https://www.spotrac.com/redirect/player/26996/
Vince Carter https://www.spotrac.com/redirect/player/2590/
Tyler Dorsey https://www.spotrac.com/redirect/player/23642/
Jaylen Adams https://www.spotrac.com/redirect/player/27343/
Jordan Sibert https://www.spotrac.com/redirect/player/18240/
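The same idea can be sketched offline, without fetching the page. The snippet below is a minimal illustration, not the site's actual markup: SAMPLE_HTML, the player names, the href values, and the helper first_table_player_links are all made up for the example. It selects the first <table> explicitly and keeps only anchors whose href contains "redirect/player", which also skips anchors that have no href at all.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td><a href="/redirect/player/1/">Player A</a></td></tr>
    <tr><td><a href="/redirect/player/2/">Player B</a></td></tr>
    <tr><td><a href="/nba/teams/">Teams</a></td></tr>
    <tr><td><a>No href</a></td></tr>
  </tbody>
</table>
<table>
  <tbody>
    <tr><td><a href="/redirect/player/3/">Player C</a></td></tr>
  </tbody>
</table>
"""


def first_table_player_links(html, marker="redirect/player"):
    """Return (text, href) pairs for matching links in the first table only."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # find() returns the first match, or None
    if table is None:
        return []
    return [
        (a.get_text(strip=True), a["href"])
        for a in table.find_all("a", href=True)  # href=True skips anchors without href
        if marker in a["href"]
    ]


for name, href in first_table_player_links(SAMPLE_HTML):
    print(name, href)
```

Scoping the search to the table element before calling find_all is what limits the result to that table; the href filter is optional and only matters if the table also contains non-player links.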
If this is what you need, please mark this answer as accepted.