How to extract href content with BeautifulSoup in Python

Problem description

import requests
from bs4 import BeautifulSoup

page = requests.get('http://espn.go.com/nba/team/roster/_/name/atl/atlanta-hawks')
soup = BeautifulSoup(page.content, "html.parser")
player_list = soup.find_all(class_="Image__Wrapper")
#player_list = soup.find_all("tr")
print(player_list[1])

The output I get is:

<div class="Image__Wrapper aspect-ratio--child"><img alt="https://a.espncdn.com/i/headshots/nba/players/full/3062667.png" class="" data-mptype="image" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" title="DeAndre' Bembry"/></div>

I am only interested in getting DeAndre' Bembry. How do I extract it? I am also unsure how to get a list of all the players' names.

Tags: python, list, beautifulsoup, href

Solution


You can try:

import requests
from bs4 import BeautifulSoup

page = requests.get('http://espn.go.com/nba/team/roster/_/name/atl/atlanta-hawks')
soup = BeautifulSoup(page.content, "html.parser")
player_list = soup.find_all(class_="Image__Wrapper")
#player_list = soup.find_all("tr")
# The wrapper's <img> tag carries the player's name in its title attribute.
print(player_list[1].img["title"])

Output:

 DeAndre' Bembry
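
The attribute lookup above can be tried without a network request. The following is a minimal sketch that parses a shortened inline copy of the markup from the question (the `html` string is trimmed here for illustration). It also shows `Tag.get`, which returns `None` for a missing attribute, whereas subscript access raises `KeyError`.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the markup from the question, so no request is needed.
html = '''<div class="Image__Wrapper aspect-ratio--child"><img alt="https://a.espncdn.com/i/headshots/nba/players/full/3062667.png" class="" title="DeAndre' Bembry"/></div>'''

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find(class_="Image__Wrapper")

# Subscript access returns the attribute value and raises KeyError if it is absent.
name = wrapper.img["title"]

# .get() returns None instead of raising when the attribute is absent;
# this <img> has no href, so the lookup yields None.
missing = wrapper.img.get("href")

print(name)
print(missing)
```

The same pattern applies to `href`: on an `<a>` tag, `tag["href"]` or `tag.get("href")` returns the link target.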

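To print all the players, keep only the titles that contain one to three spaces (a heuristic that presumably skips images whose title is not a player name):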

print([i.img["title"] for i in player_list if 0 < i.img["title"].count(" ") <= 3])

Output:

["DeAndre' Bembry", 'Charlie Brown Jr.', 'Clint Capela', 'Vince Carter', 'John Collins', 'Dewayne Dedmon', 'Bruno Fernando', 'Brandon Goodwin', 'Treveon Graham', 'Kevin Huerter', "De'Andre Hunter", 'Damian Jones', 'Skal Labissiere', 'Cam Reddish', 'Jeff Teague', 'Trae Young']
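
As an alternative sketch, not the method used in the answer above, a CSS selector can collect the names while skipping images that have no `title` attribute at all. The inline `html` below is hypothetical sample markup modeled on the question's output, not the real page. Note that this only filters out images without a title; it does not replace the space-count heuristic if the page contains titled images that are not players.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question's output (not the real page).
html = """
<div class="Image__Wrapper"><img title="DeAndre' Bembry"/></div>
<div class="Image__Wrapper"><img alt="logo"/></div>
<div class="Image__Wrapper"><img title="Trae Young"/></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "img[title]" matches only <img> tags that have a title attribute,
# so the untitled image in the second wrapper is skipped.
names = [img["title"] for img in soup.select(".Image__Wrapper img[title]")]

print(names)
```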
