Scraping table content behind a dropdown list with BeautifulSoup

Problem description

I want to scrape all search results from https://www.cbssports.com/nfl/playersearch?POSITION=RB&print_rows=9999 for all players at every position.

I have already retrieved all RB players with the following code:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.cbssports.com/nfl/playersearch?POSITION=RB&print_rows=9999')
html = html_text.text
soup = BeautifulSoup(html, 'html.parser')
player_table = soup.find('table', class_='data')

# Data rows carry the class row1 or row2; player_table is the table found above
for tr in player_table.find_all('tr', class_=['row1','row2']):
    tds = tr.find_all('td')
    print(("Player:%s , Position:%s , Team: %s") % (tds[0].text, tds[1].text, tds[2].text))

Now I am stuck on scraping the players at the other positions listed in the dropdown. What is the best way to do this?

Tags: python, web-scraping, beautifulsoup

Solution


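The idea is simple: scrape every position from the dropdown, substitute each one into the URL, and fetch all players for that position. In code: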

from bs4 import BeautifulSoup
import requests

main_url = "https://www.cbssports.com/nfl/playersearch"
soup = BeautifulSoup(requests.get(main_url).text, "html.parser")

# Scrape all positions
positions = [o["value"] for o in soup.find("select", {'name' : "POSITION"}).find_all("option")]

for position in positions:
    url = f"{main_url}?POSITION={position}&print_rows=9999"
    # Find all players
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for tr in soup.find("table", class_="data").find_all("tr", class_=["row1", "row2"]):
        tds = tr.find_all('td')
        print(("Player: %s , Position: %s , Team: %s") % (tds[0].text, tds[1].text, tds[2].text))
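
If you would rather collect the rows than print them, the same idea can be split into small helpers that work on an HTML string, which also makes them easy to try without hitting the site. This is only a sketch: the names extract_positions, build_url and parse_players are hypothetical, and skipping options with an empty value assumes the dropdown may contain a placeholder entry, which I have not verified.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

MAIN_URL = "https://www.cbssports.com/nfl/playersearch"


def extract_positions(html):
    """Return the non-empty option values of the POSITION dropdown."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", {"name": "POSITION"})
    if select is None:
        return []
    # Assumption: options without a usable value are placeholders, so skip them.
    return [o["value"] for o in select.find_all("option") if o.get("value")]


def build_url(position, rows=9999):
    """Build the search URL for one position, URL-encoding the parameters."""
    query = urlencode({"POSITION": position, "print_rows": rows})
    return f"{MAIN_URL}?{query}"


def parse_players(html):
    """Return one dict per player row found in the results table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="data")
    if table is None:
        return []
    players = []
    for tr in table.find_all("tr", class_=["row1", "row2"]):
        tds = tr.find_all("td")
        if len(tds) < 3:
            # Not a full player row; ignore it instead of raising IndexError.
            continue
        players.append({
            "player": tds[0].get_text(strip=True),
            "position": tds[1].get_text(strip=True),
            "team": tds[2].get_text(strip=True),
        })
    return players
```

With requests, calling parse_players(requests.get(build_url(position)).text) for each position returned by extract_positions(requests.get(MAIN_URL).text) should collect the same rows that the loop above prints.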
