How to use Selenium to loop through URLs for scraping

Problem description

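Below is the code I have so far for scraping https://n.rivals.com/state_rankings/2021/alabama. I want the code to loop through the address, replacing alabama with each of the other states. Ideally, I would also like to be able to change the year for future use. What am I doing wrong in how I define url and year/state1?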

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

TIMEOUT = 5

driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)

url = f"https://n.rivals.com/state_rankings/{year}/{state1}"
year = "2021"
state1 = "alabama"

try:
    driver.get(url)
except TimeoutException:
    pass

first_names = driver.find_elements_by_class_name('first-name')
first_names = [name.text for name in first_names]

last_names = driver.find_elements_by_class_name('last-name')
last_names = [name.text for name in last_names]

for first, last in zip(first_names, last_names):
    print(first, last)

player_positions = driver.find_elements_by_class_name('pos')
player_positions = [position.text for position in player_positions]

for position in player_positions:
    print(position)

data = driver.find_elements_by_xpath('//div[@class="break-text ng-binding ng-scope"]')
for d in data:
    location, highschool = d.text.strip().split('\n')
    city, state = location.split(',')
    print(city)
    print(state)
    print(highschool)

commit_status = driver.find_elements_by_class_name('school-name')
commit_status = [commit.text for commit in commit_status]

for commit in commit_status:
    print(commit)

driver.close()

Tags: python, selenium, for-loop, web-scraping

Solution


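You have to create the variables before you reference them. An f-string is evaluated on the line where it is written, so year and state1 must already exist at that point; otherwise Python raises a NameError. Reorder the lines like this: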

year = "2021"
state1 = "alabama"
url = f"https://n.rivals.com/state_rankings/{year}/{state1}"
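To make the evaluation order concrete, here is a small plain-Python sketch (no Selenium needed). It shows that an f-string referring to a name that does not exist yet raises NameError, and that once built, url is an ordinary string that does not follow later changes to year or state1. The helper make_url is a hypothetical name used only for illustration; it rebuilds the URL from fresh arguments on every call, which is what the function-based answer relies on.

```python
# 1) Referencing a name before it exists fails immediately with NameError.
try:
    bad_url = f"https://n.rivals.com/state_rankings/{undefined_year}/alabama"
except NameError as exc:
    error_message = str(exc)

# 2) Define the variables first, then build the URL.
year = "2021"
state1 = "alabama"
url = f"https://n.rivals.com/state_rankings/{year}/{state1}"

# 3) Changing state1 afterwards does NOT update url; it is already a plain string.
state1 = "georgia"
stale_url = url  # still ends in /alabama


def make_url(year: str, state: str) -> str:
    """Hypothetical helper: build the rankings URL from fresh arguments each call."""
    return f"https://n.rivals.com/state_rankings/{year}/{state}"


fresh_url = make_url(year, state1)  # ends in /georgia

print(error_message)
print(stale_url)
print(fresh_url)
```

Wrapping the f-string in a function is the same design choice the answer makes with rivals_scrape: the URL is recomputed each time the function is called with a new state or year.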

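Then, to loop over many states, wrap the scraping code in a function that takes the state and year as parameters and call it once per state: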

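from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By

TIMEOUT = 5

driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)

def rivals_scrape(state, year):
    # Build the URL inside the function so each call uses its own state and year
    url = f"https://n.rivals.com/state_rankings/{year}/{state}"

    try:
        driver.get(url)
    except TimeoutException:
        # Page load timed out; continue and scrape whatever has rendered so far
        pass

    # Selenium 4 locator style; find_elements_by_class_name was removed in Selenium 4.3
    first_names = driver.find_elements(By.CLASS_NAME, 'first-name')

    # ... rest of code ... (the remaining scraping code from the question, indented
    # into this function; use driver.find_elements(By.CLASS_NAME, ...) there as well)

    for commit in commit_status:
        print(commit)

states = ["alabama", "georgia", "texas"]

for state in states:
    rivals_scrape(state, "2021")

# quit() ends the WebDriver session and closes the browser
driver.quit()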
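Since the question also asks about changing the year later, the loop can be generalized to several years and states at once. The sketch below uses itertools.product to generate every (year, state) pair and the matching URL; it is plain Python so it can be checked without a browser, and in the real script you would call rivals_scrape(slug, year) for each pair instead of printing. The names slugify_state and state_urls are hypothetical helpers. The slug format is an assumption: the question only confirms lowercase single-word slugs such as alabama, so the hyphenated form for multi-word states (for example new-york) has not been verified against the site.

```python
from itertools import product

BASE = "https://n.rivals.com/state_rankings"


def slugify_state(name: str) -> str:
    # ASSUMPTION: the site uses lowercase, hyphen-separated state names in the URL.
    # Only single-word slugs like "alabama" are confirmed by the question.
    return "-".join(name.lower().split())


def state_urls(years, states):
    """Hypothetical helper: yield (year, slug, url) for every year/state pair."""
    for year, state in product(years, states):
        slug = slugify_state(state)
        yield str(year), slug, f"{BASE}/{year}/{slug}"


years = [2021, 2022]
states = ["Alabama", "Georgia", "Texas"]
jobs = list(state_urls(years, states))

for year, slug, url in jobs:
    # In the real scraper, call rivals_scrape(slug, year) here instead of printing.
    print(year, slug, url)
```

product iterates the last argument fastest, so all states for 2021 are visited before any state for 2022; swap the arguments if you prefer to group by state.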
