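python - How to iterate over URLs with Selenium for scraping
Problem description
Below is the code I have so far for scraping https://n.rivals.com/state_rankings/2021/alabama. I want the code to iterate over the addresses by substituting every state where alabama appears. Ideally, I would also like to be able to change the year for future use. What am I doing wrong in how I define url and year/state1?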
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

TIMEOUT = 5
driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)

url = f"https://n.rivals.com/state_rankings/{year}/{state1}"
year = "2021"
state1 = "alabama"

try:
    driver.get(url)
except TimeoutException:
    pass

first_names = driver.find_elements_by_class_name('first-name')
first_names = [name.text for name in first_names]
last_names = driver.find_elements_by_class_name('last-name')
last_names = [name.text for name in last_names]
for first, last in zip(first_names, last_names):
    print(first, last)

player_positions = driver.find_elements_by_class_name('pos')
player_positions = [position.text for position in player_positions]
for position in player_positions:
    print(position)

data = driver.find_elements_by_xpath('//div[@class="break-text ng-binding ng-scope"]')
for d in data:
    location, highschool = d.text.strip().split('\n')
    city, state = location.split(',')
    print(city)
    print(state)
    print(highschool)

commit_status = driver.find_elements_by_class_name('school-name')
commit_status = [commit.text for commit in commit_status]
for commit in commit_status:
    print(commit)

driver.close()
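Solution
You must create the variables before you reference them. An f-string is evaluated on the line where it appears, so year and state1 have to exist before url is built; otherwise Python raises a NameError: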
year = "2021"
state1 = "alabama"
url = f"https://n.rivals.com/state_rankings/{year}/{state1}"
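If you would rather keep the URL pattern near the top of the script, one option is to store it as a plain string template and fill it in later with str.format, once the values are known. This is only a minimal sketch: URL_TEMPLATE and build_url are hypothetical names that do not appear in the original code.

```python
# Hypothetical names for illustration: URL_TEMPLATE and build_url.
# A plain string (no f prefix) is not evaluated until .format() is called.
URL_TEMPLATE = "https://n.rivals.com/state_rankings/{year}/{state}"


def build_url(state, year):
    """Fill in the template only when the values are actually known."""
    return URL_TEMPLATE.format(year=year, state=state)


print(build_url("alabama", "2021"))
```

Because the placeholders are resolved inside build_url, the template can be defined before any state or year exists.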
Then, to loop over many states, wrap the scraping code in a function and call it once per state:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

TIMEOUT = 5
driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)

def rivals_scrape(state, year):
    url = f"https://n.rivals.com/state_rankings/{year}/{state}"
    try:
        driver.get(url)
    except TimeoutException:
        pass
    first_names = driver.find_elements_by_class_name('first-name')
    # ... rest of code ...
    for commit in commit_status:
        print(commit)

states = ["alabama", "georgia", "texas"]
for state in states:
    rivals_scrape(state, "2021")

driver.close()
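The question also asks about changing the year. One way to cover that, sketched here under assumptions, is to generate every year/state combination up front with itertools.product. BASE_URL and build_urls are hypothetical names, and the list of years is only an example; whether the site serves a page for each year is not confirmed by the original.

```python
from itertools import product

# Hypothetical names for illustration: BASE_URL and build_urls.
BASE_URL = "https://n.rivals.com/state_rankings"


def build_urls(states, years):
    """Return one URL per (year, state) pair, grouped by year."""
    return [f"{BASE_URL}/{year}/{state}" for year, state in product(years, states)]


states = ["alabama", "georgia", "texas"]
years = ["2020", "2021"]  # example values only

for url in build_urls(states, years):
    print(url)
```

In the real script the same product(years, states) loop could call rivals_scrape(state, year) instead of printing. How the site spells multi-word state names in the URL is not shown in the original, so check that before extending the list.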