python - Selenium only scrapes the first page on every loop iteration
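Problem description
I'm having a hard time figuring out why my code doesn't refresh the DOM and pick up the new results. From every page I want to grab:
case title, date, PDF link, details link
The script scrapes the first page of results and clicks the "next" button. The table counter keeps increasing, but after every click of the next button the results printed are still those of the first page.
The web page is here
My relevant code: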
import re
import time
from bs4 import BeautifulSoup as bs
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver and value are defined earlier in the script (not shown)
url = "https://www.govinfo.gov/app/collection/uscourts/district/caed/2021/%7B%22pageSize%22%3A%22500%22%2C%22offset%22%3A%220%22%7D"
driver.get(url)
page = driver.page_source
soup = bs(page, "html.parser")
cnt = 0
while True:
    tables = soup.find_all('table', class_='table')
    # WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH,'//span[@class="custom-paginator"]')))
    for my_table in tables:
        cnt += 1
        print('=============== Table {} ==============='.format(cnt))
        print('Court: ' + 'United States Court ' + value)
        rows = my_table.find_all('td')
        for row in rows:
            cells = row.find_all('b')
            # getting case title
            for cell in cells:
                span_1 = cell.find('span', {'class': 'results-line1'}).text
                print('Case: ' + span_1)
            # getting case date
            next_cells = row.find_all('em')
            for next_cell in next_cells:
                span_2 = next_cell.find('span', {'class': 'results-line2'}).text
                print('Date: ' + span_2)
            # links = []
            links = row.find_all('a', href=True)
            # grabbing the pdf link, then the details link only
            for link in links:
                start_link = 'https://www.govinfo.gov'
                pdf = link.get('href')
                pdf_link = re.search("pdf$", pdf)
                fixed_link = "".join((start_link, pdf))
                if pdf_link:
                    print('Link: ' + fixed_link)
                elif '/details' in pdf:
                    print('Details: ' + fixed_link)
                else:
                    break
    try:
        next_page = driver.find_elements_by_class_name('fw-pagination-btn')
        if len(next_page) < 1:
            print("No more pages left")
            break
        else:
            pages = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='custom-paginator']//li[@class='next fw-pagination-btn']/a")))
            page_count = 0
            print('clicking next button')
            page_count += 1
            print('---------page{}---------'.format(page_count))
            pages.click()
            time.sleep(7)
    except TimeoutException:
        break
driver.quit()
I can't figure out how to refresh the data inside the "try" part of the code. Any help is appreciated.
Solution
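Judging from the posted code alone, the likely cause is that `driver.page_source` is read and parsed once, before the `while` loop, so every iteration walks the same `soup` even though the browser has moved on. A minimal sketch of the fix is to re-read `driver.page_source` at the top of each iteration. The names `scrape_all_pages`, `parse_page`, `click_next`, and `FakeDriver` below are hypothetical and not part of Selenium or the original script; `FakeDriver` is only a stand-in so the loop can run without a browser.

```python
def scrape_all_pages(driver, parse_page, click_next):
    """Parse every page, re-reading the DOM on each iteration.

    driver     -- any object exposing `page_source` (e.g. a Selenium WebDriver)
    parse_page -- hypothetical callable: HTML string -> list of records
    click_next -- hypothetical callable: driver -> True if it moved to a new page
    """
    records = []
    page_count = 0  # initialised once, outside the loop, so it is not reset
    while True:
        page_count += 1
        html = driver.page_source  # fresh snapshot of the current DOM
        records.extend(parse_page(html))
        if not click_next(driver):
            break
    return records, page_count


class FakeDriver:
    """Stand-in for a WebDriver (assumption: only `page_source` is needed)."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    @property
    def page_source(self):
        return self.pages[self.index]


def fake_click_next(driver):
    """Advance the fake driver; return False when no page is left."""
    if driver.index + 1 >= len(driver.pages):
        return False
    driver.index += 1
    return True


if __name__ == "__main__":
    demo = FakeDriver(["case A", "case B", "case C"])
    found, pages = scrape_all_pages(demo, lambda html: [html], fake_click_next)
    print(found, pages)
```

In the real script, `parse_page` would wrap `bs(html, "html.parser")` together with the table loop, and `click_next` would wrap the `WebDriverWait(...).until(EC.element_to_be_clickable(...))` click plus the pause, returning `False` on `TimeoutException`. The posted code also sets `page_count = 0` inside the loop, so the page number it prints is always 1; the sketch initialises the counter once before the loop instead.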