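python-3.x - Navigating to the next page with Selenium
Problem description
First, I had never used Selenium until yesterday. After many attempts, I was able to scrape the target table correctly.
I am now trying to scrape the table across consecutive pages. Sometimes it works, and sometimes it fails immediately. I have spent hours searching Google and Stack Overflow without solving the problem. I am sure the answer is simple, but after 8 hours I need to ask the Selenium experts.
My target URL is: RedHat Security Advisories
If there is already a question on Stack Overflow that answers mine, please point me to it and I will do some research and testing.
Here are some of the things I have tried:
Example 1: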
page_number = 0
while True:
    try:
        page_number += 1
        browser.execute_script("return arguments[0].scrollIntoView(true);",
                               WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="jumpPoint"]/div[3]/div/div/div[2]/div/div['
                                                                                                      '2]/dir-pagination-controls/ul/li[str(page_number))]'))))
        browser.find_element_by_xpath('//*[@id="jumpPoint"]/div[3]/div/div/div[2]/div/div[2]/dir-pagination-controls/ul/li[str(page_number)').click()
        print(f"Navigating to page {page_number}")
        # I added this because my connection was
        # being terminated by RedHat
        time.sleep(20)
    except (TimeoutException, WebDriverException) as e:
        print("Last page reached")
        break
    except Exception as e:
        print(e)
        break
Example 2:
page_number = 0
while True:
    try:
        page_number += 1
        browser.execute_script("return arguments[0].scrollIntoView(true);",
                               WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="jumpPoint"]/div[3]/div/div/div[2]/div/div['
                                                                                                      '2]/dir-pagination-controls/ul/li[12]'))))
        browser.find_element_by_xpath('//*[@id="jumpPoint"]/div[3]/div/div/div[2]/div/div[2]/dir-pagination-controls/ul/li[12]').click()
        print(f"Navigating to page {page_number}")
        # I added this because my connection was
        # being terminated by RedHat
        time.sleep(20)
    except (TimeoutException, WebDriverException) as e:
        print("Last page reached")
        break
    except Exception as e:
        print(e)
        break
Solution
You can use the following logic.
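import time
from random import randint

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is assumed to be an existing WebDriver with the target page already loaded.
# The last numbered pagination item holds the total page count.
lastPage = WebDriverWait(driver, 120).until(EC.element_to_be_clickable((By.XPATH, "(//ul[starts-with(@class,'pagination hidden-xs ng-scope')]/li[starts-with(@ng-repeat,'pageNumber')])[last()]")))
driver.find_element(By.CSS_SELECTOR, "i.web-icon-plus").click()
pages = lastPage.text
pages = '5'  # hard-coded limit for testing; remove this line to walk every page
# range() excludes its upper bound, so add 1 to include the last page
for pNumber in range(1, int(pages) + 1):
    # locate the pagination link whose text is the current page number
    currentPage = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//ul[starts-with(@class,'pagination hidden-xs ng-scope')]//a[.='" + str(pNumber) + "']")))
    print("===============================================")
    print("Current Page : " + currentPage.text)
    # reading this property scrolls the link into view before clicking
    currentPage.location_once_scrolled_into_view
    currentPage.click()
    # wait until the loading indicator is no longer clickable
    WebDriverWait(driver, 120).until_not(EC.element_to_be_clickable((By.CSS_SELECTOR, "#loading")))
    # get the table rows for this page
    rows = driver.find_elements(By.XPATH, "//table[starts-with(@class,'cve-table')]/tbody/tr")
    for row in rows:
        print(row.text)  # prints the whole row; adjust the logic if you need individual cells
    time.sleep(randint(1, 5))  # optional random pause between pages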
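As a side note on Example 1 in the question: there, `str(page_number)` is written inside the string literal, so the browser receives that text verbatim instead of the number, and the XPath is not valid. The sketch below shows one way to build the locator and the page range separately from the Selenium calls. The helper names `page_link_xpath` and `page_numbers` are hypothetical (they are not part of Selenium), and the pagination XPath is simply the one used in the answer above, so it is an assumption that it still matches the site's markup.

```python
# Hypothetical helpers for illustration; not part of Selenium.
# The <ul> locator is copied from the answer above and may not match
# the current markup of the target site.
PAGINATION_UL = "//ul[starts-with(@class,'pagination hidden-xs ng-scope')]"


def page_link_xpath(page_number: int) -> str:
    """Return the XPath of the pagination link whose text is page_number."""
    # The number is formatted into the string rather than typed inside
    # the literal, which is what went wrong in Example 1.
    return f"{PAGINATION_UL}//a[.='{page_number}']"


def page_numbers(last_page_text: str) -> range:
    """Return pages 1..N inclusive, where N is the last pagination label."""
    # range() stops before its upper bound, hence the + 1.
    return range(1, int(last_page_text.strip()) + 1)


if __name__ == "__main__":
    for n in page_numbers("3"):
        print(page_link_xpath(n))
```

In the loop above, `page_link_xpath(pNumber)` could replace the concatenated XPath string, and `page_numbers(lastPage.text)` could replace the `range(...)` call.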