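python - Selenium WebDriver: how do I get the URL from an element?
Question
I'm using Selenium/Splinter and trying to get the URL from each element so that I can download PDF statements from Wells Fargo. Scraping the table gives me the PDF link elements. I'd like to click each link and then somehow download the files to a location on my computer.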
import selenium
from splinter import Browser
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
driver = webdriver.Chrome('actual_path')
driver.get('https://www.wellsfargo.com/')
driver.delete_all_cookies()
mainurl = "https://www.wellsfargo.com/"
# login function - working
username = driver.find_element_by_id("userid")
username.send_keys("actual_username")
passy = driver.find_element_by_id("password")
passy.send_keys("actual_password")
submitbutton = driver.find_element_by_xpath("""//*[@id="frmSignon"]/div[5]""")
driver.find_element_by_xpath('/html/body/div[3]/section/div[1]/div[3]/div[1]/div/div[1]/a[1]').click()
driver.implicitly_wait(sleeptime)  # sleeptime (seconds) is not defined in this snippet
driver.find_element_by_link_text('View Statements').click()
################## NEED HELP -TO SAVE PDF ELEMENTS AND DOWNLOAD #############
elem = driver.find_elements_by_class_name("document-title")
counttotal = 0
for pdf in elem:
    elem[counttotal].click()  # index before incrementing so the first link is not skipped
    driver.back()
    counttotal = counttotal + 1
When I try `for i in elem: print(i)`, it prints the element objects but not the URL links. Is there a way to get the link from each element?
# Sample Doc To Click & Download
<div class="documents"><div data-message-container="stmtdiscMessages"><!------------ Error messages -----------------><!----------- Account messages ---------------></div><h3>Statements</h3><p>Deposit account statements are available online for up to 7 years.</p><div class="document large"><div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/34278aaf-8f37-43de-7d8e-e368124d5f62?_x=gTHPa3PEVAvnSu-uI5vThRyJCGUu-2f4" class="document-title" style="touch-action: auto;">Statement 08/31/19 (21K, PDF)</a></div></div><div class="document large">
#document number 2
<div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/9efe2b61-8233-8s65-2738-677ef63291f7?_x=h8i20NifIc9dRVCvj9I8pkic0S80i" class="document-title" style="touch-action: auto;">Statement 07/31/19 (21K, PDF)</a></div></div><div class="document large">
#document number 3, etc.
<div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/7eece2e7-e27e-4445-8s4d-fa5899c5c96b?_x=037X7K-IdhVOVevUISRnQT74qL793tIW" class="document-title" style="touch-action: auto;">Statement 06/30/19 (24K, PDF)</a></div></div><div class="document large">
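Solution
You can use the get_attribute function to retrieve any attribute from an element. In the sample HTML the link is stored in the data-url attribute (in Selenium 4 the equivalent lookup is driver.find_elements(By.CLASS_NAME, "document-title")):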
elements = driver.find_elements_by_class_name("document-title")
pdf_urls = []
for element in elements:
    pdf_urls.append(element.get_attribute('data-url'))
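Not every matched element is guaranteed to carry the attribute, and get_attribute returns None when it is missing. A minimal sketch of a loop that skips such elements follows. The names collect_attribute and FakeElement are hypothetical; FakeElement only stands in for a Selenium WebElement so that the snippet runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Mirrors WebElement.get_attribute: None when the attribute is absent.
        return self._attributes.get(name)


def collect_attribute(elements, name="data-url"):
    """Return the non-empty values of attribute `name` from each element."""
    values = []
    for element in elements:
        value = element.get_attribute(name)
        if value:  # skip None and empty strings
            values.append(value)
    return values


elements = [
    FakeElement({"data-url": "https://example.com/a.pdf"}),
    FakeElement({"class": "document-title"}),  # no data-url attribute
]
pdf_urls = collect_attribute(elements)
```

With real WebElement objects, the same function can be called on the list returned by the class-name lookup.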
If you are comfortable with list comprehensions, the same loop can be written in a more Pythonic way:
elements = driver.find_elements_by_class_name("document-title")
pdf_urls = [element.get_attribute('data-url') for element in elements]
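The question also asks how to save the files. One possible approach, sketched here under assumptions, is to reuse the logged-in browser's cookies in a plain HTTP request. The helpers build_cookie_header and download_pdf are hypothetical names, and whether the server accepts such a request outside the browser (it may require extra headers or tokens) is an assumption that has not been verified.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_pdf(url, cookies, destination):
    """Fetch `url` with the given cookies and write the bytes to `destination`."""
    request = urllib.request.Request(
        url, headers={"Cookie": build_cookie_header(cookies)}
    )
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        out.write(response.read())


# Usage, assuming `driver` is logged in and `pdf_urls` holds the data-url values:
# for index, url in enumerate(pdf_urls):
#     download_pdf(url, driver.get_cookies(), f"statement_{index}.pdf")
```

driver.get_cookies() returns a list of dicts with 'name' and 'value' keys, which is the shape build_cookie_header expects.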