Selenium WebDriver: how do I get the URL from an element?

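Problem description

I am using Selenium/Splinter and trying to get the URL from each element so that I can download PDF statements from Wells Fargo. When I scrape the table, it provides links to the PDFs. I would like to click each link and then somehow download the files to a location on my computer.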

    import selenium
    from splinter import Browser
    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome('actual_path')
    driver.get('https://www.wellsfargo.com/')
    driver.delete_all_cookies()  # parentheses are required; without them the method is never called

    mainurl = "https://www.wellsfargo.com/"

    # login function - working 
    username = driver.find_element_by_id("userid")
    username.send_keys("actual_username") 
    passy = driver.find_element_by_id("password")
    passy.send_keys("actual_password") 
    submitbutton = driver.find_element_by_xpath("""//*[@id="frmSignon"]/div[5]""")

    driver.find_element_by_xpath('/html/body/div[3]/section/div[1]/div[3]/div[1]/div/div[1]/a[1]').click()
    sleeptime = 10  # seconds; assumed value, the original never defines sleeptime
    driver.implicitly_wait(sleeptime)
    driver.find_element_by_link_text('View Statements').click()

    ################## NEED HELP -TO SAVE PDF ELEMENTS AND DOWNLOAD #############
    elem = driver.find_elements_by_class_name("document-title")

    counttotal = 0

    for pdf in elem:
          counttotal = counttotal + 1
          # click the current element; elem[counttotal] skipped the first link and overran the list
          pdf.click()
          driver.back()

When I run for i in elem: print(i), it prints the element objects but not the URLs. Is there a way to get the link from each element?

# Sample Doc To Click & Download 

<div class="documents"><div data-message-container="stmtdiscMessages"><!------------   Error messages -----------------><!-----------  Account messages ---------------></div><h3>Statements</h3><p>Deposit account statements are available online for up to 7 years.</p><div class="document large"><div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/34278aaf-8f37-43de-7d8e-e368124d5f62?_x=gTHPa3PEVAvnSu-uI5vThRyJCGUu-2f4" class="document-title" style="touch-action: auto;">Statement 08/31/19 (21K, PDF)</a></div></div><div class="document large">

#document number 2 
<div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/9efe2b61-8233-8s65-2738-677ef63291f7?_x=h8i20NifIc9dRVCvj9I8pkic0S80i" class="document-title" style="touch-action: auto;">Statement 07/31/19 (21K, PDF)</a></div></div><div class="document large">

#document number 3, etc. 
<div class="document-details account-introtext"> <a role="link" tabindex="0" data-pdf="true" data-url="https://connect.secure.wellsfargo.com/edocs/documents/retrieve/7eece2e7-e27e-4445-8s4d-fa5899c5c96b?_x=037X7K-IdhVOVevUISRnQT74qL793tIW" class="document-title" style="touch-action: auto;">Statement 06/30/19 (24K, PDF)</a></div></div><div class="document large">

Tags: python, python-3.x, python-requests

Solution


You can use the get_attribute function to retrieve any attribute from an element:

    elements = driver.find_elements_by_class_name("document-title")

    pdf_urls = []
    for element in elements: 
        pdf_urls.append(element.get_attribute('data-url'))

Or, if you are comfortable with list comprehensions, here is a more Pythonic way:

    elements = driver.find_elements_by_class_name("document-title")

    pdf_urls = [element.get_attribute('data-url') for element in elements]
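
The question also asks how to save the files once the URLs are known. One possible approach, sketched here under several assumptions, is to reuse the browser session's cookies in a plain HTTP request instead of clicking each link. The helper names (collect_pdf_links, cookie_header, safe_filename, download_pdf) and the "statements" directory are hypothetical and not part of Selenium. Whether the site accepts a request made outside the browser is not confirmed by the original post, so treat this as a starting point rather than a verified solution.

```python
import re
import urllib.request
from pathlib import Path


def collect_pdf_links(elements):
    """Return (title, url) pairs for elements that carry a data-url attribute."""
    links = []
    for element in elements:
        url = element.get_attribute("data-url")
        if url:  # get_attribute returns None when the attribute is absent
            links.append((element.text.strip(), url))
    return links


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list of dicts that driver.get_cookies() returns."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def safe_filename(title):
    """Turn a link title such as 'Statement 08/31/19 (21K, PDF)' into a file name."""
    stem = re.sub(r"\(.*?\)", "", title)  # drop the '(21K, PDF)' suffix
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    return f"{stem or 'document'}.pdf"


def download_pdf(url, title, selenium_cookies, dest_dir):
    """Fetch one URL with the browser session's cookies and write it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    target = dest / safe_filename(title)
    with urllib.request.urlopen(request) as response:
        target.write_bytes(response.read())
    return target


# Usage with a logged-in driver (assumed; not executed here):
# elements = driver.find_elements_by_class_name("document-title")
# for title, url in collect_pdf_links(elements):
#     download_pdf(url, title, driver.get_cookies(), "statements")
```

Requesting the data-url directly avoids the click and driver.back() round trip, which reloads the page and can leave previously located elements stale. If the server rejects the request, configuring Chrome's download directory and clicking the links in the browser is an alternative worth trying.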
