How to print the link of an opened PDF using Selenium in Python?

Problem description

I am unable to print the link of the final PDF that opens after running the code below.

from selenium import webdriver
from selenium.webdriver.support import ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException 

def page_is_loaded(driver):
    return driver.find_element_by_tag_name("body") is not None


def check_exists_by_text(text):
    try:
        driver.find_element_by_link_text(text)
    except NoSuchElementException:
        return False
    return True

driver = webdriver.Chrome("C:/Users/Roshan/Desktop/sbi/chromedriver")
driver.maximize_window()
driver.get("http://www.careratings.com/brief-rationale.aspx")

wait = ui.WebDriverWait(driver,10)
wait.until(page_is_loaded)

location_field = driver.find_element_by_name("txtfromdate")
location_field.send_keys("2019-05-06")

last_date = driver.find_element_by_name("txttodate")
last_date.send_keys("2019-05-21")

driver.find_element_by_xpath("//input[@name='btn_submit']").click()

if check_exists_by_text('Reliance Capital Limited'):
    elm = driver.find_element_by_link_text('Reliance Capital Limited')
    driver.implicitly_wait(5)
    elm.click()
    driver.implicitly_wait(50)
    #time.sleep(5)
    #driver.quit()
else:
    print("Company is not rated in the given Date range")

I expect the actual output to be the link to this PDF:

http://www.carratings.com/upload/CompanyFiles/PR/Reliance%20Capital%20Ltd.-05-18-2019.pdf

but I don't know how to print this link.

Tags: python, selenium, web-scraping, beautifulsoup, selenium-chromedriver

Solution


You need to find all the matching elements in the table and then extract the data from them.

from selenium import webdriver
import os

# setup path to chrome driver
chrome_driver = os.getcwd() + '/chromedriver'
# initialise chrome driver
browser = webdriver.Chrome(chrome_driver)
# load url
browser.get('http://www.careratings.com/brief-rationale.aspx')

# setup date range
location_field = browser.find_element_by_name("txtfromdate")
location_field.send_keys("2019-05-06")
last_date = browser.find_element_by_name("txttodate")
last_date.send_keys("2019-05-21")
browser.find_element_by_xpath("//input[@name='btn_submit']").click()

# get all data rows
content = browser.find_elements_by_xpath('//*[@id="divManagementSpeak"]/table/tbody/tr/td/a')

# get text and href link from each element
collected_data = []
for item in content:
    url = item.get_attribute("href")
    description = item.get_attribute("innerText")
    collected_data.append((url, description))
    print((url, description))

Output:

('http://www.careratings.com/upload/CompanyFiles/PR/Ashwini%20Frozen%20Foods-05-21-2019.pdf', 'Ashwini Frozen Foods')
('http://www.careratings.com/upload/CompanyFiles/PR/Vanita%20Cold%20Storage-05-21-2019.pdf', 'Vanita Cold Storage') 

and so on.
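To answer the original question directly — printing the link for one specific company — the collected `(url, description)` pairs can be filtered by the link text. A minimal sketch of that last step; the sample list below stands in for the live scrape (in the real script, `collected_data` comes from the Selenium loop above), and `find_pdf_link` is a hypothetical helper name:

```python
# Sample data standing in for the scraped (url, description) pairs.
collected_data = [
    ('http://www.careratings.com/upload/CompanyFiles/PR/Ashwini%20Frozen%20Foods-05-21-2019.pdf',
     'Ashwini Frozen Foods'),
    ('http://www.careratings.com/upload/CompanyFiles/PR/Vanita%20Cold%20Storage-05-21-2019.pdf',
     'Vanita Cold Storage'),
]

def find_pdf_link(data, company):
    """Return the PDF URL for the first entry whose link text matches, else None."""
    for url, description in data:
        if description == company:
            return url
    return None

link = find_pdf_link(collected_data, 'Vanita Cold Storage')
if link:
    print(link)
else:
    print("Company is not rated in the given date range")
```

This keeps the Selenium work (loading the page, clicking submit, collecting hrefs) separate from the lookup, so the same filter also covers the "not rated in the given date range" branch from the question.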
