How do I print an href/URL using XPath?

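Problem description

My code navigates to a website, and on that site there is an article that has its own link/URL/href.

I want to print this field.

My current code selects the container that holds the link, and then I try to run a for loop to get the href.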

from selenium import webdriver
driver = webdriver.Chrome()
import time

url = 'https://library.ehaweb.org/eha/#!*menu=6*browseby=8*sortby=2*media=3*ce_id=2035*label=21986*ot_id=25553*marker=1283*featured=17286'
driver.get(url)
time.sleep(3)
page_source = driver.page_source

container=driver.find_element_by_xpath("//div[@class='list-box col-md-6 col-lg-6 col-xl-4 test']")
for j in container:
    link= j.find_element_by_css_selector('a').get_attribute('href')
    print(link)

Tags: python, selenium, web-scraping, xpath

Solution


If I understand correctly, you just need to print the href attribute of the container's child a element:

link = driver.find_element_by_xpath("//div[@class='list-box col-md-6 col-lg-6 col-xl-4 test']/a").get_attribute("href")
print(link)

This prints:

https://library.ehaweb.org/eha/2021/eha2021-virtual-congress/324511/hanny.al-samkari.pazopanib.for.severe.bleeding.and.transfusion-dependent.html?f=menu%3D6%2Abrowseby%3D8%2Asortby%3D2%2Amedia%3D3%2Ace_id%3D2035%2Alabel%3D21986%2Aot_id%3D25553%2Amarker%3D1283%2Afeatured%3D17286

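If you want to use a loop, note that find_element_by_xpath returns a single WebElement, which cannot be iterated. Change container=driver.find_element_by_xpath("//div[@class='list-box col-md-6 col-lg-6 col-xl-4 test']") to the plural form, which returns a list: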

container=driver.find_elements_by_xpath("//div[@class='list-box col-md-6 col-lg-6 col-xl-4 test']")

For this particular element, a shorter locator is enough:

//div[contains(@class, 'test')]/a
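The two locators differ in how they test the class attribute: @class='...' requires the whole attribute string to be equal, while contains(@class, 'test') is a plain substring check. The sketch below mimics both in pure Python, plus a stricter whole-token check. The function names and the sample class strings are hypothetical, chosen only for illustration.

```python
# The class attribute of the container from the question.
CLASS_ATTR = "list-box col-md-6 col-lg-6 col-xl-4 test"


def exact_match(class_attr, expected):
    """Mimics //div[@class='...']: the whole attribute string must be equal."""
    return class_attr == expected


def contains_match(class_attr, fragment):
    """Mimics //div[contains(@class, '...')]: a plain substring test."""
    return fragment in class_attr


def token_match(class_attr, token):
    """Stricter check: the token must be one whole space-separated class name."""
    return token in class_attr.split()


print(exact_match(CLASS_ATTR, "test"))        # the full string differs
print(contains_match(CLASS_ATTR, "test"))     # substring is present
print(contains_match("latest-news", "test"))  # substring also found inside "latest"
print(token_match("latest-news", "test"))     # no class is exactly "test"
```

If the substring behaviour is too loose for a given page, a commonly used XPath 1.0 idiom for the whole-token check is contains(concat(' ', normalize-space(@class), ' '), ' test ').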

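The complete script using this shorter class test:

from selenium import webdriver

# Path to chromedriver on the answerer's machine; adjust it for your system.
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
url = 'https://library.ehaweb.org/eha/#!*menu=6*browseby=8*sortby=2*media=3*ce_id=2035*label=21986*ot_id=25553*marker=1283*featured=17286'
driver.get(url)
# Let element lookups retry for up to 10 seconds while the page renders.
driver.implicitly_wait(10)
# find_elements_by_xpath returns a list, so it can be iterated.
container = driver.find_elements_by_xpath("//div[contains(@class, 'test')]")
for j in container:
    # Read the href of the first a element inside each matching div.
    link = j.find_element_by_css_selector('a').get_attribute('href')
    print(link)
driver.close()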
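The snippets above rely on the find_element_by_* helpers, which newer Selenium 4 releases no longer provide. A minimal sketch of the same loop using the generic find_elements(by, value) call might look like the following. The function name collect_hrefs is hypothetical. The locator strings "xpath" and "css selector" are the values behind By.XPATH and By.CSS_SELECTOR, written out here so the sketch needs no imports.

```python
def collect_hrefs(driver, container_xpath):
    """Return the href of every a element found inside the matching containers.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object exposing the
    same find_elements(by, value) interface).
    """
    hrefs = []
    # "xpath" is the string value of selenium's By.XPATH.
    for container in driver.find_elements("xpath", container_xpath):
        # "css selector" is the string value of By.CSS_SELECTOR.
        # Using find_elements here avoids an exception when a container has no link.
        for anchor in container.find_elements("css selector", "a"):
            href = anchor.get_attribute("href")
            if href:  # skip anchors without an href attribute
                hrefs.append(href)
    return hrefs
```

With a running driver that has already loaded the page, collect_hrefs(driver, "//div[contains(@class, 'test')]") would return a list of links that can then be printed one by one.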
