Selenium returns the wrong element: it selects inside the first sibling instead of the element I am searching in

Problem description

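I am trying to iterate over a list of elements and print their text, but when I select an element inside another element, Selenium returns the element inside the first sibling rather than the one inside the element I am actually interested in, which is incredibly strange and frustrating. The page I am scraping is https://www.thecompleteuniversityguide.co.uk/courses/details/computing-science-with-a-year-in-industry-bsc/54983514 and I am looking at the Modules section. The key part of my code: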

import time
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument('--headless')

# Raw string so the backslashes in the Windows path are not treated as escape sequences
driver = Chrome(executable_path = r'D:\Programs\Python\chromedriver.exe', options = opts)

driver.get("https://www.thecompleteuniversityguide.co.uk/courses/details/computing-science-with-a-year-in-industry-bsc/54983514")

closeButton = driver.find_element_by_xpath("//a[@id='closeFilter']")
closeButton.click()
driver.find_element_by_xpath("//a[@id='acceptCookie']").click()

modules_container = driver.find_element_by_xpath("//div[@data-sub-sec='Modules']").find_element_by_class_name("cdsb_rt")
numberOfModulesByYear = len(modules_container.find_elements_by_xpath("//div[@class='mdldv']"))
previousNumberOfModules = 0

for moduleYear in range(numberOfModulesByYear):
      moduleYearButtonString = "//div[@class='mdldv' and @data-module-sections='{}']".format(str(moduleYear))
      module_year = modules_container.find_element_by_xpath(moduleYearButtonString)
      module_year_a = module_year.find_element_by_tag_name("a")
      time.sleep(0.5)
      while module_year_a.find_element_by_tag_name("span").get_attribute("class") == "icon icon-add": 
            module_year_a.click()
      while len(module_year.find_elements_by_xpath("//div[@class='mdiv']")) - previousNumberOfModules == 0:
            time.sleep(0.01)
      listOfModules = module_year.find_elements_by_xpath("//div[@class='mdiv']")
      previousNumberOfModules = len(module_year.find_elements_by_xpath("//div[@class='mdiv']"))
      for _, module in enumerate(listOfModules):
            print(module.find_element_by_tag_name("a").find_element_by_xpath("//span[@class='mdltxt']").get_attribute("outerHTML"))
      print("\n")

The output I get is:

<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>


<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>


<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>


<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>

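This makes no sense to me. When I inspect the a element's HTML it shows the correct name, but when I access it through the XPath function it returns the wrong one. Can anyone help me figure out why this happens? If this is the expected behaviour, it seems very unintuitive.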

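Edit: for anyone who reads this in the future, I did some more research on XPath. If you want to search only within the current node and its descendants, start the XPath with ".//". The leading dot stands for the current (context) node, so the search is limited to that element, whereas an expression that starts with "//" is evaluated from the document root, no matter which element you call it on. So this is not a problem with XPath itself, just a simple syntax detail that can be intimidating for newcomers. Good luck to everyone doing this!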
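
The difference can be reproduced without a browser. The following is a minimal sketch, not the page's real markup: the HTML snippet is a hypothetical stand-in for the module list, and it uses the standard library's xml.etree.ElementTree instead of Selenium. ElementTree does not accept a leading "//" on an element, so the document-wide search is emulated here by searching from the root, which is what a leading "//" amounts to in Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the module list (not the real page).
HTML = """
<div class="modules">
  <div class="mdiv"><a><span class="mdltxt">Programming 1</span></a></div>
  <div class="mdiv"><a><span class="mdltxt">Database Systems</span></a></div>
  <div class="mdiv"><a><span class="mdltxt">Systems Development</span></a></div>
</div>
"""

root = ET.fromstring(HTML)
modules = root.findall(".//div[@class='mdiv']")

# Emulates "//span[...]": the search starts at the document root every time,
# so the current module is ignored and the first match in the document wins.
document_wide = [root.find(".//span[@class='mdltxt']").text for _ in modules]

# ".//span[...]": the search starts at the current module element.
relative = [m.find(".//span[@class='mdltxt']").text for m in modules]

print(document_wide)
print(relative)
```

The first list repeats the first module name once per module, matching the symptom in the question, while the second list contains each module's own name.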

Explanation: What is the difference between .// and //* in XPath?

Tags: python, html, selenium, selenium-webdriver, web-scraping

Solution


This looks like a relative XPath issue, though I am not entirely sure. When I locate the element by class name instead, it works:

print(module.find_element_by_tag_name("a").find_element_by_class_name('mdltxt').get_attribute("outerHTML"))
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Database Systems (20 credits) - Core</span>
<span class="mdltxt">Web-Based Programming (20 credits) - Core</span>
<span class="mdltxt">Systems Development (20 credits) - Core</span>
<span class="mdltxt">Computing Principles (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Database Systems (20 credits) - Core</span>
<span class="mdltxt">Web-Based Programming (20 credits) - Core</span>
<span class="mdltxt">Systems Development (20 credits) - Core</span>
<span class="mdltxt">Computing Principles (20 credits) - Core</span>
<span class="mdltxt">Software Engineering 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 2 (20 credits) - Core</span>
<span class="mdltxt">Architectures and Operating Systems (20 credits) - Core</span>
<span class="mdltxt">Data Structures and Algorithms (20 credits) - Core</span>
<span class="mdltxt">Year in Industry (80 credits) - Core</span>
<span class="mdltxt">Industrial Project Report (40 credits) - Core</span>
