python - Extracting texts from
Problem description
I'm trying to get the text inside an <a> tag in a nested ul-li structure. I can locate all the "li" elements, but I can't get the text inside the <a> elements.
I'm using Python 3.7 and Selenium WebDriver with the Firefox driver.
The corresponding HTML is:
[some HTML]
<ul class="dropdown-menu inner">
<!---->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 1</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 2</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 3</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT4</a>
</li>
[another 100 <li></li> similar blocks] .
.
<li class="no-search-result" placeholder="Curso">
<span>Unimportant TEXT</span>
</li>
</ul>
[more HTML]
I've tried the code below:
cursos = browser.find_elements_by_xpath('//li[@nya-bs-option="curso in ctrl.cursos group by curso.grupo"]')
nome_curso = [curso.find_element_by_tag_name('a').text for curso in cursos]
I get a list with the correct number of items, but every item is an empty string (''). Can anyone help me? Thanks.
Solution
It seems you were close. To extract the texts (e.g. Important TEXT 1, Important TEXT 2, Important TEXT 3, Important TEXT4), you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and the get_attribute() method:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a")))])
Using XPATH and the text attribute:
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='dropdown-menu inner']//li[contains(@class, 'nya-bs-option')]//a")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?
Outro
As per the documentation:
- get_attribute() method: Gets the given attribute or property of the element.
- text attribute: returns The text of the element.
- Difference between text and innerHTML using Selenium
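The empty strings in the question are consistent with how Selenium's text works: it returns only the text as rendered, so an <a> inside a closed dropdown yields ''. As a minimal sketch (not part of the original answer), the helper below falls back to the DOM textContent property when text is empty. The name link_text and the FakeAnchor stand-in are hypothetical. FakeAnchor exists only so the sketch runs without a browser, and it is an assumption that the dropdown in the question is hidden at the time of the lookup.

```python
def link_text(element):
    """Return the rendered text of an element, or its raw textContent if that is empty."""
    rendered = element.text.strip()
    if rendered:
        return rendered
    # get_attribute() also returns DOM properties; textContent ignores CSS visibility.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


class FakeAnchor:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


hidden = FakeAnchor("", "  Important TEXT 1 ")            # hidden: .text is empty
shown = FakeAnchor("Important TEXT 2", "Important TEXT 2")  # visible: .text is set
print([link_text(a) for a in (hidden, shown)])
# prints ['Important TEXT 1', 'Important TEXT 2']
```

With a real driver, the same helper could be applied to the elements returned by driver.find_elements(By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a"). Unlike the WebDriverWait approach, this does not wait for the elements to become visible, so it only helps when the elements are already in the DOM.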