Extracting texts from <li> items with selenium in Python

    Problem description

    I'm trying to get the text inside an <a> tag in a nested ul-li structure. I can locate all the <li> elements, but I can't get the text inside the <a> elements.

    I'm using Python 3.7 and Selenium WebDriver with the Firefox driver.

    The corresponding HTML is:

    [some HTML]
    
    <ul class="dropdown-menu inner">
    <!---->
        <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
            <span class="dropdown-header">Cursos em Destaque</span>
            <a tabindex="0">Important TEXT 1</a>
        </li>
        <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
        <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
            <span class="dropdown-header">Cursos em Destaque</span>
            <a tabindex="0">Important TEXT 2</a>
        </li>
        <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
        <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
            <span class="dropdown-header">Cursos em Destaque</span>
            <a tabindex="0">Important TEXT 3</a>
        </li>
        <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
        <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
            <span class="dropdown-header">Cursos em Destaque</span>
            <a tabindex="0">Important TEXT4</a>
        </li>
        [another 100 similar <li></li> blocks]
        <li class="no-search-result" placeholder="Curso">
            <span>Unimportant TEXT</span>
        </li>
    </ul>
    
    [more HTML]
    

    I've tried the following code:

    cursos = browser.find_elements_by_xpath('//li[@nya-bs-option="curso in ctrl.cursos group by curso.grupo"]')
    nome_curso = [curso.find_element_by_tag_name('a').text for curso in cursos]
    

    I get a list with the correct number of items, but every item is an empty string (''). Can anyone help? Thanks.
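As a browser-free check that the target texts really are in the markup, here is a minimal sketch that uses only Python's standard-library html.parser (not Selenium) on an abridged copy of the HTML above. HTML and OptionTextParser are illustrative names, not part of the question. The parser ignores CSS and visibility entirely, so it only shows which <a> texts sit inside li elements whose class contains nya-bs-option.

```python
from html.parser import HTMLParser

# Abridged copy of the question's markup: two option items plus the
# trailing "no-search-result" item, which should not be collected.
HTML = """
<ul class="dropdown-menu inner">
  <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
    <span class="dropdown-header">Cursos em Destaque</span>
    <a tabindex="0">Important TEXT 1</a>
  </li>
  <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
    <span class="dropdown-header">Cursos em Destaque</span>
    <a tabindex="0">Important TEXT 2</a>
  </li>
  <li class="no-search-result" placeholder="Curso">
    <span>Unimportant TEXT</span>
  </li>
</ul>
"""


class OptionTextParser(HTMLParser):
    """Collects the text of <a> tags inside <li> elements with class nya-bs-option."""

    def __init__(self):
        super().__init__()
        self.in_option = False  # currently inside a matching <li>
        self.in_link = False    # currently inside an <a> within that <li>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li":
            self.in_option = "nya-bs-option" in classes
        elif tag == "a" and self.in_option:
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "li":
            self.in_option = False

    def handle_data(self, data):
        # Only text inside a matching <a> is kept; the dropdown-header
        # <span> text is skipped.
        if self.in_link:
            self.texts[-1] += data


parser = OptionTextParser()
parser.feed(HTML)
texts = [t.strip() for t in parser.texts]
print(texts)  # ['Important TEXT 1', 'Important TEXT 2']
```

Since the texts are present in the DOM, an empty result from Selenium points at how the elements are read (their state in the browser) rather than at the locator itself.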

    Tags: python, selenium-webdriver, xpath, css-selectors, webdriverwait

    Solution


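    You were close. The empty strings most likely appear because Selenium's text attribute returns only the rendered (visible) text of an element, and the dropdown items were not yet visible when you read them. To extract the texts, e.g. Important TEXT 1, Important TEXT 2, Important TEXT 3, Important TEXT4, etc., you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies: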

    • Using CSS_SELECTOR and the get_attribute() method:

      print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a")))])
      
    • Using XPATH and the text attribute:

      print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='dropdown-menu inner']//li[contains(@class, 'nya-bs-option')]//a")))])
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
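WebDriverWait(driver, 5).until(condition) keeps re-checking the condition until it returns something truthy or the 5-second timeout expires. The sketch below models that polling loop in plain Python so it runs without a browser. wait_until and visible_option_texts are hypothetical names; the real WebDriverWait also ignores NoSuchElementException by default (poll_frequency defaults to 0.5 s) and raises Selenium's TimeoutException rather than the built-in TimeoutError.

```python
import time


def wait_until(condition, timeout=5.0, poll_frequency=0.5):
    """Hypothetical, simplified stand-in for WebDriverWait(...).until(...).

    Calls condition() repeatedly until it returns a truthy value, which is
    then returned; raises TimeoutError once `timeout` seconds have passed.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated page: the option texts only become "visible" on the third poll,
# mimicking dropdown items that render only after the dropdown opens.
polls = {"count": 0}


def visible_option_texts():
    polls["count"] += 1
    if polls["count"] < 3:
        return []  # nothing visible yet; an empty list is falsy, so keep waiting
    return ["Important TEXT 1", "Important TEXT 2"]


texts = wait_until(visible_option_texts, timeout=2.0, poll_frequency=0.1)
print(texts, "after", polls["count"], "polls")
```

visibility_of_all_elements_located() behaves like visible_option_texts here: it yields a falsy value until every matched element is displayed, so the wait holds off reading the texts until the items are rendered.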
      

    You can find a relevant discussion in "How to retrieve the title attribute through Selenium using Python?"


    Outro

    As per the documentation:

