python - How to get the text of these paragraphs using Selenium and XPath

Problem description

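I'm trying to scrape this website (https://www.enabel.be/content/enabel-tenders). Each page lists nearly ten different opportunities, each with its own title and details, and I want to collect all of that information. My Python code finds the other required tags and information, but I can't get the paragraphs that contain the description.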

Here is my code.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    website_name = "enabel"  # hypothetical placeholder: website_name is not defined in the original snippet

    base_url = "https://www.enabel.be/content/enabel-tenders"
    driver.get(base_url)
    # Wait until the list of tenders has been rendered
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
        (By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")))

    current_page_tag = driver.find_element(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[3]/ul/li[2]").text.strip()
    # One div per tender on the current page
    all_divs = driver.find_elements(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")

    for each_div in all_divs:

        singleData = {
            # could not detect
            "language": 107,
            # means open
            "status": 0,
            "op_link": "",
            "website": website_name,
            "close_date": '',
            # means not available
            "organization": website_name,
            "description": "",
            "title": '',
            "checksum": "",
            # means not available
            "country": '',
            "published_date": ''
        }

        singleData['title'] = each_div.find_element(
            By.XPATH, ".//span[@class='title-accr no-transform']").text.strip()
        singleData['country'] = each_div.find_element(
            By.XPATH, ".//div[1]/div/div/div[@class='field-items']/div").text.strip()
        close_date = each_div.find_element(By.XPATH, ".//div//div[1]/div").text.strip()

        # description always returns an empty string
        description = each_div.find_element(By.XPATH, ".//div/div[2]/div[3]/div[2]/div/p").text.strip()
        # find_elements_by_xpath is deprecated in Selenium 4 (and removed in later 4.x releases)
        download = each_div.find_elements(By.XPATH, './/div//div[2]/div[4]/div[2]//a')
        download_file_link = []
        for eachfile in download:
            download_file_link.append(eachfile.get_attribute('href'))

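My code gets the title, country, closing date, and attachments, but not the description: it returns an empty string, even though the description clearly contains text when I look at the page in a browser.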

Can anyone help me figure out the problem and a solution? Thanks in advance.

Tags: python, selenium, xpath, web-scraping

Solution

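Wrap the lookup in try/except so that tenders without a description don't stop the loop. A likely reason for the empty string is that Selenium's `.text` only returns text that is actually rendered on screen, so a paragraph inside a collapsed or hidden section comes back empty; reading the element's `innerHTML` (or `textContent`) attribute does not depend on visibility. The `innerHTML` value still contains some `&nbsp;` entities, so you may need to remove them.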

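    from selenium.common.exceptions import NoSuchElementException

    for each_div in all_divs:
        try:
            # Find the "Description" label, step up to its parent div,
            # and take the first <p> inside that parent's second child div
            description = each_div.find_element(
                By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p[1]"
            ).get_attribute('innerHTML')
            print(description)
        except NoSuchElementException:
            # This tender has no description block
            print('none')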

Output

This is the annual publication of information on recipients of funds for the&nbsp;TVET Project.&nbsp;
none
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
none
none
none
Marché relatif &nbsp;à &nbsp;la&nbsp;fourniture, &nbsp;l’installation, &nbsp;la &nbsp;mise &nbsp;en &nbsp;marche &nbsp;et&nbsp;formation des utilisateurs et techniciens chargé de la&nbsp;maintenance &nbsp;des &nbsp;équipements &nbsp;de &nbsp;Laboratoire&nbsp;destinés au CERMES.&nbsp;
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: <a href="mailto:eva.matovu@enabel.be">eva.matovu@enabel.be</a>
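The `innerHTML` output above still contains `&nbsp;` entities and `<a>` tags. A minimal sketch of the cleanup, using only the standard library; `clean_description` is a hypothetical helper name, and the regex-based tag stripping assumes simple markup like the snippets shown here rather than arbitrary HTML.

```python
import html
import re


def clean_description(inner_html: str) -> str:
    """Turn an innerHTML string into plain text (hypothetical helper)."""
    # Drop tags such as <a href="mailto:...">...</a>, keeping their inner text
    text = re.sub(r"<[^>]+>", "", inner_html)
    # Decode entities; &nbsp; becomes U+00A0 (non-breaking space)
    text = html.unescape(text)
    # Turn non-breaking spaces into normal spaces and collapse whitespace runs
    text = re.sub(r"\s+", " ", text.replace("\xa0", " "))
    return text.strip()
```

Inside the loop, the result could be stored with `singleData['description'] = clean_description(description)`. Tags are stripped before entities are decoded, so escaped text such as `&lt;b&gt;` survives as literal `<b>` instead of being removed.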

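Alternatively, you can use `find_elements` to collect every paragraph of the description and read each one's `textContent`, which gives the text without HTML tags: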

    for each_div in all_divs:
        # find_elements returns an empty list instead of raising,
        # so no try/except is needed here
        description = each_div.find_elements(
            By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p")
        for desc in description:
            # textContent is read from the DOM, so it works even if the element is not visible
            print(desc.get_attribute('textContent'))

Output

This is the annual publication of information on recipients of funds for the TVET Project.
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Françoise MUSHIMIYIMANA, National Expert in Contractualization & Administration _National ECA                                    (francoise.mushimiyimana@enabel.be ), with copy to
denise.nsanga@enabel.be
evariste.sibomana@enabel.be

They shall be answered in the order received. The complete overview of questions asked shall be available as of at the latest 7 calendar days before the final date for receipt of tenders at the address mentioned above.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Marché relatif  à  la fourniture,  l’installation,  la  mise  en  marche  et formation des utilisateurs et techniciens chargé de la maintenance  des  équipements  de  Laboratoire destinés au CERMES.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: eva.matovu@enabel.be
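Rather than printing each paragraph, the question's `singleData['description']` field can be filled by joining the `textContent` of every matched `<p>`. This is a minimal sketch; `join_paragraphs` is a hypothetical helper name, and it only assumes that each element has a `get_attribute(name)` method, as Selenium's `WebElement` does.

```python
from typing import Any, Iterable


def join_paragraphs(paragraphs: Iterable[Any]) -> str:
    """Join the textContent of each <p> element into one string (hypothetical helper)."""
    parts = []
    for p in paragraphs:
        # get_attribute can return None, so fall back to an empty string
        text = p.get_attribute("textContent") or ""
        # str.split() treats non-breaking spaces as whitespace, so this
        # collapses long runs of spaces, newlines and \xa0 into single spaces
        text = " ".join(text.split())
        if text:
            parts.append(text)
    # One line per non-empty paragraph
    return "\n".join(parts)
```

Inside the loop this could be called as `singleData['description'] = join_paragraphs(each_div.find_elements(By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p"))`. A tender without a description then keeps an empty string, which matches the default value in `singleData`.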
