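python - How to get the text of these paragraphs using Selenium and XPath
Problem description
I am trying to scrape this website (the URL appears in the code below). Each page lists about ten different opportunities, and each one has its own title and details. I want to extract all of that information. I wrote Python code that finds the other required tags and fields, but I cannot get the paragraph that contains the description.
Here is my code.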
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver and website_name are created earlier in the script (not shown here)
base_url = "https://www.enabel.be/content/enabel-tenders"
driver.get(base_url)

# Wait until the first tender entry is visible
WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
    (By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")))
current_page_tag = driver.find_element(
    By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[3]/ul/li[2]").text.strip()
all_divs = driver.find_elements(
    By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")

for each_div in all_divs:
    singleData = {
        "language": 107,  # could not detect
        "status": 0,  # means open
        "op_link": "",
        "website": website_name,
        "close_date": "",
        "organization": website_name,  # means not available
        "description": "",
        "title": "",
        "checksum": "",
        "country": "",  # means not available
        "published_date": "",
    }
    singleData['title'] = each_div.find_element(
        By.XPATH, ".//span[@class='title-accr no-transform']").text.strip()
    singleData['country'] = each_div.find_element(
        By.XPATH, ".//div[1]/div/div/div[@class ='field-items']/div").text.strip()
    close_date = each_div.find_element(By.XPATH, ".//div//div[1]/div").text.strip()
    # description always comes back as an empty string
    description = each_div.find_element(
        By.XPATH, ".//div/div[2]/div[3]/div[2]/div/p").text.strip()
    # find_elements_by_xpath is deprecated; use find_elements(By.XPATH, ...)
    download = each_div.find_elements(By.XPATH, ".//div//div[2]/div[4]/div[2]//a")
    download_file_link = []
    for eachfile in download:
        download_file_link.append(eachfile.get_attribute('href'))
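My code gets the title, country, closing date and attachments, but not the description. It returns an empty string, even though the text is clearly there when I look at the page in a browser.
Can anyone help me understand the problem and find a solution? Thanks in advance.
Solution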
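Wrap the lookup in try/except so that entries without a description are caught. Some descriptions contain markup (such as <a> tags), so you may need to strip it.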
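from selenium.common.exceptions import NoSuchElementException

for each_div in all_divs:
    try:
        # Find the "Description" label, go up to its parent, then take the
        # first <p> inside the parent's second <div>
        description = each_div.find_element(
            By.XPATH,
            ".//div[contains(text(),'Description')]/parent::div/div[2]//p[1]"
        ).get_attribute('innerHTML')
        print(description)
    except NoSuchElementException:
        # This entry has no description paragraph
        print('none')
Output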
This is the annual publication of information on recipients of funds for the TVET Project.
none
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
none
none
none
Marché relatif à la fourniture, l’installation, la mise en marche et formation des utilisateurs et techniciens chargé de la maintenance des équipements de Laboratoire destinés au CERMES.
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: <a href="mailto:eva.matovu@enabel.be">eva.matovu@enabel.be</a>
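The `<a>` markup that `innerHTML` returns in the output above can be removed after the fact. The following is a minimal sketch using only the standard library; `strip_tags` and `_TextExtractor` are hypothetical helper names introduced for illustration and are not part of Selenium.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the plain text of an HTML fragment such as an innerHTML string."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


sample = ("Pour tout besoin d'information complémentaire, veuillez contacter: "
          '<a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>')
print(strip_tags(sample))
```

In the loop above, this would be applied as `strip_tags(description)` before printing or storing the value.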
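Alternatively, you can collect every paragraph of the description and read its textContent, which contains no markup:
for each_div in all_divs:
    # find_elements returns an empty list when nothing matches, so entries
    # without a description are skipped and no exception handling is needed
    paragraphs = each_div.find_elements(
        By.XPATH,
        ".//div[contains(text(),'Description')]/parent::div/div[2]//p")
    for desc in paragraphs:
        print(desc.get_attribute('textContent'))
Output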
This is the annual publication of information on recipients of funds for the TVET Project.
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Françoise MUSHIMIYIMANA, National Expert in Contractualization & Administration _National ECA (francoise.mushimiyimana@enabel.be ), with copy to
denise.nsanga@enabel.be
evariste.sibomana@enabel.be
They shall be answered in the order received. The complete overview of questions asked shall be available as of at the latest 7 calendar days before the final date for receipt of tenders at the address mentioned above.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Marché relatif à la fourniture, l’installation, la mise en marche et formation des utilisateurs et techniciens chargé de la maintenance des équipements de Laboratoire destinés au CERMES.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: eva.matovu@enabel.be
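A likely reason the original `.text` call came back empty: Selenium's `.text` returns only text that is rendered as visible, whereas `textContent` and `innerHTML` are read from the DOM regardless of visibility. Assuming the descriptions sit inside collapsed panels (an assumption, since the page markup is not shown here), a small fallback helper along these lines may be convenient. `visible_or_raw_text` is a hypothetical name, and `FakeElement` is only a stand-in so the sketch runs without a browser.

```python
def visible_or_raw_text(element):
    """Return element.text, falling back to textContent when it is empty."""
    text = (element.text or "").strip()
    if text:
        return text
    raw = element.get_attribute("textContent") or ""
    # textContent keeps the source whitespace, so collapse runs of it
    return " ".join(raw.split())


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, used only for this demo."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


# Simulates a hidden paragraph: .text is empty, textContent is populated
hidden = FakeElement("", "  This is the annual publication\n of information.  ")
print(visible_or_raw_text(hidden))
```

With a real driver, the helper would be called on the element returned by `each_div.find_element(By.XPATH, ...)` in place of `.text.strip()`.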