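python - Extracting text from a website using selenium
Problem description
I'm trying to find a way to extract the book summary from a Goodreads page. I tried Beautiful Soup and Selenium, unfortunately to no avail.
Link: https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1
Code: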
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import requests
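link='https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1'
# driver setup was not shown in the original post; Chrome is assumed from the error message below
driver = webdriver.Chrome()
driver.get(link)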
Description=driver.find_element_by_xpath("//div[contains(text(),'TextContainer')]")
# the first TextContainer contains the summary of the book
book_page = requests.get(link)
soup = BeautifulSoup(book_page.text, "html.parser")
print(soup)
Container = soup.find('class', class_='leftContainer')
print(Container)
Error:
Container is empty, and:
NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//div[contains(text(),'TextContainer')]"} (Session info: chrome=83.0.4103.116)
Solution
You can get the description like this:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
...
driver.get("https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1")
description = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div#description span[style="display:none"]'))
)
print(description.get_attribute('textContent'))
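Why the original XPath found nothing: contains(text(),'TextContainer') tests the text inside a div, whereas the selector above matches on the id and style attributes. Because the matched span is hidden, Selenium's .text would return an empty string for it, which is why textContent is read instead. The difference between a text match and an attribute match can be sketched without a browser. This is only an illustration under assumptions: the markup below is a hypothetical, simplified stand-in for the page, and the standard-library xml.etree.ElementTree is swapped in for Selenium. It supports only a subset of XPath, so an exact-text predicate stands in for contains(text(), ...).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the book page; the real markup may differ.
SAMPLE = """
<html><body>
  <div id="description">
    <span>Short preview...</span>
    <span style="display:none">Full summary text.</span>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Text-based match: no div's text equals 'TextContainer', so this finds nothing.
by_text = root.findall(".//div[.='TextContainer']")
print(by_text)

# Attribute-based match: select the hidden span inside the div with id="description".
hidden = root.find(".//div[@id='description']/span[@style='display:none']")
summary = hidden.text
print(summary)
```

The same reasoning applies to the Beautiful Soup attempt: soup.find('class', class_='leftContainer') looks for a tag named "class", so passing the tag name (for example 'div') as the first argument is the usual form.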