python - Python:从 XPath 获取元素值
问题描述
我对 xpaths 和网络抓取真的很陌生,所以如果这是一个相对较小的问题,我很抱歉。我正在尝试抓取多个网站,以确保更新数据库中的数据。我能够获取部分字符串的 xPath,但不确定如何使用 xPath 获取完整值。
代码:
def xpath_soup(element):
components = []
child = element if element.name else element.parent
for parent in child.parents:
previous = itertools.islice(parent.children, 0,parent.contents.index(child))
xpath_tag = child.name
xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
child = parent
components.reverse()
return '/%s' % '/'.join(components)
page = requests.get("https://www.gaumard.com/obstetricmr")
html = str(BeautifulSoup(page.content, 'html.parser'))
soup = BeautifulSoup(html, 'lxml')
elem = soup.find(string=re.compile('xt-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice'))
xPathValue = xpath_soup(elem)
print(xPathValue)
我正在尝试使用xPathValue
.
预期结果将是完整版
xt-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice
存在
Obstetric MR™ is a next-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice faster than ever before. Using the latest technology in holographic visualization, Obstetric MR brings digital learning content into the physical simulation exercise, allowing participants to link knowledge and skill through an entirely new hands-on training experience. The future of labor and delivery simulation is here.
这个全部价值将来自利用xPathValue
。
解决方案
以下是如何使用XPath
.
import requests
from lxml import html
page = requests.get("https://www.gaumard.com/obstetricmr").text
text = html.fromstring(page).xpath('//*[@style="margin: 0 auto;"][2]/div/text()')
print(text[0].strip())
输出:
Obstetric MR™ is a next-generation mixed reality training solution for VICTORIA® S2200 designed to help learners bridge the gap between theory and practice faster than ever before. Using the latest technology in holographic visualization, Obstetric MR brings digital learning content into the physical simulation exercise, allowing participants to link knowledge and skill through an entirely new hands-on training experience. The future of labor and delivery simulation is here.
推荐阅读
- flutter - 配置根项目“file_picker”时出现问题。未找到 SDK 位置
- sql - 在sql中一次连接多个表
- flutter - Image.file 未显示图像,链接出现在左上角,带有选择文件按钮
- visual-studio - Windows 上的 Visual Studio 是否有 pwd.h 端口?
- sharepoint - SharePoint 多行文本列和网站设计
- java - 正则表达式以任何顺序匹配组
- c# - OpenTK 4:这是如何设置各向异性过滤?
- swift - 如何在swift中调用另一个文件中的getJSON函数
- gradle - 使用不幸的标签名称从 Gradle 调用 Ant
- python - 在按钮命令中修改没有全局变量的lambda函数中传递的变量