python - Selenium - Get a list of XPaths for the elements one level below body

Problem description

I want to take a screenshot of each element directly under the page's body tag.

I have written a sample script:

from selenium import webdriver
from PIL import Image
from io import BytesIO

fox = webdriver.Firefox()
fox.get('http://google.com/')

# get list of elements
elements = fox.find_elements_by_xpath("//html/body")

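After this, how do I find the XPath of each element and take a screenshot of it?

Given the XPath of a single element, I already have a script that takes the screenshot: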

element = fox.find_element_by_xpath("//*[@id=\"hplogo\"]")
location = element.location
size = element.size
png = fox.get_screenshot_as_png() # saves screenshot of entire page
fox.quit()

im = Image.open(BytesIO(png)) # uses PIL library to open image in memory

left = location['x']
top = location['y']
right = (location['x'] + size['width'])
bottom = (location['y'] + size['height'])

im = im.crop((left, top, right, bottom)) # defines crop points
im.save('screenshot.png') # saves new cropped image

Tags: python, selenium, selenium-webdriver

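It is a bit confusing that you say you want to screenshot elements

"directly under the body tag"

but in your second snippet you want to grab '//*[@id=\"hplogo\"]', which is not directly under body. So my solution is based on the assumption that you want to save "some elements" as images, but not necessarily only the ones directly under body.

Let's say you want to grab all the elements and deal with the filtering later. Just grab every element under body except the (not so screenshot-worthy) script elements: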

from selenium.webdriver.common.by import By  # find_elements_by_xpath was removed in Selenium 4
elements = fox.find_elements(By.XPATH, '//html/body//*[not(self::script)]')
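
If you do want only the elements one level below body, together with an XPath for each, a minimal sketch could look like the following. It assumes an ordinary HTML page where `tag_name` returns plain lowercase tag names; the helper names `positional_xpaths` and `body_child_xpaths` are made up for illustration and are not part of Selenium.

```python
from collections import Counter


def positional_xpaths(tag_names, parent="/html/body"):
    """Build one positional XPath per child, given the children's tag names in document order."""
    seen = Counter()
    xpaths = []
    for tag in tag_names:
        seen[tag] += 1  # XPath positions are 1-based and counted per tag name
        xpaths.append(f"{parent}/{tag}[{seen[tag]}]")
    return xpaths


def body_child_xpaths(driver):
    """Return (xpath, element) pairs for the elements directly under <body>."""
    # Imported lazily so the pure helper above also works without Selenium installed.
    from selenium.webdriver.common.by import By

    # '/html/body/*' matches only direct children, unlike '//html/body//*'.
    children = driver.find_elements(By.XPATH, "/html/body/*")
    xpaths = positional_xpaths([child.tag_name for child in children])
    return list(zip(xpaths, children))
```

With the driver from the question, `body_child_xpaths(fox)` would give pairs such as `('/html/body/div[1]', <WebElement>)`. Namespaced content such as inline SVG may need a different path expression, so treat the generated XPaths as a starting point.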

You can then save them without PIL, because Selenium can fortunately save a screenshot of a given element on its own:

import os

# don't forget to specify your target dir (the current directory is only a placeholder)
target_dir = '.'

# leading zeroes for filenames, so the files sort in element order
padding = len(str(len(elements)))

for i, element in enumerate(elements):
    # you probably don't want a 0 byte screenshot or a try/except block,
    # so skip elements that have no rendered width or height
    if not (element.rect['height'] and element.rect['width']):
        continue
    with open(os.path.join(target_dir, f'{str(i).zfill(padding)}.png'), 'wb') as f:
        f.write(element.screenshot_as_png)

You can improve on this by saving only the non-duplicate images, and starting from the right list of elements will also save a lot of time.
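
One possible way to skip duplicates is to hash each screenshot's bytes before writing it. This is only a sketch: the helper name `unique_screenshots` is hypothetical, and it catches only byte-identical PNGs, not images that merely look alike.

```python
import hashlib


def unique_screenshots(png_blobs):
    """Yield (index, png_bytes) for the first occurrence of each distinct image."""
    seen = set()
    for index, blob in enumerate(png_blobs):
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            continue  # byte-identical to a screenshot seen earlier
        seen.add(digest)
        yield index, blob
```

In the loop above you could collect `element.screenshot_as_png` for each visible element and write only the blobs this generator yields, reusing the yielded index for the file name.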
