python - Is there a (simple) way to calculate the percentage (physical) space occupied by an ad in a webpage using Python?
Problem description
The problem statement goes this way: Find the % physical occupancy of ads on a webpage.
E.g., say I have a URL which, when opened, shows its content and 3 ads: one is an image ad and the other 2 are 'image and text' ads. (I have been given many such URLs with an unknown number of ads.) I count the number of ads based on the bin class that had 'ad' or 'sponsored' in it, so I know there are 3 ads on the page. Now I need to find the occupancy of these ads as a percentage of the entire web page, i.e., say all three ads together occupy 20% of the page. How do I do it?
I understand that elements don't render the same in different browsers and I actually do not care about that. I just need a rough percentage based on Chrome (or Firefox - anything is okay).
A similar question asked back in 2013, "How to programmatically measure the elements' sizes in HTML source code using python?", has only 2 solutions and not much information. I found the API for the suggested package Ghost (the one the asker accepted as helpful) pretty difficult to understand.
I was asked to 'render a website' using a headless browser, first without ads and then with ads, and find the difference. The problem is, I don't know how. I am also hoping that in the last 8 years someone has come up with a simpler solution to this problem.
Since I am new to using Python for "scraping" in this manner - if it can even be called "scraping" - I could use any resources/ideas/documentations that you might know of.
Solution
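We can get the height and width of every element using the .size attribute.
An XPath that matches all elements:
//*
We can then get the height and width of the ads in the same way: they are web elements too, so the same .size attribute works.
Demo below: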
from selenium import webdriver

# Assumption: Chrome and a matching driver are available on this machine.
driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://stackoverflow.com/questions/68453828/is-there-a-simple-way-to-calculate-the-percentage-physical-space-occupied-by?noredirect=1#comment120979267_68453828")
wait = WebDriverWait(driver, 10)

# Collect the width and height of every element on the page.
# Caveat: nested elements are counted again inside their ancestors,
# so these sums overstate the real page size; treat the result as rough.
width = []
height = []
for element in driver.find_elements(By.XPATH, "//*"):
    size = element.size
    w, h = size['width'], size['height']
    width.append(w)
    height.append(h)
total_width = sum(width)
total_height = sum(height)
print(total_width, total_height)

# Now get the width and height of the ad (here: the first image on the page).
first_ad = wait.until(EC.visibility_of_element_located((By.XPATH, "//img")))
first_ad_size = first_ad.size
first_ad_w, first_ad_h = first_ad_size['width'], first_ad_size['height']
print(first_ad_w, first_ad_h)

# Ad area as a percentage of the total area computed above.
total_page_area = total_width * total_height
print(total_page_area)
image_area = first_ad_w * first_ad_h
print(image_area)
percentage = (image_area * 100) / total_page_area
print(percentage)

driver.quit()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
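The percentage arithmetic from the demo can also be pulled out into a small helper that takes size dictionaries shaped like the ones Selenium's .size returns. This is only a minimal sketch: the function name occupancy_percent is hypothetical, and it assumes the page dimensions come from a single element (for example the root html element) rather than from a sum over all elements. It also assumes the ads do not overlap one another.

```python
def occupancy_percent(page_size, ad_sizes):
    """Return the summed ad area as a percentage of the page area.

    page_size: dict with 'width' and 'height' (same shape as WebElement.size).
    ad_sizes:  iterable of such dicts, one per ad.
    Assumes ads do not overlap; overlapping ads would be counted twice.
    """
    page_area = page_size["width"] * page_size["height"]
    if page_area == 0:
        # Nothing rendered (or page not loaded): avoid dividing by zero.
        return 0.0
    ad_area = sum(s["width"] * s["height"] for s in ad_sizes)
    return 100.0 * ad_area / page_area


# Example with made-up sizes: a 1000x2000 page and two ads.
page = {"width": 1000, "height": 2000}
ads = [{"width": 300, "height": 250}, {"width": 728, "height": 90}]
print(occupancy_percent(page, ads))
```

With real data, page_size could be the .size of the element found by By.TAG_NAME "html", and ad_sizes the .size of each located ad element.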
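PS: I have taken the first image as an ad (I know this is not ideal; it is only meant to show the OP one way to implement this).
If you can locate all the ads with a generic locator (XPath, CSS), it becomes much easier.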
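One way the generic-locator idea might look is to let the browser do the measuring in a single JavaScript call. The sketch below is an assumption-laden illustration, not the answerer's method: the helper names and the example selector are hypothetical, the page area is taken from the scrollWidth and scrollHeight of document.documentElement, and ads that overlap or nest inside each other would be counted more than once.

```python
# JavaScript run inside the page: sums the rendered area of every element
# matching the CSS selector passed as arguments[0], and reports the page area.
AD_OCCUPANCY_JS = """
const ads = document.querySelectorAll(arguments[0]);
let adArea = 0;
for (const el of ads) {
  const r = el.getBoundingClientRect();
  adArea += r.width * r.height;
}
const root = document.documentElement;
return {adArea: adArea, pageArea: root.scrollWidth * root.scrollHeight};
"""


def ad_occupancy_percent(driver, selector):
    """Percentage of the page area covered by elements matching `selector`.

    `driver` is any object with Selenium's execute_script(script, *args).
    """
    result = driver.execute_script(AD_OCCUPANCY_JS, selector)
    if not result["pageArea"]:
        return 0.0
    return 100.0 * result["adArea"] / result["pageArea"]


def demo(url, selector="[class*='sponsored']"):
    """Hypothetical usage; needs Selenium plus Chrome and is not run on import."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return ad_occupancy_percent(driver, selector)
    finally:
        driver.quit()
```

The selector is the part that needs tuning per site: a substring match such as [class*='ad'] would also hit classes like "header" or "loading", so a narrower pattern is usually safer.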