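python - What condition tells Selenium that scrolling has reached the last web element?
Problem description
Currently, I have a script that goes to TripAdvisor and tries to scrape every image under a specific filter. I would like to know what condition to put in the if statement so that it breaks out of the while loop; after that, the script parses the list of URLs to give me a clean URL for each image. I am just confused about how to tell that I have reached the end once I get to the last web element. The if statement is at the end of the while loop, right before the final printing loop. Any help is greatly appreciated!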
# import dependencies
import io
import re
import time
import urllib.parse
import urllib.request
from datetime import datetime

import pandas as pd
import requests
import selenium
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# configure Chrome before the driver is created so the options take effect
options = webdriver.ChromeOptions()
options.headless = False
# block the notification permission prompt
prefs = {"profile.default_content_setting_values.notifications": 2}
options.add_experimental_option("prefs", prefs)
# positional driver path is the Selenium 3 style call; Selenium 4 passes the path through a Service object
driver = webdriver.Chrome("/Users/rishi/Downloads/chromedriver 3", options=options)
driver.maximize_window()
# open up website
driver.get(
    "https://www.tripadvisor.com/Hotel_Review-g28970-d84078-Reviews-Hyatt_Regency_Washington_on_Capitol_Hill-Washington_DC_District_of_Columbia.html#/media/84078/?albumid=101&type=2&category=101")
image_url = []
end = False
while not end:
    # wait until the elements are found, then store all web elements in a list
    images = WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located(
            (By.XPATH, '//*[@class="media-viewer-dt-root-GalleryImageWithOverlay__galleryImage--1Drp0"]')))
    # iterate through the located images and read each URL from the background-image style
    for image in images:
        image_url.append(image.value_of_css_property("background-image"))
    # if you are at the end of the page then leave the loop
    # if(length == end_length):
    #     end = True
    # scroll the last located image into view so the next images load
    driver.execute_script("arguments[0].scrollIntoView();", images[-1])
    # wait one second
    time.sleep(1)
    # placeholder: the missing end-of-gallery condition is what this question asks about
    # (an empty tuple is falsy, so as written the loop never exits)
    if ():
        end = True
# clean the list to provide clear links
prefix = 'url("'
for value in image_url:
    # separate names here so the loop flag `end` is not overwritten with an integer
    url_start = value.find(prefix) + len(prefix)
    url_end = value.find('")')
    print(value[url_start:url_end])
# print(image_url)
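The cleanup step at the end of the script slices each string between `url("` and `")`, which assumes every value is double-quoted and actually contains a URL. A minimal sketch of a more tolerant version is below; the helper name `extract_css_url` is hypothetical, and it assumes the values look like CSS `background-image` strings such as `url("...")` or `none`.

```python
import re

# Matches url(...), with optional single or double quotes around the address.
_CSS_URL = re.compile(r'url\(\s*(["\']?)(.*?)\1\s*\)')


def extract_css_url(value):
    """Return the address inside a CSS url(...) value, or None if there is none."""
    match = _CSS_URL.search(value)
    return match.group(2) if match else None


if __name__ == "__main__":
    samples = ['url("https://example.com/a.jpg")', "none"]
    for sample in samples:
        print(extract_css_url(sample))
```

Returning `None` for values without a URL lets the caller skip images whose background has not been set yet, instead of printing a wrongly sliced string.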
Solution
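No answer was captured with this question. One common way to end a scroll loop like this is to stop once scrolling no longer reveals anything new. The sketch below is an assumption about how that could look, not a confirmed fix for this page: it assumes the gallery keeps loading images as the last one is scrolled into view and that each image has a distinct `background-image` value. The names `collect_until_no_new`, `scrape_background_images`, `max_idle_rounds`, and `pause` are hypothetical.

```python
import time


def collect_until_no_new(fetch_visible, scroll_further, max_idle_rounds=2, pause=0.0):
    """Collect unique items until several scrolls in a row reveal nothing new.

    fetch_visible: callable returning the items currently on the page.
    scroll_further: callable that scrolls so more items can load.
    """
    collected = []   # unique items in first-seen order
    seen = set()
    idle_rounds = 0  # consecutive rounds that produced no new item
    while idle_rounds < max_idle_rounds:
        found_new = False
        for item in fetch_visible():
            if item not in seen:
                seen.add(item)
                collected.append(item)
                found_new = True
        idle_rounds = 0 if found_new else idle_rounds + 1
        if idle_rounds < max_idle_rounds:
            scroll_further()
            time.sleep(pause)  # give lazily loaded images time to appear
    return collected


def scrape_background_images(driver, xpath, timeout=20):
    """Hypothetical Selenium wiring for the loop above."""
    # Imported here so the generic loop can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def fetch_visible():
        images = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, xpath)))
        return [img.value_of_css_property("background-image") for img in images]

    def scroll_further():
        images = driver.find_elements(By.XPATH, xpath)
        if images:
            driver.execute_script("arguments[0].scrollIntoView();", images[-1])

    return collect_until_no_new(fetch_visible, scroll_further, pause=1.0)
```

With the script from the question, this would be called as `scrape_background_images(driver, '//*[@class="media-viewer-dt-root-GalleryImageWithOverlay__galleryImage--1Drp0"]')`. Requiring more than one idle round is a guard against a slow network making a single scroll look like the end of the gallery. Tracking seen values in a set also avoids the duplicates that the original loop appends on every pass.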