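python - Is there an efficient way to mine HTML elements for Selenium using Python (and avoid using dev tools)? If this can be done with BeautifulSoup, how?
Problem description
I am writing a Selenium WebDriver script to automate the process of updating an event registration portal.
A link to an image of the user interface is below.
I can log in to the portal successfully by using XPath expressions copied from Chrome DevTools. I can also switch automatically between the various folders on the left side of the screen (2018, 2017, ..., Canada, USA, ..., Vancouver, Kelowna, ...). Keep in mind that I achieved this by manually handling the individual XPath of each clickable folder link, as shown in the script below.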
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
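# Create a new instance of the Chrome driver
# (Selenium 4 takes the chromedriver path through a Service object)
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service("C:/Users/Computer/Downloads/chromedriver"))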
# Go to the regOnline homepage
driver.get("######LoginPage#####")
driver.get("#####LinkedUIPage#####")
# the page is ajaxy so the title is originally this:
print ("print " + driver.title)
# find the username and password elements (find_element_by_id was removed in Selenium 4)
userElement = driver.find_element(By.ID, "ctl00_cphMaster_txtLogin")
passElement = driver.find_element(By.ID, "ctl00_cphMaster_txtPassword")
# find the login button element
logInElement = driver.find_element(By.LINK_TEXT,"Sign In")
# type in account information
userElement.send_keys("#####")
passElement.send_keys("#####")
# log into the webpage (hit the enter button)
logInElement.send_keys(Keys.ENTER)
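# wait until the folder tree has loaded after logging in (the page is AJAX-driven)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes")))
# xpath elements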
calgaryXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[1]/div/span[2]'
edmontonXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[2]/div/span[2]'
fortNelsonXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[3]/div/span[2]'
fortStJohnXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[4]/div/span[2]'
halifaxXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[5]/div/span[2]'
kamloopsXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[6]/div/span[2]'
kelownaXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[7]/div/span[2]'
ottawaXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[10]/div/span[2]'
princeGeorgeXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[11]/div/span[2]'
saskatoonXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[12]/div/span[2]'
thunderBayXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[13]/div/span[2]'
torontoXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[14]/div/span[2]'
vancouverXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[15]/div/span[2]'
whitehorseXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[16]/div/span[2]'
williamsLakeXPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[17]/div/span[2]'
page2018XPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/div/img'
canada2018XPATH = '//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]/ul/li[1]/ul/li[11]/ul/li[1]/div/img'
# find city element
calgary = driver.find_element(By.XPATH, calgaryXPATH)
edmonton = driver.find_element(By.XPATH, edmontonXPATH)
fortNelson = driver.find_element(By.XPATH, fortNelsonXPATH)
fortStJohn = driver.find_element(By.XPATH, fortStJohnXPATH)
halifax = driver.find_element(By.XPATH, halifaxXPATH)
kamloops = driver.find_element(By.XPATH, kamloopsXPATH)
kelowna = driver.find_element(By.XPATH, kelownaXPATH)
ottawa = driver.find_element(By.XPATH, ottawaXPATH)
princeGeorge = driver.find_element(By.XPATH, princeGeorgeXPATH)
saskatoon = driver.find_element(By.XPATH, saskatoonXPATH)
thunderBay = driver.find_element(By.XPATH, thunderBayXPATH)
toronto = driver.find_element(By.XPATH, torontoXPATH)
whitehorse = driver.find_element(By.XPATH, whitehorseXPATH)
vancouver = driver.find_element(By.XPATH, vancouverXPATH)
williamsLake = driver.find_element(By.XPATH, williamsLakeXPATH)
page2018 = driver.find_element(By.XPATH, page2018XPATH)
canada2018 = driver.find_element(By.XPATH, canada2018XPATH)
from selenium.webdriver.common.action_chains import ActionChains
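def goToPage(element, sec):
    # hover over the element, click it, then pause so the AJAX content can load;
    # a fresh ActionChains per call avoids replaying previously queued actions
    ActionChains(driver).move_to_element(element).click().perform()
    time.sleep(sec)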
# testing individual page access
goToPage(page2018,3)
goToPage(canada2018,3)
# save page html
#html = driver.page_source
#soup = BeautifulSoup(html, "html.parser")
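The city XPaths in the script differ only in the index of one `li` step (`li[1]`, `li[2]`, and so on). A minimal sketch, assuming all city folders are sibling `li` elements under the same `ul`, as those XPaths suggest: drop that one index and a single XPath matches every city, so `driver.find_elements(By.XPATH, ...)` can return them all at once. The helper `drop_index` is hypothetical and only does string manipulation; it assumes no `/` appears inside a predicate (true for these XPaths, whose only other predicate is `@id`).

```python
import re

# Matches a positional predicate such as "[11]" at the end of one XPath step.
_INDEX = re.compile(r"\[\d+\]$")


def drop_index(xpath, steps_from_end):
    """Return xpath with the positional index removed from one step.

    steps_from_end=1 is the last step, 2 the one before it, and so on.
    Hypothetical helper; assumes no "/" occurs inside any predicate.
    """
    parts = xpath.split("/")
    i = len(parts) - steps_from_end
    parts[i] = _INDEX.sub("", parts[i])
    return "/".join(parts)


calgaryXPATH = ('//*[@id="ctl00_ctl00_cphDialog_cpMgrMain_trUserNodes"]'
                '/ul/li[1]/ul/li[11]/ul/li[1]/ul/li[1]/div/span[2]')
# Remove the index from the city-level "li[1]" (third step from the end).
allCitiesXPATH = drop_index(calgaryXPATH, 3)
print(allCitiesXPATH)
```

With the generic XPath, `cities = driver.find_elements(By.XPATH, allCitiesXPATH)` returns the city spans in document order. The generic XPath also matches `li[8]` and `li[9]`, which the original script skips, so filter on `element.text` if those folders must be excluded. If clicking a folder re-renders the tree, previously found elements can raise `StaleElementReferenceException`; re-running `find_elements` on each loop iteration avoids that.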
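Referring to the image, I need to access all of the individual event links in the right-hand part of the UI, i.e., the event log. Copying the XPath of each link one by one would be laborious and unnecessary. Moreover, these links are constantly updated, so I need a way to access individual elements without manually copying and pasting from the browser's developer tools.
Questions:
- Is there an efficient way to mine HTML elements for Selenium using Python (and avoid using dev tools)?
- Is it possible to do this with Beautiful Soup by parsing the HTML DOM? Even better if the proposed method extends to any element in the UI.
Note: if this is possible, I don't know how to do it in Beautiful Soup.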
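Beautiful Soup can handle the mining step: parse `driver.page_source`, find every link, and compute an XPath for each one by walking up the tree and counting preceding siblings with the same tag name. This is a sketch, not a tested recipe for this portal; it assumes the event links are ordinary `<a href=...>` elements, and `xpath_for` and `link_xpaths` are hypothetical helper names.

```python
from bs4 import BeautifulSoup


def xpath_for(tag):
    """Return an absolute, index-based XPath such as /html[1]/body[1]/div[1]/a[2]."""
    steps = []
    node = tag
    # Stop at the BeautifulSoup object itself, whose name is "[document]".
    while node is not None and node.name != "[document]":
        # 1-based position among preceding siblings with the same tag name
        position = len(node.find_previous_siblings(node.name)) + 1
        steps.append(f"{node.name}[{position}]")
        node = node.parent
    return "/" + "/".join(reversed(steps))


def link_xpaths(html):
    """Return (link text, href, XPath) for every <a> element that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"], xpath_for(a))
        for a in soup.find_all("a", href=True)
    ]
```

Possible usage with the script above: `for text, href, xp in link_xpaths(driver.page_source): goToPage(driver.find_element(By.XPATH, xp), 3)`. `page_source` reflects the DOM at the moment it is read, so call it after the AJAX content has loaded. Index-based XPaths break when the list changes, so re-parse the page each time rather than storing them. Often it is simpler to skip Beautiful Soup and call `driver.find_elements(By.CSS_SELECTOR, ...)` with a selector scoped to the event log container.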
Regards, J