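python - Python Selenium crawler: open each listed element and get its details
Problem description
I am trying to get the details of every property from the following website, which lists the properties as elements:
I'm using Selenium in Python to scrape the details of each element, but once I reach an element I can't click its link to open it in a new page and collect the information I need. The code is below: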
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import webbrowser
import random
import time
import selenium.webdriver.support.ui as ui
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.select import Select
import csv
from csv import writer
from selenium.common.exceptions import ElementNotVisibleException, WebDriverException, NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

Link = 'https://www.altamirarealestate.com.cy/results/for-sale/flats/cyprus/35p009679979327046l33p17435142059772z9'

# MAIN
driver = webdriver.Chrome()
driver.maximize_window()

# Go to link
driver.get(Link)

# Accept cookies
time.sleep(2)
driver.find_element(By.XPATH, '//*[@id="onetrust-accept-btn-handler"]').click()
time.sleep(2)

# Load everything: keep clicking "View more" until it can no longer be clicked
while True:
    try:
        driver.find_element(By.XPATH, "//*[contains(@value,'View more')]").click()
        time.sleep(3)
    except Exception as no_more_properties:
        print('all properties expanded: ', no_more_properties)
        break

# Get properties
properties_list = driver.find_elements(By.XPATH, '//*[@class="minificha "]')
print(len(properties_list))  # 25
time.sleep(2)

# Get each property link
property_url = set()
properties_details = []
main_window_handle = driver.current_window_handle
for i in range(0, len(properties_list)):
    driver.switch_to.window(main_window_handle)
    property = properties_list[i]
    # NOTE: `url` is not defined anywhere in this snippet, so this line raises NameError.
    # A leading // also searches the whole page, not just `property` (.// would be relative).
    property_link = property.find_element(By.XPATH, '//a[@href="' + url + '"]')
    property_link.click()
    time.sleep(2)

    # Switch to property window
    window_after = driver.window_handles[1]
    driver.switch_to.window(window_after)

    # Get number of properties
    number_of_flats = driver.find_elements(By.XPATH, '//*[@class="lineainmu "]')
    print(len(number_of_flats))
    time.sleep(2)
    currentWindow = driver.current_window_handle
    for j in range(0, len(number_of_flats)):
        driver.switch_to.window(currentWindow)
        flat = number_of_flats[j]
        flat.click()
        time.sleep(2)

        # Switch to flat window
        window_after = driver.window_handles[1]
        driver.switch_to.window(window_after)
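The "Load everything" loop above ends on the first exception of any kind, which also hides unrelated errors. Because find_elements returns an empty list instead of raising when nothing matches, the loop can stop exactly when the "View more" control is gone. A minimal sketch, assuming the control still matches the XPath used in the question; expand_all_results and its max_clicks cap are illustrative names, not part of the original code:

```python
import time

# XPath of the "View more" control, copied from the question's code.
VIEW_MORE_XPATH = "//*[contains(@value,'View more')]"


def expand_all_results(driver, pause=3.0, max_clicks=100):
    """Click "View more" until it disappears and return the number of clicks.

    driver.find_elements returns [] (no exception) when nothing matches, so the
    loop ends without a catch-all except. The string "xpath" is the value of
    selenium.webdriver.common.by.By.XPATH.
    """
    clicks = 0
    while clicks < max_clicks:  # hard cap so a control that never vanishes cannot loop forever
        buttons = driver.find_elements("xpath", VIEW_MORE_XPATH)
        if not buttons:
            break  # control gone: every result is loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to append the next batch
    return clicks
```

With a real driver this would be called as expand_all_results(driver) right after accepting the cookie banner.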
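Solution
When we click a link on the first page, it opens a new tab. In Selenium, the driver stays focused on the original window in this situation, so we have to switch focus to the new window before we can interact with the web elements on the newly opened page.
Once the task is done, it is important to close the tab and then switch back to the original window.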
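The switch, close, and switch-back sequence can be wrapped in a context manager so the tab is closed even if a lookup on the new page fails. A minimal sketch; in_new_tab is a hypothetical helper name, and it assumes exactly one extra tab is open. It picks the handle that differs from the original instead of indexing window_handles[1], since the WebDriver specification does not guarantee the order of window handles:

```python
from contextlib import contextmanager


@contextmanager
def in_new_tab(driver, original_handle):
    """Focus the single tab opened by the last click, then close it and return.

    Works with any Selenium WebDriver: it only uses window_handles,
    switch_to.window() and close().
    """
    new_handles = [h for h in driver.window_handles if h != original_handle]
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new tab, found {len(new_handles)}")
    driver.switch_to.window(new_handles[0])
    try:
        yield new_handles[0]
    finally:
        driver.close()                            # closes the tab that currently has focus
        driver.switch_to.window(original_handle)  # focus back on the results page
```

Usage, with the answer's variable names: after ele.click(), call wait.until(EC.number_of_windows_to_be(2)) so the tab actually exists, then read the page inside `with in_new_tab(driver, org_windows_handle):`. The finally clause guarantees the results page regains focus before the next iteration.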
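Also re-locate the web element on every iteration of the loop: reusing an element found before the page changed can raise a StaleElementReferenceException.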
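One way to avoid stale references is to take only the count up front and re-locate the n-th element by position on every iteration, which is what the answer's code does with its indexed XPath. The parentheses matter: (//div)[2] is the second match in the whole document, while //div[2] matches every div that is the second div child of its parent. A sketch of that pattern; visit_each_card and nth_card_xpath are hypothetical helper names, and the card XPath is the one from the answer:

```python
# Card XPath taken from the answer's code.
CARD_XPATH = "//div[@class='slick-list draggable']"


def nth_card_xpath(n, base=CARD_XPATH):
    """XPath for the n-th (1-based) match of `base` across the whole document."""
    if n < 1:
        raise ValueError("XPath positions are 1-based")
    return f"({base})[{n}]"


def visit_each_card(driver, visit):
    """Call visit(element) for every card, re-locating each one just before use.

    Only the count is taken from the initial find_elements call; the elements
    themselves are looked up again each time, so earlier navigation cannot
    leave us holding a stale reference. "xpath" is the value of By.XPATH.
    """
    count = len(driver.find_elements("xpath", CARD_XPATH))
    for n in range(1, count + 1):
        visit(driver.find_element("xpath", nth_card_xpath(n)))
    return count
```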
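Code:
# driver_path is the location of your chromedriver executable (not defined in this snippet).
driver = webdriver.Chrome(service=Service(driver_path))
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
driver.get("https://www.altamirarealestate.com.cy/results/for-sale/flats/cyprus/35p009679979327046l33p17435142059772z9")

# Accept the cookie banner if it appears
try:
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
except TimeoutException:
    pass

# Count the property cards once; each card is re-located by position inside the loop
size = driver.find_elements(By.XPATH, "//div[@class='slick-list draggable']")
org_windows_handle = driver.current_window_handle
for j in range(1, len(size) + 1):
    ele = driver.find_element(By.XPATH, f"(//div[@class='slick-list draggable'])[{j}]")
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    ele.click()

    # The click opens the property in a new tab: wait for it, then switch to it
    wait.until(EC.number_of_windows_to_be(2))
    new_handle = [h for h in driver.window_handles if h != org_windows_handle][0]
    driver.switch_to.window(new_handle)
    try:
        name = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#tituloFiltroTipo"))).text
        print(name)
    except TimeoutException:
        pass
    try:
        price = wait.until(EC.visibility_of_element_located((By.ID, "soloPrecio"))).text
        print(price)
    except TimeoutException:
        pass

    # Close the property tab and return to the results page
    driver.close()
    driver.switch_to.window(org_windows_handle)

Imports:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC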
Output:
Flats - Egkomi, Nicosia
310,000
Flat - Strovolos, Nicosia
115,000
Flat - Agios Dometios, Nicosia
185,000
Flats - Aglantzia, Nicosia
765,000
Flat - Kaimakli, Nicosia
170,000
Flat - Kaimakli, Nicosia
280,000
Flat - Kaimakli, Nicosia
130,000
Flat - Germasogia, Limassol
410,000
Flat - Germasogeia, Limassol
285,000
Flat - Petrou & Pavlou, Limassol
230,000
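Mixing implicit and explicit waits is not recommended, because the two timeouts can combine into unpredictable wait times. Here, though, only a few plain find_element calls run alongside the explicit waits, so it should do no harm. Try commenting out the implicit-wait line and running the code; if it fails, uncomment it and try again.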
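A middle ground is to switch the implicit wait off only around the explicit waits and restore it afterwards. A minimal sketch; implicit_wait_disabled is a hypothetical helper, and the caller passes the value to restore (30 in the answer) because the sketch does not read the current setting back from the driver:

```python
from contextlib import contextmanager


@contextmanager
def implicit_wait_disabled(driver, restore_seconds):
    """Temporarily set the implicit wait to 0 so only explicit waits apply.

    Uses WebDriver.implicitly_wait(); the previous value is not queried, so the
    caller must supply the number of seconds to restore on exit.
    """
    driver.implicitly_wait(0)
    try:
        yield driver
    finally:
        driver.implicitly_wait(restore_seconds)  # restored even if a wait times out
```

For example: `with implicit_wait_disabled(driver, 30): price = wait.until(EC.visibility_of_element_located((By.ID, "soloPrecio"))).text`.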