Unable to locate an element on a website using Selenium

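Problem description

I am trying to scrape data from a business directory, but every attempt fails because the element is not found.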

name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
# Results in: IndexError: list index out of range

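So I tried using WebDriverWait to make the code wait for the data to load, but it still does not find the element, even though the data has loaded on the page.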

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time


url='https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b'
driver = webdriver.Firefox()
driver.get(url)

wait=WebDriverWait(driver,50)

wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'searched-list ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text

print(name)

Tags: python-3.x, selenium, web-scraping, webdriver

Solution


<iframe src="https://dmcc.secure.force.com/Business_directory_Page?initialWidth=987&amp;childId=pym-0&amp;parentTitle=List%20of%20Companies%20Registered%20in%20Dubai%2C%20DMCC%20Free%20Zone&amp;parentUrl=https%3A%2F%2Fwww.dmcc.ae%2Fbusiness-search%3Fdirectory%3D1%26submissionGuid%3D2c8df029-a92e-4b5d-a014-7ef9948e664b" width="100%" scrolling="no" marginheight="0" frameborder="0" height="3657px"></iframe>
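One way to confirm that the data sits in a separate document is to scan the page source for iframe tags. The following is a minimal sketch using only the standard library; `IframeCollector` and `find_iframe_sources` are hypothetical names introduced for illustration, and the sample HTML is a shortened stand-in for the real page (with Selenium, `driver.page_source` could be passed in instead).

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive already unescaped.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src of each iframe found in the given HTML string."""
    collector = IframeCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Shortened stand-in for the markup above (assumption: wrapped in a #pym-0 div).
sample = (
    '<div id="pym-0"><iframe src="https://dmcc.secure.force.com/'
    'Business_directory_Page?initialWidth=987&amp;childId=pym-0"></iframe></div>'
)
print(find_iframe_sources(sample))
```

If the list is non-empty, elements that are visible in the browser may live inside one of those frames, and locators run against the top-level document will not see them.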

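The directory is rendered inside the iframe shown above, so an XPath evaluated against the top-level document never matches anything. Click the accept button first, then switch to the iframe before locating elements. Note also that By.CLASS_NAME expects a single class name, so the two classes 'searched-list ng-scope' are written as the CSS selector '.searched-list.ng-scope'.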

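# Reuses the imports, driver and wait objects from the question's setup.
driver.get('https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b')
# Click the accept button so it does not block the page.
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hs-eu-confirmation-button"))).click()
# Wait for the iframe and switch the driver's context into it.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,'#pym-0 > iframe')))
# Wait until the result list inside the iframe is visible.
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.searched-list.ng-scope')))
name = driver.find_elements(By.XPATH, '//*[@id="directory_list"]/div/div/div/div[1]/h4')[0]
print(name.text)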

Output

1 BOXOFFICE DMCC
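The change from 'searched-list ng-scope' to '.searched-list.ng-scope' can be made mechanically. Below is a small sketch of that conversion; `class_attr_to_css` is a hypothetical helper, not part of Selenium, and it assumes the class names contain no characters that need CSS escaping.

```python
def class_attr_to_css(class_attr):
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    # Each class gets a leading dot; no spaces, so all classes must match one element.
    return "".join("." + name for name in names)


print(class_attr_to_css("searched-list ng-scope"))
```

The resulting string can be passed with By.CSS_SELECTOR, as in the wait call in the solution.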
