Web scraping with Selenium: XPath returns no result

Problem description

I am trying to get data from https://openaq.org/#/location/Algiers?_k=nv8w8w, but it always returns an empty value.

    import time

    import pandas as pd
    from selenium.webdriver.common.by import By

    # `driver` is assumed to be a Selenium WebDriver instance created elsewhere.
    def getCardDetails(country, url):
        local_df = pd.DataFrame(columns=['country', 'card_url', 'general', 'country_link', 'city', 'PM2.5', 'date', 'hour'])
        pm = None
        date = None
        hour = None
        general = None
        city = None
        country_link = None

        try:
            #wait = WebDriverWait(driver, 3)
            #wait.until(EC.presence_of_element_located((By.ID, 'location-fold-stats')))
            time.sleep(2)

            # Using XPath we get the full text of the <dd> sibling that comes
            # right after the <dt> whose text is "PM2.5". We split that text to
            # generate variables for our DataFrame such as "pm", "date" & "hour".
            try:
                print("inn")
                pm_date = driver.find_element(By.XPATH, '//dt[text() = "PM2.5"]/following-sibling::dd[1]').text
                # Scraping pollution details from each location page
                # and splitting them to save in the relevant variables
                text = pm_date.split('µg/m³ at ')
                print("nn", pm_date)
                pm = float(text[0])
                full_date = text[1].split(' ')
                date = full_date[0]
                hour = full_date[1]
            except Exception as e:
                # The posted snippet is cut off here; this handler is an
                # assumed completion so that the code parses.
                print("PM2.5 lookup failed:", e)
        except Exception as e:
            # Assumed completion of the outer try block, also cut off in the post.
            print("Card details lookup failed:", e)

This is my first time using Selenium for web scraping. I would like to know how XPath works and what the problem is here.

Tags: python-3.x, selenium, selenium-webdriver, web-scraping, selenium-chromedriver

Solution


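The XPath is correct. The value lives in a dynamically rendered element, so a fixed time.sleep() may run before it exists on the page. To read it reliably, use WebDriverWait() and wait for visibility_of_element_located():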

    # Requires By, WebDriverWait and expected_conditions (imported as EC) from selenium.
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//dt[text() = "PM2.5"]/following-sibling::dd[1]'))).text)
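
To answer the "how does XPath work" part: the expression `//dt[text() = "PM2.5"]/following-sibling::dd[1]` finds a `dt` element whose text is exactly "PM2.5" and then takes the first `dd` sibling that follows it. A minimal sketch of how the explicit wait and the parsing from the question might be combined is shown below. The helper names `parse_pm25` and `read_pm25` are hypothetical, and the sample text format in the docstring is only an assumption based on the split logic in the question, not something confirmed against the live page.

```python
from typing import Optional, Tuple

PM25_XPATH = '//dt[text() = "PM2.5"]/following-sibling::dd[1]'


def parse_pm25(text: str) -> Tuple[float, str, str]:
    """Split text such as '12.0µg/m³ at 2020/01/01 10:00' into (pm, date, hour).

    The sample format is an assumption inferred from the question's code.
    """
    value, _, timestamp = text.partition('µg/m³ at ')
    if not timestamp:
        # Also covers the empty string returned when the element has no text yet.
        raise ValueError(f"unexpected PM2.5 text: {text!r}")
    date, hour = timestamp.split(' ')[:2]
    return float(value), date, hour


def read_pm25(driver, timeout: float = 10) -> Optional[Tuple[float, str, str]]:
    """Wait for the PM2.5 <dd> to become visible, then parse its text.

    Returns None if the element is not visible within `timeout` seconds.
    """
    # Imported here so parse_pm25 can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, PM25_XPATH))
        )
    except TimeoutException:
        return None
    return parse_pm25(element.text)
```

With a driver that has already loaded the location page, `read_pm25(driver)` would return a `(pm, date, hour)` tuple, or `None` on timeout. Keeping the parsing in its own function makes it possible to check the string handling without opening a browser.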
