Python - selenium - unable to get xpath


I'm trying to find the xpath for the following HTML structure.

<div class="col-xs-6 pg-desc-section">
        <p class="boldText">Jewelry</p>
        <p data-hostname="nikki stanzione" data-showname="gifts from dallas prince jewelry" data-category="jewelry" data-airtime="11/12/2020 12:00:00 AM">

                </p><div class="hidden-xs " data-showscheduleid="23839680">Gifts From Dallas Prince Jewelry</div>
                    <a class="mobile-showlink visible-xs" data-showscheduleid="23839680" onclick="pgmoreinfo(this); $(this).hide(); $(this).next().show();"> Gifts From Dallas Prince Jewelry</a>
                    <a class="ab-mobile-showlink" style="display:none;" data-showlink="abTest" data-showscheduleid="23839680" onclick="pgmoreinfo(this);"> Gifts From Dallas Prince Jewelry</a>

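What I want is the value of the data-airtime attribute on the third line. I tried the code below, but it reports a syntax error. (Note that data-airtime is a custom attribute.)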

bc=driver.find_elements_by_xpath("//div[@class='col-xs-6 pg-desc-section']/p[@data-category='Jewelry']").data-airtime

Please help me figure out what I'm doing wrong when defining the xpath.

Tags: python, selenium, xpath, web-scraping
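Three things in the attempted line are worth checking, and they can be reproduced without a browser. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the markup is a trimmed, well-formed copy of the snippet from the question (both are assumptions made for the demo). It shows that the attribute value is matched case-sensitively ('Jewelry' vs 'jewelry'), that a hyphenated attribute has to be read by name, and that Python reads `.data-airtime` as a subtraction rather than an attribute access.

```python
import ast
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup (assumption: simplified for the demo).
SNIPPET = """
<div class="col-xs-6 pg-desc-section">
  <p class="boldText">Jewelry</p>
  <p data-hostname="nikki stanzione" data-showname="gifts from dallas prince jewelry"
     data-category="jewelry" data-airtime="11/12/2020 12:00:00 AM"></p>
</div>
"""

root = ET.fromstring(SNIPPET)

# Attribute values are compared case-sensitively: 'Jewelry' does not match 'jewelry'.
wrong_case = root.findall("./p[@data-category='Jewelry']")
right_case = root.findall("./p[@data-category='jewelry']")

# A hyphenated attribute is read by passing its name as a string.
airtime = right_case[0].get("data-airtime")

# Python parses `elements.data-airtime` as `elements.data - airtime`.
expr = ast.parse("elements.data-airtime", mode="eval").body
is_subtraction = isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Sub)

print(len(wrong_case), len(right_case), airtime, is_subtraction)
```

In Selenium itself, find_elements_by_xpath returns a list, so the value would be read from a single element with get_attribute('data-airtime'), which is what the program further down does.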





def get_number_of_displayed_rows(driver : ChromeDriver):
    xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
    return len(driver.find_elements(By.XPATH, xpath))

Using this method, I could see that 24 rows are displayed. From there, I scroll to each element and can then read its data with an xpath.

I only fetched the data for the first column for you; however, with this code you should be able to get columns 2 and 3 as well.
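As a rough sketch of how columns 2 and 3 might be reached: the main program reads the first column through the XPath fragment //div[contains(@class, 'pg-show')][1]//p[@data-showname]. Assuming the other columns use the same markup (not verified against the page), only the positional index would change. The helper name column_xpath is hypothetical and only builds the string; the resulting XPath would still be passed to find_element as in the main program.

```python
# Row XPath taken from the main program; XPath positions are 1-based.
ROW_XPATH = ("//div[contains(@class, 'program-guide-table')]"
             "//div[@class='row content']"
             "//div[contains(@class, 'desktop-schedule-row')]")


def column_xpath(row_number: int, column: int) -> str:
    """Hypothetical helper: XPath of the data-bearing <p> for a 1-based row and column.

    Assumes every column is a 'pg-show' div laid out like the first one.
    """
    if row_number < 1 or column < 1:
        raise ValueError("XPath positions start at 1")
    return (f"{ROW_XPATH}[{row_number}]"
            f"//div[contains(@class, 'pg-show')][{column}]//p[@data-showname]")


# Print the XPath for each of the three columns of the first row.
for column in (1, 2, 3):
    print(column_xpath(1, column))
```

If a column turns out to be structured differently, the find_element call would raise NoSuchElementException, which is a quick way to check the assumption.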

Main program - for reference

from selenium import webdriver
from selenium.webdriver.chrome.webdriver import WebDriver as ChromeDriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as DriverWait
from selenium.webdriver.support import expected_conditions as DriverConditions
from selenium.common.exceptions import WebDriverException

def get_chrome_driver():
    """This sets up our Chrome Driver and returns it as an object"""
    # Raw string so the backslashes in the Windows path are kept literally
    path_to_chrome = r"F:\Selenium_Drivers\Windows_Chrome87_Driver\chromedriver.exe"
    chrome_options = webdriver.ChromeOptions()
    # executable_path is the Selenium 3 style argument; Selenium 4 uses a Service object instead
    return webdriver.Chrome(executable_path = path_to_chrome,
                            options = chrome_options)

def wait_displayed(driver : ChromeDriver, xpath: str, timeout: int = 5):
    """Waits until the element at the xpath is present in the DOM"""
    try:
        DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located(locator = (By.XPATH, xpath))
        )
    except WebDriverException:
        raise WebDriverException(f'Timeout: Failed to find {xpath}')

def scroll_to_element(driver : ChromeDriver, xpath: str, timeout: int = 5):
    """Waits for the element at the xpath, then scrolls it into view"""
    try:
        webElement = DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located(locator = (By.XPATH, xpath))
        )
        driver.execute_script("return arguments[0].scrollIntoView();", webElement)
    except WebDriverException:
        raise WebDriverException(f'Timeout: Failed to find {xpath}\nResult: Could not scroll to element')

def get_number_of_displayed_rows(driver : ChromeDriver):
    """Counts the rows currently present in the program guide table"""
    xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
    return len(driver.find_elements(By.XPATH, xpath))

# Gets our chrome driver and opens our site
chrome_driver = get_chrome_driver()
wait_displayed(chrome_driver, "//input[@id='txtSearchString']", 30)
wait_displayed(chrome_driver, "//div[contains(@class, 'program-guide-table')]", 30)
wait_displayed(chrome_driver, "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')][1]", 30)

numberOfRecords = get_number_of_displayed_rows(chrome_driver)

# Loop through each record and scrape the data
for rowNumber in range(numberOfRecords):
    # Record Row Base Xpath
    recordXpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
    print(f'Scrolling to Record #{rowNumber + 1}')
    scroll_to_element(chrome_driver, xpath = f'{recordXpath}[{rowNumber + 1}]')
    recordXpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'desktop-schedule-row')]"
    rowXpath = f'{recordXpath}[{rowNumber + 1}]'
    print(f'Record {rowNumber + 1} data')
    # First column's element and details
    recordElement = chrome_driver.find_element(By.XPATH, "{0}//div[contains(@class, 'pg-show')][1]//p[@data-showname]".format(rowXpath))
    print("Host Name: {0}".format(recordElement.get_attribute('data-hostname')))
    print("Show Name: {0}".format(recordElement.get_attribute('data-showname')))
    print("Category: {0}".format(recordElement.get_attribute('data-category')))
    print("Air Time: {0}".format(recordElement.get_attribute('data-airtime')))



Scrolling to Record #1
Record 1 data
Host Name: kendy kloepfer
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 12:00:00 AM

Scrolling to Record #2
Record 2 data
Host Name: lynne schacher
Show Name: holiday diamond day kick-off
Category: jewelry
Air Time: 11/26/2020 1:00:00 AM

Scrolling to Record #3
Record 3 data
Host Name: daniel green
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 2:00:00 AM

Scrolling to Record #4
Record 4 data
Host Name: daniel green
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 3:00:00 AM

Scrolling to Record #5
Record 5 data
Host Name: melissa miner
Show Name: gifts of joy
Category: electronics
Air Time: 11/26/2020 4:00:00 AM

Scrolling to Record #6
Record 6 data
Host Name: heather hall
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 5:00:00 AM

Scrolling to Record #7
Record 7 data
Host Name: kendy kloepfer
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 6:00:00 AM

Scrolling to Record #8
Record 8 data
Host Name: nikki stanzione
Show Name: gifts for the family
Category: electronics
Air Time: 11/26/2020 7:00:00 AM

Scrolling to Record #9
Record 9 data
Host Name: fatima cocci
Show Name: gifts from pamela mccoy collection
Category: jewelry
Air Time: 11/26/2020 8:00:00 AM

Scrolling to Record #10
Record 10 data
Host Name: kathy norton
Show Name: fashion talk with fatima & kathy
Category: fashion
Air Time: 11/26/2020 9:00:00 AM

Scrolling to Record #11
Record 11 data
Host Name: kathy norton
Show Name: fashion talk with fatima & kathy
Category: fashion
Air Time: 11/26/2020 10:00:00 AM

Scrolling to Record #12
Record 12 data
Host Name: kathy norton
Show Name: gifts of designer fragrances
Category: beauty
Air Time: 11/26/2020 11:00:00 AM

Scrolling to Record #13
Record 13 data
Host Name: lynne schacher
Show Name: top accessory gifts of the season
Category: fashion
Air Time: 11/26/2020 12:00:00 PM

Scrolling to Record #14
Record 14 data
Host Name: lynne schacher
Show Name: fashion doorbusters
Category: watches
Air Time: 11/26/2020 1:00:00 PM

Scrolling to Record #15
Record 15 data
Host Name: nikki stanzione
Show Name: fashion doorbusters
Category: fashion
Air Time: 11/26/2020 2:00:00 PM

Scrolling to Record #16
Record 16 data
Host Name: nikki stanzione
Show Name: top accessory gifts of the season
Category: fashion
Air Time: 11/26/2020 3:00:00 PM

Scrolling to Record #17
Record 17 data
Host Name: jess manuel
Show Name: mayamar jewelry: live from st. barts
Category: jewelry
Air Time: 11/26/2020 4:00:00 PM

Scrolling to Record #18
Record 18 data
Host Name: jess manuel
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 5:00:00 PM

Scrolling to Record #19
Record 19 data
Host Name: kendy kloepfer
Show Name: black friday starts now
Category: electronics
Air Time: 11/26/2020 6:00:00 PM

Scrolling to Record #20
Record 20 data
Host Name: kendy kloepfer
Show Name: black friday starts now
Category: electronics
Air Time: 11/26/2020 7:00:00 PM

Scrolling to Record #21
Record 21 data
Host Name: jess manuel
Show Name: dr. terry dubrow: safe living
Category: home
Air Time: 11/26/2020 8:00:00 PM

Scrolling to Record #22
Record 22 data
Host Name: kendy kloepfer
Show Name: gifts from waterford crystal
Category: home
Air Time: 11/26/2020 9:00:00 PM

Scrolling to Record #23
Record 23 data
Host Name: kendy kloepfer
Show Name: gifts from waterford crystal
Category: home
Air Time: 11/26/2020 10:00:00 PM

Scrolling to Record #24
Record 24 data
Host Name: fatima cocci
Show Name: gifts from stefano oro gold jewelry
Category: jewelry
Air Time: 11/26/2020 11:00:00 PM
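
The air times above come back as plain strings. If they need to be sorted or compared, one option is to parse them with the standard library, as in the minimal sketch below. The helper name parse_airtime is hypothetical, the format string is inferred from the printed values (month/day/year with a 12-hour clock), and the result is a naive datetime because the page's time zone is not stated here.

```python
from datetime import datetime

# Format inferred from values such as "11/26/2020 1:00:00 PM" (assumption).
AIRTIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_airtime(raw: str) -> datetime:
    """Hypothetical helper: convert a data-airtime string into a naive datetime."""
    return datetime.strptime(raw.strip(), AIRTIME_FORMAT)


midnight = parse_airtime("11/26/2020 12:00:00 AM")
afternoon = parse_airtime("11/26/2020 1:00:00 PM")
print(midnight.isoformat(), afternoon.isoformat())
```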
