Python - Web Scraping Walmart Category Names

Problem Description

I am trying to get the department names from this Walmart link. You can see that, at first, there are 7 departments under Departments on the left (Chocolate Cookies, Cookies, Butter Cookies, ...). When I click See All Departments, 9 more categories are added, so the number becomes 16. I am trying to fetch all 16 departments automatically. I wrote this code:

from selenium import webdriver

n_links = []

driver = webdriver.Chrome(executable_path='D:/Desktop/demo/chromedriver.exe')
url = "https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391" 
driver.get(url)

search = driver.find_element_by_xpath("//*[@id='Departments']/div/div/ul").text
driver.find_element_by_xpath("//*[@id='Departments']/div/div/button/span").click()
search2 = driver.find_element_by_xpath("//*[@id='Departments']/div/div/div/div").text

sep = search.split('\n')
sep2 = search2.split('\n')

lngth = len(sep)
lngth2 = len(sep2)

for i in range(1, lngth):
    path = "//*[@id='Departments']/div/div/ul/li[" + str(i) + "]/a"
    nav_links = driver.find_element_by_xpath(path).get_attribute('href')
    n_links.append(nav_links)

for i in range(1, lngth2):
    path = "//*[@id='Departments']/div/div/div/div/ul/li[" + str(i) + "]/a"
    nav_links2 = driver.find_element_by_xpath(path).get_attribute('href')
    n_links.append(nav_links2)
    
print(n_links)
print(len(n_links))

Finally, when I run the code, I can see the links in the n_links list. The problem is that sometimes it holds 13 links and sometimes 14. It should be 16, but I have never seen 16, only 13 or 14. I tried adding time.sleep(3) before the search2 line, but it did not work. Can you help me?

Tags: python, selenium, web-scraping

Solution

I think you are making this more complicated than it needs to be. You are right that after you click the button, you may need to wait before the departments are available; an explicit-wait sketch follows the code below.

# Get all the departments currently shown
departments = driver.find_elements_by_xpath("//li[contains(@class,'department')]")

# Click on the "See All Departments" button
driver.find_element_by_xpath("//button[@data-automation-id='button']//span[contains(text(),'all Departments')]").click()

# Get the full list of departments now shown
departments = driver.find_elements_by_xpath("//li[contains(@class,'department')]")

# Iterate through the departments and print each name
for d in departments:
    print(d.text)
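
If the expanded list still comes back short, an explicit wait is more reliable than a fixed sleep. Below is a minimal sketch of that idea, assuming the same XPath locators as the answer above and Selenium 4's By-based API; the trailing /a step used to pull the hrefs is an assumption carried over from the question's locators, not something verified against the current page.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

DEPT_XPATH = "//li[contains(@class,'department')]"

driver = webdriver.Chrome()  # Selenium 4 can locate chromedriver on its own
driver.get("https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391")

# Count the departments visible before expanding the list
initial_count = len(driver.find_elements(By.XPATH, DEPT_XPATH))

# Click "See All Departments" (same locator as in the answer above)
driver.find_element(
    By.XPATH,
    "//button[@data-automation-id='button']//span[contains(text(),'all Departments')]",
).click()

# Block until more departments have rendered than were initially visible,
# rather than guessing a fixed sleep duration
WebDriverWait(driver, 10).until(
    lambda d: len(d.find_elements(By.XPATH, DEPT_XPATH)) > initial_count
)

# Collect the department links; the li/a structure is assumed from the
# question's XPaths and may need adjusting if Walmart changes its markup
links = [a.get_attribute("href")
         for a in driver.find_elements(By.XPATH, DEPT_XPATH + "/a")]
print(links)
print(len(links))

WebDriverWait retries the condition until it returns a truthy value or the timeout expires, so the script proceeds as soon as the extra departments render instead of racing them.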
