I am trying to scrape the color and model of products from this link

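Problem description

I am getting an error here and I cannot scrape the data.

Base URL = https://www.mobilephonesdirect.co.uk/brands/apple?monthly_cost=40

Product URL = https://www.mobilephonesdirect.co.uk/handset/apple/iphone-12

I want to get the memory details from all of the product links.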


from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://www.mobilephonesdirect.co.uk/brands')
time.sleep(5)
# accept the cookie banner
cookies = driver.find_element(By.XPATH, "//button[contains(text(),'Accept')]")
time.sleep(5)
cookies.click()
time.sleep(5)
print("cookies accepted")
time.sleep(5)
driver.maximize_window()
print("window maximized")
# open the Apple brand page
driver.find_element(By.CSS_SELECTOR, '.u-grid--3--bp-medium:nth-child(1) .u-ai--center').click()
time.sleep(5)
print("clicked apple phones")
time.sleep(5)
# create a soup object for the product listing
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
#print(soup.prettify())
print(driver.current_url)
# collect the link to each Apple product
links = soup.find_all('div', {'class': 'o-flex-container u-px--xsmall u-pt--xsmall'})
list_links = []
for link in links:
    anchor = link.find('a')
    url = 'https://www.mobilephonesdirect.co.uk' + anchor["href"]
    list_links.append(url)
# visit each product page and print its memory details
for urls in list_links:
    driver.get(urls)
    #print(soup1.prettify())
    print(driver.current_url)
    source = driver.page_source
    soup1 = BeautifulSoup(source, 'html.parser')

    product_memory = soup1.find('div', {'class': 'u-fz--title-small u-fw--400'})
    print(product_memory.text)

Tags: python, selenium, beautifulsoup

Solution


The problem is that the script runs a little too fast in one place.

After this line:

driver.get(urls)

add this:

time.sleep(5)

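Then it will work correctly.

I am not very familiar with these libraries, but I think what happens is this: the driver.get(urls) line tells the webdriver to load the page, and the next line, source = driver.page_source, runs immediately, before the page has finished loading. So there is no usable source yet, because the page is still loading. Adding the pause gives the page enough time to load.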
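A fixed time.sleep(5) can wait longer than necessary on a fast connection and still be too short on a slow one. Selenium offers WebDriverWait for polling until a condition holds; the sketch below shows the same polling idea in plain Python so that it runs without a browser. The helper name wait_until and its parameters are hypothetical and are not part of Selenium or of the answer above. It assumes that the error in the question comes from soup1.find returning None while the element is not yet in the page source.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if no truthy value is seen before the timeout elapses.
    (Hypothetical helper for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical use inside the question's loop, replacing time.sleep(5)
# (driver and BeautifulSoup as defined in the question):
#
# for urls in list_links:
#     driver.get(urls)
#
#     def find_memory():
#         soup1 = BeautifulSoup(driver.page_source, 'html.parser')
#         return soup1.find('div', {'class': 'u-fz--title-small u-fw--400'})
#
#     product_memory = wait_until(find_memory, timeout=10)
#     print(product_memory.text)
```

With this approach the loop continues as soon as the element appears, and a page that never shows it fails with a clear TimeoutError rather than an AttributeError on None.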

