Why do I get an error after scraping the last item on the page?

Problem description

I wrote a program that retrieves product names and prices from newEgg, but after processing the last product on the page I get the error "AttributeError: 'NoneType' object has no attribute 'strong'". I'm fairly sure it's a None-reference problem, since the loop runs over all of the page elements, but I tried iterating only up to itemContainers - 1, and also setting a breakpoint in the loop at itemContainers - 1, and it still doesn't work. Also, am I right to put Client.close() at the end?

import bs4
#uReq is our shorthand for urllib.request.urlopen
import urllib
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

#The URL we plan to use
my_url = 'https://www.newegg.com/'

#uReq(my_url) opens up web client
Client = uReq(my_url)
#Client.read() dumps everything out of the URL
html_page = Client.read()
Client.close()


page_soup = soup(html_page, "html.parser")
itemContainers = page_soup.findAll("div", {"class": "item-container"})

for i in range(0, len(itemContainers)):
    if i == len(itemContainers) - 1:
        # last item: drop into the debugger to inspect it
        breakpoint()
    #itemTitles is a list of all of the titles found on the web page
    itemTitles = page_soup.findAll("a", {"class": "item-title"})

    divWithPriceInfo = itemContainers[i].find("ul", "price")
    left_Dec = divWithPriceInfo.strong.text
    right_Dec = divWithPriceInfo.sup.text
    stringStrong = str(left_Dec)
    stringSup = str(right_Dec)
    print(itemTitles[i].text)
    print(stringStrong + stringSup)

Tags: python, html, web-scraping

Solution
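
The traceback points at the line left_Dec = divWithPriceInfo.strong.text: for at least one item-container, itemContainers[i].find("ul", "price") returns None (typically a sponsored tile or an item with no listed price), and calling .strong on None raises the AttributeError. Counting loop iterations or setting a breakpoint does not fix it, because the offending container can sit anywhere in the list, not just at the end. Below is a minimal sketch of a guarded version of the scraper; it assumes the class names from the question (item-container, item-title, price) still match what Newegg serves, which may have changed.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/'

# Download the page once; closing right after read() is fine because
# the whole response body is already in memory at that point.
client = uReq(my_url)
html_page = client.read()
client.close()

page_soup = soup(html_page, "html.parser")

# Pull the title and price from inside each container, and skip any
# container that has no usable price block instead of crashing on it.
for container in page_soup.findAll("div", {"class": "item-container"}):
    title_tag = container.find("a", {"class": "item-title"})
    price_block = container.find("ul", {"class": "price"})
    if title_tag is None or price_block is None:
        continue  # ad tile or item without a listed price

    strong = price_block.find("strong")  # dollar part of the price
    sup = price_block.find("sup")        # cents part, e.g. ".99"
    if strong is None or sup is None:
        continue  # price block present but no visible price

    print(title_tag.text.strip())
    print(strong.text + sup.text)

This also answers the Client.close() question: closing the client immediately after read() is correct, since everything the parser needs is already stored in html_page.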

