Python web scraping | How to use try and except to handle a missing element so that "Not available" is printed when the element is not found?

Problem description

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import bs4

headers = {'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36'}
my_url = 'https://www.jiomart.com/c/groceries/dairy-bakery/dairy/62'

uclient = uReq(my_url)
page_html = uclient.read()
uclient.close()

bs41 = soup(page_html, 'html.parser')

containers = bs41.find_all('div', {'col-md-3 p-0'})
#print(len(containers))
#print(soup.prettify(containers[0]))

for container in containers:
    p_name = container.find_all('span', {'class' : 'clsgetname'})
    productname = p_name[0].text

    o_p = container.find_all('span' , id = 'final_price' )
    offer_price = o_p[0].text

    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text

    except:
        print('not available')

    print('Product name is', productname)
    print('Product Mrp is', offer_price)
    print('Product actual price', actual_price)
    print()

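When I run the code above, one product has no actual price, only an offer price. All the other products have both values. I tried to handle the missing element with try and except and print 'not available', but it does not work as expected.

Instead, 'not available' is printed on the first line, and the actual price is still shown as Rs 35 even though that product has no actual price.

How should I handle this case?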

Tags: python, exception, web-scraping, beautifulsoup, try-except

Solution


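The issue is that when the element is not found, ap is empty and ap[0] raises an IndexError, so the except block prints 'not available' but actual_price is never reassigned. The print calls after the try/except still run, and actual_price still holds the value from the previous loop iteration, which is why Rs 35 is shown for a product that has no actual price.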
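
To see where the printed value comes from, here is a minimal sketch that reproduces the behaviour without any scraping. Plain lists stand in for the find_all results; the sample data and the name report are made up for illustration and are not taken from the real page.

```python
# Stand-ins for what find_all('strike', id='price') returns per product:
# a non-empty list when the tag exists, an empty list when it does not.
# The data and names below are illustrative only.
results_per_product = [["Rs 35"], [], ["Rs 50"]]


def report(results_per_product):
    lines = []
    for results in results_per_product:
        try:
            actual_price = results[0]      # IndexError on an empty list
        except IndexError:
            lines.append("not available")  # actual_price is NOT reassigned here
        # Runs on every iteration, so a stale value from the previous
        # product is reported when the lookup above failed.
        lines.append(f"Product actual price {actual_price}")
    return lines


output = report(results_per_product)
for line in output:
    print(line)
```

The second product reports the first product's price. If the very first product had been the one without a price, the original loop would instead fail with a NameError, because actual_price would not exist yet.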

There are two ways to approach this.

  • The first is to print a product only when the element was found. Move the print calls into the try block (and remove them from after it):
    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text  # raises IndexError when ap is empty
        print('Product name is', productname)
        print('Product Mrp is', offer_price)
        print('Product actual price', actual_price)

    except IndexError:
        print('not available')
  • The second is to set actual_price to 'not available' in the except block, so that 'not available' is printed next to 'Product actual price' by the print calls you already have after the try/except:
    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text

    except IndexError:
        # No <strike id="price"> in this container: use a placeholder value
        actual_price = 'not available'
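
As a possible variation on the second approach, the lookup can avoid exceptions altogether: find() returns None when nothing matches, so a small helper can substitute a default. The sketch below is self-contained and runs on inline sample markup; SAMPLE_HTML, text_or_default and extract_products are hypothetical names, and the markup only imitates the structure the question's selectors expect, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure targeted in the question.
SAMPLE_HTML = """
<div class="col-md-3 p-0">
  <span class="clsgetname">Milk 500 ml</span>
  <span id="final_price">Rs 30</span>
  <strike id="price">Rs 35</strike>
</div>
<div class="col-md-3 p-0">
  <span class="clsgetname">Curd 200 g</span>
  <span id="final_price">Rs 20</span>
</div>
"""


def text_or_default(tag, default="Not available"):
    """Return the stripped text of a tag, or default when find() gave None."""
    return tag.get_text(strip=True) if tag is not None else default


def extract_products(html):
    page = BeautifulSoup(html, "html.parser")
    products = []
    # class_ with a multi-class string matches that exact attribute value.
    for container in page.find_all("div", class_="col-md-3 p-0"):
        products.append({
            "name": text_or_default(container.find("span", class_="clsgetname")),
            "offer_price": text_or_default(container.find("span", id="final_price")),
            "actual_price": text_or_default(container.find("strike", id="price")),
        })
    return products


products = extract_products(SAMPLE_HTML)
for product in products:
    print("Product name is", product["name"])
    print("Product Mrp is", product["offer_price"])
    print("Product actual price", product["actual_price"])
    print()
```

Because every field is computed fresh for each container, no value can carry over from a previous product, and a missing name or offer price is handled the same way as a missing actual price.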
