python - Python web scraping | How to use try and except to handle a missing element so that "Not available" is printed when the element is not found?
Problem description
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import bs4

# Note: this header is defined but never passed to the request below.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36'}
my_url = 'https://www.jiomart.com/c/groceries/dairy-bakery/dairy/62'

# Download the page and parse it.
uclient = uReq(my_url)
page_html = uclient.read()
uclient.close()
bs41 = soup(page_html, 'html.parser')

# Each product sits in its own container div.
containers = bs41.find_all('div', {'col-md-3 p-0'})
#print(len(containers))
#print(soup.prettify(containers[0]))
for container in containers:
    p_name = container.find_all('span', {'class' : 'clsgetname'})
    productname = p_name[0].text
    o_p = container.find_all('span' , id = 'final_price' )
    offer_price = o_p[0].text
    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text
    except:
        print('not available')
    print('Product name is', productname)
    print('Product Mrp is', offer_price)
    print('Product actual price', actual_price)
    print()
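When I run the code above, one product has no actual price, only an offer price, while all the other products have both values. I tried to handle the exception with try/except so that "Not available" is printed for that product, but it does not work.
Instead, it prints "not available" on a line of its own and then shows the actual price as Rs 35, even though the actual price is empty.
How should I handle this case?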
Solution
The issue is that even when the element is not found, the code still prints actual_price. The except block never reassigns it, so it still holds a value set earlier, most likely by a previous iteration of the loop.
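The stale value can be reproduced without any scraping. The following is a minimal sketch with made-up data: the `rows` list and its values are hypothetical stand-ins for what `container.find_all('strike', id='price')` might return for three products, the second of which has no actual price.

```python
# Hypothetical stand-in data: each inner list plays the role of the
# result of container.find_all('strike', id='price') for one product.
rows = [["35"], [], ["60"]]

printed = []
for found in rows:
    try:
        actual_price = found[0]   # raises IndexError when the list is empty
    except IndexError:
        pass                      # actual_price is NOT reassigned here
    printed.append(actual_price)  # still holds the previous iteration's value

print(printed)  # ['35', '35', '60']
```

The second product reports the first product's price, which matches the "Rs 35" symptom described in the question.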
There are two ways to approach this.
- The first is to print the product details only if the element was found:
try:
    ap = container.find_all('strike', id = 'price')
    # ap[0] raises IndexError when find_all returns an empty result.
    actual_price = ap[0].text
    print('Product name is', productname)
    print('Product Mrp is', offer_price)
    print('Product actual price', actual_price)
except IndexError:
    print('not available')
- The second is to set actual_price to 'not available', so that this text is printed next to 'Product actual price'. To make this work, assign actual_price = 'not available' in the except block, so the code becomes:
try:
    ap = container.find_all('strike', id = 'price')
    # ap[0] raises IndexError when find_all returns an empty result.
    actual_price = ap[0].text
except IndexError:
    # Overwrite any value left over from a previous iteration.
    actual_price = 'not available'
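As a variation on the second approach, the lookup can avoid try/except altogether by using `find`, which returns `None` when nothing matches. This is only a sketch under stated assumptions: the inline HTML is hypothetical markup that mimics the structure described in the question (it is not the real page), and `text_or_default` is a helper name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question;
# the second product has no <strike id="price"> element.
html = """
<div class="col-md-3 p-0">
  <span class="clsgetname">Milk</span>
  <span id="final_price">30</span>
  <strike id="price">35</strike>
</div>
<div class="col-md-3 p-0">
  <span class="clsgetname">Curd</span>
  <span id="final_price">20</span>
</div>
"""

def text_or_default(tag, default="Not available"):
    """Return the tag's stripped text, or the default if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default

page = BeautifulSoup(html, "html.parser")
results = []
for container in page.find_all("div", class_="col-md-3 p-0"):
    name = text_or_default(container.find("span", class_="clsgetname"))
    offer = text_or_default(container.find("span", id="final_price"))
    actual = text_or_default(container.find("strike", id="price"))
    results.append((name, offer, actual))
    print("Product name is", name)
    print("Product Mrp is", offer)
    print("Product actual price", actual)
    print()
```

Because every field is assigned on every iteration, no value can leak over from the previous product.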