bs4 raises an error on the second loop iteration

Problem description

from bs4 import BeautifulSoup
import requests



murl = ['https://www.amazon.in/dp/B07XY541GH/','https://www.amazon.in/dp/B085J17VVP/']

def track(url):
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html5lib')

    pr = soup.find('span', id='priceblock_ourprice').getText()
    con_pr = pr[1:]
    converted_price = con_pr.strip()
    newprice = ''
    for con in converted_price:
        if con != ',':
            newprice = newprice + con
    newprice = float(newprice)
    return newprice

def main():
    for url in murl:
        price = track(url)
        print(price)

main()

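I always seem to get this on the second iteration of the loop. Even if I store the URLs in two separate variables and call the function on each one in turn, I still get the same error.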

2990.0
Traceback (most recent call last):
  File "test.py", line 28, in <module>
    main()
  File "test.py", line 25, in main
    price = track(url)
  File "test.py", line 13, in track
    pr = soup.find('span', id='priceblock_ourprice').getText()
AttributeError: 'NoneType' object has no attribute 'getText'

Is there a way to fix this?

Tags: python, beautifulsoup

Solution


Here is a solution, give it a try:

import requests
from bs4 import BeautifulSoup

murl = ['https://www.amazon.in/dp/B07XY541GH/', 'https://www.amazon.in/dp/B085J17VVP/']


def track(url):
    # Send a browser-like User-Agent; without it the site may return a page
    # that lacks the price element.
    req = requests.get(url, headers={
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.87 Safari/537.36"})
    soup = BeautifulSoup(req.content, 'html5lib')

    # The price span id differs between pages, so match any id starting with 'priceblock_'.
    pr = soup.find('span', id=lambda x: x and x.startswith('priceblock_')).text
    # Drop the leading currency symbol, surrounding whitespace and thousands separators.
    converted_price = pr[1:].strip().replace(',', '')
    return float(converted_price)


def main():
    for url in murl:
        price = track(url)
        print(price)


main()
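
  1. Send the User-Agent header that was missing. Most sites expect one when you fetch pages for parsing.
  2. The AttributeError does not mean Tag objects lack getText() (it exists, as an alias of get_text()); it means find() returned None because no element matched on the second page.
  3. The price element does not have the same id on the two pages, so match any id that starts with 'priceblock_' instead of only 'priceblock_ourprice'.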
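
If the page still comes back without a price element (for example when the request is blocked), the lookup returns None again and the same AttributeError appears. Below is a minimal sketch of a more defensive version. It assumes the same 'priceblock_' id prefix as above; extract_price and the two HTML snippets are hypothetical names and data made up for illustration, and the built-in "html.parser" is swapped in for html5lib so that no extra parser is needed.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the price as a float, or None if no usable price is found."""
    soup = BeautifulSoup(html, "html.parser")
    # Match any span whose id starts with "priceblock_", as in the answer above.
    tag = soup.find("span", id=lambda x: x and x.startswith("priceblock_"))
    if tag is None:
        # find() returns None when nothing matches; report that instead of crashing.
        return None
    text = tag.get_text(strip=True)
    # Keep only digits and the decimal point (drops currency symbol, spaces, commas).
    digits = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    try:
        return float(digits)
    except ValueError:
        # The span text did not contain a parseable number.
        return None


# Hypothetical page fragments, not real responses from the site.
sample_ok = '<span id="priceblock_dealprice">₹ 2,990.00</span>'
sample_missing = "<div>No price on this page</div>"

print(extract_price(sample_ok))
print(extract_price(sample_missing))
```

In track() you could then call extract_price(req.text) and skip or log the URL whenever it returns None, rather than letting the loop stop on the first page without a price.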
