BeautifulSoup output to a list

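Problem description

I am scraping product descriptions from a website. Each product has an "old price" and a "new price", except for one product, which has only a new price. I append the values to empty lists, so there are four lists: "product name", "product old price", "product new price", and "product reviews". When I try to write a CSV file, I get the error "arrays must all be same length". The cause is that the "product old price" list has 17 entries while the other three lists have 18, because, as mentioned, one product has no old price. Here is my code: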

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.petplanet.co.uk/d7/dog_food"
r = requests.get(url)
soup = BeautifulSoup(r.content)
prod_name =[]
prod_old_price = []
prod_new_price = []
prod_reviews = []
item = soup.findAll("a", class_ = "thumbLink")
for name in item[0:15]:
    pro_name = name.get("title")
    prod_name.append(pro_name)
price = soup.findAll("span", class_ = "price right")
for prices in price:
    pro_new_price1 = prices.text
    pro_new_price = pro_new_price1.replace("Â"," ")
    prod_new_price.append(pro_new_price)
old_price = soup.findAll("span", class_ = "price-old")
for old_pri in old_price:
    pro_old_price = old_pri.text
    prod_old_price.append(pro_old_price)

reviews = soup.findAll("span", class_ = "text-prod-review-score")
for rev in reviews:
    pro_reviews = (len(rev))
    prod_reviews.append(pro_reviews)

pet_products = pd.DataFrame({"Product Name": prod_name, "Product Old Price": prod_old_price, "Product New Price": prod_new_price, "Product Reviews as # of Star": prod_reviews})
pet_products.to_csv("Pets Products.csv")

Where no old price is given, I would like "N/A" or "None" in its place. Or is there another way to do this? Thanks.
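
For reference, the length error described in the question can be reproduced without any scraping. The following is a minimal sketch with made-up lists (the names and prices are placeholders, not values from the site): pandas refuses to build a DataFrame from a dict whose lists differ in length.

```python
import pandas as pd

# Hypothetical data: three product names but only two old prices
names = ["Product A", "Product B", "Product C"]
old_prices = ["£76.99", "£11.49"]

try:
    pd.DataFrame({"Product Name": names, "Product Old Price": old_prices})
    raised = False
except ValueError as exc:
    # pandas reports that the columns do not all have the same length
    raised = True
    print(exc)
```

The exact wording of the message depends on the pandas version, so the sketch prints it rather than matching on it.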

Tags: beautifulsoup

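Solution

Recommendation

Loop over the products in a different way and build a list of dicts, which I think is easier to handle. Also use find_all() rather than the older findAll() spelling.

What happens?

The old price is simply absent from the page source when a product has no sale price, so the way you search, across the whole page, gives you no position at which to set the NA value.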
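
The point above can be illustrated on a small inline document. This is only a sketch: the markup is a hypothetical stand-in that reuses the class names from the question, not the real page. A page-wide search returns two flat lists with nothing linking their entries, while a search inside each product shows exactly which one lacks an old price.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three products, the second one has no old price
html = """
<ul>
  <li><span class="price right">£69.99</span><span class="price-old">£76.99</span></li>
  <li><span class="price right">£2.19</span></li>
  <li><span class="price right">£6.99</span><span class="price-old">£11.49</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Page-wide search: the lists differ in length and carry no product positions
new_prices = [tag.get_text() for tag in soup.find_all("span", class_="price right")]
old_prices = [tag.get_text() for tag in soup.find_all("span", class_="price-old")]
print(len(new_prices), len(old_prices))

# Per-product search: find() returns None exactly where the old price is absent
per_product = [li.find("span", class_="price-old") for li in soup.find_all("li")]
missing = [i for i, tag in enumerate(per_product) if tag is None]
print(missing)
```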

Take a look at my example. If a product has no old price, the lookup raises an error, which you can use to set the NA value:

try:
    old_price = product.find("span", class_ = "price-old").get_text(strip=True)
except AttributeError:
    # find() returned None, so this product has no old price
    old_price = 'NA'
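
An alternative to catching the exception is to test the result of find() for None before reading its text. The sketch below assumes a helper named text_or_default, which is hypothetical and not part of BeautifulSoup, and runs it on a one-product snippet of made-up markup.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, name, css_class, default="NA"):
    """Hypothetical helper: text of the first matching tag, or `default`."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else default


# Made-up product that has a new price but no old price
product = BeautifulSoup(
    '<li><span class="price right">£2.19</span></li>', "html.parser"
)

old_price = text_or_default(product, "span", "price-old")
new_price = text_or_default(product, "span", "price right")
print(old_price, new_price)
```

Compared with try/except, the explicit check cannot hide an unrelated error raised elsewhere on the same line.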

Example

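from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.petplanet.co.uk/d7/dog_food"

r = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.content, "html.parser")

p_data = []

# Each <li> holds one product, so every lookup below stays within that product
for product in soup.select('div#box-scroll-content li'):
    new_price = product.find("span", class_ = "price right").get_text().replace("Â"," ")
    try:
        old_price = product.find("span", class_ = "price-old").get_text(strip=True)
    except AttributeError:
        # find() returned None: this product has no old price
        old_price = 'NA'

    p_data.append({
        'new_price': new_price,
        'old_price': old_price
    })

# One row per product; the column names come from the dict keys
pd.DataFrame(p_data)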

Output

  new_price old_price
0    £69.99    £76.99
1     £2.19        NA
2     £6.99    £11.49
3     £6.99    £10.99
4     £0.89     £1.00
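
Since the original goal was a CSV file, here is a hedged sketch of that last step. It uses made-up rows in the same shape as p_data and stores None for the missing price, which is an assumption and differs from the 'NA' string used above; the na_rep argument of to_csv then controls how the gap is written.

```python
import pandas as pd

# Made-up rows in the same shape as p_data; None marks a missing old price
rows = [
    {"new_price": "£69.99", "old_price": "£76.99"},
    {"new_price": "£2.19", "old_price": None},
]
df = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False, na_rep="N/A")
print(csv_text)
```

Passing a file name as the first argument, as in the question, writes the same text to disk; index=False leaves out the row-number column.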
