BeautifulSoup/scraper problem: no text where text exists, and not moving between pages

Problem description

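I'm trying to write a small scraping project, just to learn more about scraping in general and about Python, but I've run into a few problems that I can't solve despite my best efforts. The goal is to go through my wishlist and generate a CSV file, which I will then compare against a master list in Excel to spot status changes, such as an item coming into stock. Here is my code: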

import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

from time import sleep
from random import randint

headers = {"Accept-Language": "en-US, en;q=0.5"}

titles = []
links = []
price = []
addtocart = []

pages = np.arange(1, 10, 1)

for page in pages:
    page = requests.get("https://www.instocktrades.com/wishlists/defc57d9758f4ba89683abbc7a3d93?pg=" + str(pages), headers=headers)
    
    soup = BeautifulSoup(page.text, "html.parser")
    wishlist_div = soup.find_all('div', class_='item thumbplus')

    sleep(randint(2,10))

for container in wishlist_div:

        #name
        name = container.find('div', class_='title').text.strip()
        titles.append(name)
        
        #link
        link = container.find('div', attrs={'class' : 'title'})
        for div in link:
            linking = container.find('a')['href']
            link = "https://www.instocktrades.com" + linking
        links.append(link)
        
        #price
        pricing = container.find('div', class_='price')
        price.append(pricing)
        
        #addtocart
        cart = container.find('button', class_='btn addtocart') if container.find('button', class_='btn addtocart') else 'Out Of Stock'
        addtocart.append(cart)

#building Pandas dataframe         
wishlist = pd.DataFrame({
'book': titles,
'link': links,
'price': price,
'cart': addtocart
})

wishlist.to_csv('wishlist.csv')
print(wishlist)

The problems I'm running into are as follows:

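  1. It doesn't move to the next page of the site. I thought I had set it up correctly, but it doesn't seem to fetch anything other than the first page.
  2. For the price, if I add .text I get an attribute error: 'NoneType' object has no attribute 'text'. Leaving it off pulls the entire HTML into the CSV, like this, when I really just want $27.99: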
<div class="price">
                                $27.99
                            </div>
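  3. For the cart part, checking whether the "Add to Cart" button exists obviously tells me whether the item is in stock. If I try adding .text here, I get the same attribute error. If I leave it as is, it writes the button's entire HTML, as shown below. All I want is to return "In Stock" if the Add to Cart button exists and "Out Of Stock" if it doesn't, which is what it currently writes in that case.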
<button class="btn addtocart" data-cart-qty="0" data-code="MAR201512" data-id="66791" data-title="A Walk Through Hell Complete HC (C: 0-1-0)" data-wl="3851484" title="Add to Cart" type="button">
<img alt="Add to Cart" src="/images/cart.png"/> Add to Cart
                                </button>

I would really appreciate any help in fixing these issues. Thanks!

Tags: python, python-3.x, web-scraping, beautifulsoup

Solution


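Use this in your price block. Just search for class_='price'. The problem is that some titles have no price, so find() returns None for them and calling .text on that result raises the AttributeError.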

    pricing = container.find('div', class_='price')
    if pricing:
        price.append(pricing.text)
        print(pricing.text)
    else:
        print('no pricing')
        price.append(0)

Partial output:

https://www.instocktrades.com/products/mar201512/a-walk-through-hell-complete-hc-(c-0-1-0)

                                $27.99
                            
https://www.instocktrades.com/products/jul170097/abe-sapien-dark-terrible-hc-vol-01
no pricing
https://www.instocktrades.com/products/nov170018/abe-sapien-dark-terrible-hc-vol-02
no pricing
https://www.instocktrades.com/products/mar180092/abe-sapien-drowning-other-stories-hc
no pricing
https://www.instocktrades.com/products/may110255/absolute-all-star-superman-hc
no pricing
https://www.instocktrades.com/products/dec180616/absolute-batman-arkham-asylum-hc-30th-anniv-ed
no pricing
https://www.instocktrades.com/products/apr150293/absolute-batman-the-court-of-owls-hc
no pricing
https://www.instocktrades.com/products/aug180594/absolute-batman-the-black-mirror-hc
no pricing
https://www.instocktrades.com/products/feb201046/absolute-carnage-omnibus-hc
no pricing
https://www.instocktrades.com/products/may190468/absolute-death-hc-new-ed-(mr)
no pricing
https://www.instocktrades.com/products/aug190641/absolute-fourth-world-by-jack-kirby-hc-vol-01
no pricing
https://www.instocktrades.com/products/jan160353/absolute-preacher-hc-vol-01-(mr)
no pricing
https://www.instocktrades.com/products/nov160355/absolute-preacher-hc-vol-02-(mr)
no pricing
https://www.instocktrades.com/products/sep170442/absolute-preacher-hc-vol-03-(mr)

                                $87.00
                            
https://www.instocktrades.com/products/jul108195/absolute-sandman-vol-1-hc-(mr)

                                $57.99
                            
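The output above still carries the whitespace around each price, and the answer does not cover the cart column. The following is a minimal sketch of one way to handle both, not the answerer's code: `get_text(strip=True)` trims the price, and the presence of the button is mapped to "In Stock" or "Out Of Stock". The helper name `parse_item`, the constant names, and the sample markup are hypothetical; only the price div and the button classes come from the question, and the empty string for a missing price is an arbitrary choice (the answer uses 0).

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.instocktrades.com"


def parse_item(container):
    """Turn one wishlist item <div> into a dict of plain strings."""
    title_div = container.find("div", class_="title")
    anchor = container.find("a", href=True)
    price_div = container.find("div", class_="price")
    button = container.find("button", class_="addtocart")
    return {
        "book": title_div.get_text(strip=True) if title_div else "",
        "link": BASE_URL + anchor["href"] if anchor else "",
        # find() returns None when the item has no price, so check first.
        "price": price_div.get_text(strip=True) if price_div else "",
        # Only the existence of the button matters, not its text.
        "cart": "In Stock" if button else "Out Of Stock",
    }


# Assumed markup for illustration; the real page may be structured differently.
SAMPLE_HTML = """
<div class="item thumbplus">
  <div class="title"><a href="/products/mar201512/a-walk-through-hell">A Walk Through Hell Complete HC</a></div>
  <div class="price">
      $27.99
  </div>
  <button class="btn addtocart" type="button">Add to Cart</button>
</div>
<div class="item thumbplus">
  <div class="title"><a href="/products/jul170097/abe-sapien">Abe Sapien Dark Terrible HC Vol 01</a></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [parse_item(div) for div in soup.find_all("div", class_="item thumbplus")]
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, which keeps the four columns aligned even when an item has no price or no button.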

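On the pagination problem, the question's loop builds the URL from `str(pages)` (the whole array) rather than from the loop variable, and the `for container in wishlist_div:` loop sits outside the page loop, so at most the last fetched page is parsed. The sketch below shows one possible loop shape that avoids both. The names `page_url` and `scrape_pages` are hypothetical, fetching and parsing are passed in as callables so the example runs without network access, and stopping at the first empty page is an assumption about how the site behaves past the last page.

```python
import time

WISHLIST_URL = "https://www.instocktrades.com/wishlists/defc57d9758f4ba89683abbc7a3d93"


def page_url(page_number):
    """Build the URL for a single page from the loop variable."""
    return f"{WISHLIST_URL}?pg={page_number}"


def scrape_pages(fetch_html, parse_page, first=1, last=9, delay=0.0):
    """Collect rows from pages first..last, stopping at the first empty page."""
    rows = []
    for page_number in range(first, last + 1):
        html = fetch_html(page_url(page_number))
        # Parse inside the page loop so every page contributes rows.
        page_rows = parse_page(html)
        if not page_rows:
            break
        rows.extend(page_rows)
        time.sleep(delay)  # polite pause between requests
    return rows


# Stand-ins for the real fetch and parse steps, for demonstration only.
requested = []


def fake_fetch(url):
    requested.append(url)
    return "row" if url.endswith(("pg=1", "pg=2")) else ""


def fake_parse(html):
    return [html] if html else []


result = scrape_pages(fake_fetch, fake_parse)
print(len(result), "rows from", len(requested), "requests")
```

In a real run, `fetch_html` could be something like `lambda url: requests.get(url, headers=headers).text`, and `parse_page` would wrap the `BeautifulSoup` parsing of one page's item divs.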