Python: find_all only works for some tags

Problem description

The find_all() function in bs4 only works for some HTML tags. I'm trying to scrape a website:

from bs4 import BeautifulSoup
import requests

url = 'https://bitskins.com/'
page_response = requests.get(url, timeout=5)
page_content = BeautifulSoup(page_response.content, 'html.parser')

# Gather the two lists
skin_list = page_content.find_all('div', attrs={'class': 'panel-heading item-title'})
wear_box = page_content.find_all('div', attrs={'class': 'text-muted text-center'})

When I print skin_list it works fine, but when I try to print the wear list (wear_box), I get an empty list.
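One bs4 detail worth knowing when a multi-class search comes back empty (whether it explains this particular page is an assumption): when you pass a string containing spaces to class_ or attrs={'class': ...}, BeautifulSoup compares it against the exact value of the class attribute, so the same classes in a different order do not match. A CSS selector via select() matches elements that carry all the listed classes in any order. A minimal sketch on made-up markup, not the real bitskins.com HTML:

```python
from bs4 import BeautifulSoup

# Made-up markup: the first div lists the same two classes in the other order.
html = """
<div class="text-center text-muted">Wear: 0.0712</div>
<div class="text-muted text-center">Wear: 0.1530</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A space-separated string matches the exact class attribute value,
# so only the div whose class is literally "text-muted text-center" is found.
exact = soup.find_all("div", class_="text-muted text-center")

# A CSS selector matches any div that has both classes, in any order.
both = soup.select("div.text-muted.text-center")

print(len(exact))
print(len(both))
```

If the exact-string search returns nothing but the selector does, the class order (or an extra class on the element) is the likely culprit.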

I also tried something else:

wear_box = page_content.html.search("Wear: {float}")

This raised an error saying that a 'NoneType' object is not callable.
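The error comes from how bs4 handles attribute access on a tag: tag.something is shorthand for tag.find("something"). A Tag has no search method, so page_content.html.search looks for a <search> element, finds none, and evaluates to None; calling that None raises the TypeError. (The "Wear: {float}" template syntax appears to come from the requests-html library rather than bs4.) A minimal offline reproduction on a made-up snippet:

```python
from bs4 import BeautifulSoup

html = "<html><body><p>Wear: 0.07</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.html.p is shorthand for soup.html.find("p")
print(soup.html.p.get_text())

# There is no <search> tag in the document, so this lookup returns None
lookup = soup.html.search
print(lookup is None)

message = None
try:
    lookup("Wear: {float}")  # calling None fails
except TypeError as exc:
    message = str(exc)
print(message)
```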

I'm using Sublime Text 3.

Tags: python, web-scraping

Solution


from bs4 import BeautifulSoup
import requests

url = 'https://bitskins.com/'
page_response = requests.get(url, timeout=5)
page_response.raise_for_status()  # fail loudly on HTTP errors
page_content = BeautifulSoup(page_response.content, 'html.parser')

# Each featured item sits in its own panel; search for the fields inside it
skin_list = page_content.find_all('div', class_='panel item-featured panel-default')

for skin in skin_list:
    name = skin.find("div", class_="panel-heading item-title")
    price = skin.find("span", class_="item-price hidden")
    discount = skin.find("span", class_="badge badge-info")
    wear = skin.find("span", class_="hidden unwrappable-float-pointer")

    # find() returns None when an element is missing, so check before using .text
    if name is not None:
        print("Name:", name.text)
    if price is not None:
        print("Price:", price.text)
    if discount is not None:
        print("Discount:", discount.text)

    # The wear span holds several comma-separated values; choose which one you want
    if wear is not None:
        for w in wear.text.split(","):
            print("Wear:", w)

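You were searching for the wrong class. I've also added a few other fields you can scrape, as examples. wear holds several values, and the loop prints each of them.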
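Here is a self-contained sketch of the same per-item approach that runs without network access. It uses hypothetical markup that reuses the class names from the answer (the live page may look different today), and a hypothetical helper, text_or_none, so that an item missing a field yields None instead of raising AttributeError on .text:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class names used in the answer.
html = """
<div class="panel item-featured panel-default">
  <div class="panel-heading item-title">AK-47 | Redline</div>
  <span class="item-price hidden">12.34</span>
  <span class="badge badge-info">5% off</span>
  <span class="hidden unwrappable-float-pointer">0.1530,0.1531</span>
</div>
<div class="panel item-featured panel-default">
  <div class="panel-heading item-title">AWP | Asiimov</div>
  <span class="item-price hidden">45.67</span>
</div>
"""

def text_or_none(parent, name, cls):
    """Hypothetical helper: stripped text of the first match inside parent, or None."""
    tag = parent.find(name, class_=cls)
    return tag.get_text(strip=True) if tag is not None else None

soup = BeautifulSoup(html, "html.parser")
items = []
for card in soup.find_all("div", class_="panel item-featured panel-default"):
    wear = text_or_none(card, "span", "hidden unwrappable-float-pointer")
    items.append({
        "name": text_or_none(card, "div", "panel-heading item-title"),
        "price": text_or_none(card, "span", "item-price hidden"),
        "discount": text_or_none(card, "span", "badge badge-info"),
        # The wear span holds several comma-separated values
        "wear": wear.split(",") if wear else [],
    })

for item in items:
    print(item)
```

Searching inside each card keeps the fields of one item together, so a missing discount on one listing does not shift the values of the others.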

