Web Scraping - printing values together - Python

Problem description

So I'm trying to scrape CS:GO skins, I'm trying to return: Skin name, Price and collection - in that order.

This is one of many ways I have tried it.

from bs4 import BeautifulSoup
import requests
import urllib3
urllib3.disable_warnings()

def webscrape():

    url = "https://csgostash.com/weapon/AWP"
    res = requests.get(url = url)
    soup = BeautifulSoup(res.text, "html.parser")

    titles = soup.find_all('div', class_="well result-box nomargin")
    prices = soup.find_all('div', class_="price")
    collection = soup.find_all('div', class_="collection")

    for title in titles:
        title = title.find('a')
        if title:
            title = title.text

    for price in prices:
        price = price.find('p')
        if price:
            price = price.text

    for cases in collection:
        cases = cases.find('p')
        if price:
            cases = cases.text
    print(title.text, price.text, collection.text)

webscrape()

This returns:

    print(title.text, price.text, collection.text)
AttributeError: 'NoneType' object has no attribute 'text'

I want it to return the three values in order. E.G. Containment Breach '\n' A$40.57 -A$271.90'\n' Shattered Web Case

and so on. Some of the skins have 2 Price sets, and I want both price sets to print out.

I have gotten it partly working, which better shows what I'm struggling with:

from bs4 import BeautifulSoup
import requests
import urllib3
urllib3.disable_warnings()

def webscrape():

    url = "https://csgostash.com/weapon/AWP"
    res = requests.get(url = url)
    soup = BeautifulSoup(res.text, "html.parser")
    names = " "
    price = " "
    cases = " "
    titles = soup.find_all('div', class_="well result-box nomargin")
    prices = soup.find_all('div', class_="price")
    collection = soup.find_all('div', class_="collection")

    for name in titles:
        a_field = name.find('a')
        if a_field:
            names = a_field.text + '\n' + names

    for money in prices:
        p_field = money.find('p')
        if p_field:
            price = p_field.text + '\n' + price

    for case in collection:
        case_field = case.find('p')
        if case_field:
            cases = case_field.text + '\n' + cases
    print(names, price, cases)

webscrape()

This prints all the information I am looking for on the webpage, but I want the information grouped together: I want the prices and the collection for each skin to print under the name of that skin. Right now it prints all the names, then all the prices, then all the collections.

Tags: python, web-scraping, printing, attributeerror

Solution


titles = soup.find_all('div', class_="well result-box nomargin")

for title in titles:
    title = title.find('a')
    if title:
        title = title.text

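You are overwriting your data on each iteration of the loop; it's not at all clear to me what you think you're doing. The only way I see this working is if your final iteration finds what you want, in which case you exit the loop with title holding the text of the last value you found.

Finally, you try to get the .text attribute of that. This will almost certainly fail in some unwelcome way.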

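To get the error you see, the last item in titles must have no 'a' element: find('a') then returns None, the if branch is skipped, and title is still None when the loop ends. Later, when you try to extract an attribute of None, you get the indicated error.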
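A minimal sketch of how this error can arise. The HTML fragment and the "box" class are made up for illustration and are not the real csgostash.com markup; the point is only that find returns None when no matching tag exists, and the loop variable keeps whatever the last iteration assigned.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption for this sketch): the last box has no <a> tag.
html = """
<div class="box"><a>First skin</a></div>
<div class="box"><p>no link here</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

title = None
for title in soup.find_all("div", class_="box"):
    title = title.find("a")      # None when the box has no <a>
    if title:
        title = title.text       # skipped for the last box

# After the loop, title holds only the result of the final iteration.
try:
    title.text
except AttributeError as exc:
    message = str(exc)

print(message)
```

If the last box did contain a link, title would be a plain string instead, and the same line would fail with a different AttributeError.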

Instead, try

titles = soup.find_all('div', class_="well result-box nomargin")

for title in titles:
    a_field = title.find('a')
    if a_field:
        break

This exits the search loop as soon as the element you want is found.
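For the grouped output the question asks for, one possible approach is to walk each result box once and look up the name, prices and collection inside that box, so the three values stay together. The sketch below is not a tested scraper for the live site: SAMPLE_HTML is a hypothetical, simplified fragment that only reuses the class names from the question, and extract_skins is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption); the real page may differ.
SAMPLE_HTML = """
<div class="well result-box nomargin">
  <h3><a href="#">Containment Breach</a></h3>
  <div class="price"><p><a href="#">A$40.57 - A$271.90</a></p></div>
  <div class="collection"><p><a href="#">Shattered Web Case</a></p></div>
</div>
<div class="well result-box nomargin">
  <h3><a href="#">Atheris</a></h3>
  <div class="price"><p>A$2.10 - A$15.00</p></div>
  <div class="price"><p>A$5.00 - A$40.00</p></div>
  <div class="collection"><p>Prisma Case</p></div>
</div>
<div class="well result-box nomargin"><p>box without a link</p></div>
"""


def extract_skins(soup):
    """Return a list of (name, prices, collection) tuples, one per result box."""
    skins = []
    # Matching on the single class "result-box" finds every box carrying it.
    for box in soup.find_all("div", class_="result-box"):
        a_field = box.find("a")
        if a_field is None:
            continue  # skip boxes that carry no skin name
        name = a_field.get_text(strip=True)
        # A skin may have more than one price set; keep all of them.
        prices = [
            p.get_text(strip=True)
            for price_div in box.find_all("div", class_="price")
            for p in price_div.find_all("p")
        ]
        case = box.find("div", class_="collection")
        collection = case.get_text(strip=True) if case is not None else ""
        skins.append((name, prices, collection))
    return skins


skins = extract_skins(BeautifulSoup(SAMPLE_HTML, "html.parser"))
for name, prices, collection in skins:
    # Name first, then each price set, then the collection, one per line.
    print(name, *prices, collection, sep="\n")
```

To try this against the real page, pass BeautifulSoup(res.text, "html.parser") from the question's code to extract_skins; the selectors may need adjusting if the live markup differs from the assumptions above.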

