Scraping prices by extracting values from a div class using Beautiful Soup, Selenium, and Pandas

Problem Description

I am trying to get the price of a product in a given size, since prices fluctuate daily. I was able to get my code working on sites that use "class", but I cannot get it to work with these div and span classes.

Link: https://www.flightclub.com/supreme-x-dunk-sb-low-varsity-red-varsity-red-white-black-152127?size=9.5 Price: $550 (as of this post)

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome("/Users/donlento7/chromedriver")

products=[] #List to store name of the product
prices=[] #List to store price of the product
driver.get('https://www.flightclub.com/supreme-x-dunk-sb-low-varsity-red-varsity-red-white-black-152127?size=9.5')

content = driver.page_source
soup = BeautifulSoup(content, "lxml")
for a in soup.findAll('div',href=True, attrs={'class':'product-essential row-fluid product-type-configurable'}):
    name=a.find('div', attrs={'class':'mb-padding'})
    price=a.find('span', attrs={'class':'price'})
    products.append(name.text)
    prices.append(price.text)

df = pd.DataFrame({'Product Name':products,'Price':prices})
#df.to_csv('products.csv', index=False, encoding='utf-8')
print(df)

Output:

Empty DataFrame
Columns: [Product Name, Price]
Index: []

Tags: python, pandas, selenium, web-scraping, beautifulsoup

Solution


You are getting an EMPTY list because of this line:

for a in soup.findAll('div',href=True, attrs={'class':'product-essential row-fluid product-type-configurable'}):

The div tag has no href attribute, so this filter matches nothing.
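
For illustration, here is a minimal sketch against a small hand-written HTML string (not the Flight Club page) showing how the href=True filter drops tags that lack an href attribute:

from bs4 import BeautifulSoup

html = '<div class="price-box"><a href="/item">link</a></div>'
soup = BeautifulSoup(html, "lxml")

# href=True keeps only tags that actually carry an href attribute,
# so the <div> is excluded while the <a> tag is returned.
print(soup.find_all('div', href=True))  # []
print(soup.find_all('a', href=True))    # [<a href="/item">link</a>]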

Change it to:

for a in soup.findAll('div',attrs={'class':'product-essential row-fluid product-type-configurable'}):
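
Putting the fix into the original script, a minimal corrected sketch could look like the following. The class names (product-essential row-fluid product-type-configurable, mb-padding, price) are taken from the question and the page layout may have changed since this was posted; the explicit wait and driver.quit() are additions of this sketch, on the assumption that the price may be rendered by JavaScript:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome()  # assumes chromedriver is available on PATH

products = []  # product names
prices = []    # product prices

driver.get('https://www.flightclub.com/supreme-x-dunk-sb-low-varsity-red-varsity-red-white-black-152127?size=9.5')

# Wait until a price span is present, in case it is rendered by JavaScript.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'price')))

soup = BeautifulSoup(driver.page_source, 'lxml')

# No href filter here: plain <div> tags do not carry an href attribute.
for a in soup.find_all('div', attrs={'class': 'product-essential row-fluid product-type-configurable'}):
    name = a.find('div', attrs={'class': 'mb-padding'})
    price = a.find('span', attrs={'class': 'price'})
    if name and price:  # skip blocks missing either field
        products.append(name.text.strip())
        prices.append(price.text.strip())

driver.quit()

df = pd.DataFrame({'Product Name': products, 'Price': prices})
print(df)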
