Invalid syntax on Pandas DataFrame for web scraping

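Problem description

I am new to Python and I am trying to implement a web scraping project. I am following a tutorial and got stuck at the part where the data is written to a CSV file. I have tried moving some parentheses and other constructs around, but nothing seems to help. Please see the code below. Thanks for your help, I have been stuck for hours.

One observation: the "Dataframe" call does not change color in my editor. I doubt that makes a difference, but it seemed worth mentioning.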

import bs4
from bs4 import BeautifulSoup
import pandas
import selenium
from selenium import webdriver
import pandas as pd


products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver = webdriver.Chrome(executable_path = r'C:\Users\directory\Desktop\chromedriver.exe')
driver.get("https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniq")
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
products.append(name.text)
prices.append(price.text)
ratings.append(rating(("dd").text)

df = pd.Dataframe(data= {'Product Name': products,'Price': prices,'Rating':ratings})
df.to_csv('products.csv', index=False, encoding='utf-8')

The error:

df = pd.Dataframe(data= {'Product Name': products,'Price': prices,'Rating':ratings})
 ^
SyntaxError: invalid syntax


Tags: python, pandas, dataframe, beautifulsoup

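Solution

Your code has several flaws. The SyntaxError itself comes from the line above the one Python points at: ratings.append(rating(("dd").text) is missing a closing parenthesis, so the parser only notices the problem when it reaches the df = ... line. Writing ratings.append(rating.text) fixes that. Note also that the class is spelled pd.DataFrame, not pd.Dataframe, which may be why your editor did not highlight it. Beyond that, pandas is imported twice, and, most importantly, the append calls sit outside the for loop, so at most the last item found is stored instead of every item. Finally, a plain dictionary cannot be written to a CSV file directly: build a DataFrame from it first and then call to_csv. Try the code below: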

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

products = []  # List to store name of the product
prices = []    # List to store price of the product
ratings = []   # List to store rating of the product

# Note: recent Selenium 4 releases no longer accept executable_path;
# there, pass a selenium.webdriver.chrome.service.Service object instead.
driver = webdriver.Chrome(executable_path=r'C:\Users\directory\Desktop\chromedriver.exe')
driver.get("https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniq")
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')

for a in soup.find_all('a', href=True, attrs={'class': '_31qSD5'}):
    name = a.find('div', attrs={'class': '_3wU53n'})
    price = a.find('div', attrs={'class': '_1vC4OE _2rQ-NK'})
    rating = a.find('div', attrs={'class': 'hGSR34'})
    # Append inside the loop so that every product is stored, not only the last one
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)

data = {'Product Name': products,
        'Price': prices,
        'Rating': ratings}

# Create the dataframe once, after the loop has collected everything
products_df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
products_df.to_csv('products.csv', sep=",")
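
As a side note on the pd.Series wrapping used above, the following is a minimal sketch of what it buys you, assuming made-up sample data in which one rating is missing (the product names and the variable names here are hypothetical and not taken from the scraped page). It only illustrates the pandas behavior; it does not touch Selenium or BeautifulSoup.

```python
import pandas as pd

# Hypothetical sample data: one rating is missing, so the lists differ in length.
data = {
    "Product Name": ["Laptop A", "Laptop B", "Laptop C"],
    "Price": ["₹65,990", "₹51,990", "₹36,163"],
    "Rating": ["4.7", "4.3"],
}

# Plain lists of unequal length are rejected by the DataFrame constructor.
try:
    pd.DataFrame(data)
    plain_lists_ok = True
except ValueError:
    plain_lists_ok = False

# Wrapping each list in a Series aligns the columns on the index
# and pads the shorter column with NaN.
padded_df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})
print(plain_lists_ok)
print(padded_df)
```

Keep in mind that the padding always lands at the end of the shorter column, so if a value were missing in the middle, the remaining rows would be shifted against each other. In the code above the three lists are appended to together, so they stay the same length and the wrapping is only a safety net.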

Result

,Product Name,Price,Rating
0,Apple MacBook Air Core i5 5th Gen - (8 GB/128 GB SSD/Mac OS Sierra) MQD32HN/A A1466,"₹65,990",4.7
1,Lenovo Ideapad Core i5 7th Gen - (8 GB/1 TB HDD/Windows 10 Home/2 GB Graphics) IP 320-15IKB Laptop,"₹51,990",4.3
2,HP 15 Core i3 6th Gen - (4 GB/1 TB HDD/Windows 10 Home) 15-be014TU Laptop,"₹36,163",4.1
3,Lenovo Core i5 7th Gen - (8 GB/2 TB HDD/Windows 10 Home/4 GB Graphics) IP 520 Laptop,"₹79,500",4.4
4,Lenovo Core i5 7th Gen - (8 GB/1 TB HDD/DOS/2 GB Graphics) IP 320-15IKB Laptop,"₹56,990",4.3
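
To see why the SyntaxError in the question is caused by the ratings.append line rather than by the DataFrame line itself, here is a small sketch that compiles two statements from a string. The names make_frame and syntax_error_of are hypothetical helpers introduced only for this illustration. The exact message and the reported line depend on the Python version, so the sketch checks only that a SyntaxError occurs.

```python
# Two statements; the first one is missing a closing parenthesis, as in the question.
# make_frame is a hypothetical placeholder; compile() never looks names up.
broken_source = (
    'ratings.append(rating(("dd").text)\n'
    'df = make_frame(ratings)\n'
)


def syntax_error_of(source):
    """Return the SyntaxError raised when compiling source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc
    return None


error = syntax_error_of(broken_source)
print(type(error).__name__, "-", error.msg)

# With the parentheses balanced, the same two lines compile without complaint.
fixed_source = broken_source.replace('rating(("dd").text)', 'rating.text)')
fixed_error = syntax_error_of(fixed_source)
print(fixed_error)
```

Because the open parenthesis lets the first statement continue onto the next line, the parser cannot tell where it was meant to end, which is why older Python versions flag the following line instead.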

