No output when scraping and storing in a database

Problem description

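I want to use SQLite3 to store data that I scraped from Amazon in a database. At first I scraped the data and stored it in .csv format, and that worked fine. But when I tried to store it in a database, it said: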

Error: Cost = r.html.find('#priceblock_ourprice', first=True).text.strip()[1:] AttributeError: 'NoneType' object has no attribute 'text'

The code I originally used to get the .csv output is:

import csv
from requests_html import HTMLSession

csv_file = open('Laptop2.csv', 'w', encoding = 'utf-8')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Laptop','Cost', 'Savings'])

urls = ["https://www.amazon.in/gp/product/B091HGK1B6/ref=ox_sc_act_title_1?smid=A372Y0DOIAPTGJ&psc=1","https://www.amazon.in/gp/product/B08D3T9CK3/ref=ox_sc_act_title_2?smid=A5QX138YR4YQ&psc=1","https://www.amazon.in/gp/product/B096W63DZV/ref=ox_sc_act_title_3?smid=A339C6POJNB9GM&psc=1","https://www.amazon.in/gp/product/B0928TPR8H/ref=ox_sc_act_title_4?smid=A2YBFAXWY0FFA4&psc=1"]

for url in urls:
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
    }

    session = HTMLSession()
    r = session.get(url, headers = header)
    r.html.render(timeout=120)

    title = r.html.find('#productTitle', first=True).text.strip()[:25]
    Cost = r.html.find('span#priceblock_ourprice', first=True).text.strip()[1:]
    Savings = r.html.find('.priceBlockSavingsString', first=True).text.strip()[1:]
    
    csv_writer.writerow([title, Cost, Savings])

csv_file.close()

Then I used this code to store the data in an SQLite3 database:

from requests_html import HTMLSession
import sqlite3
import datetime

connection = sqlite3.connect('laptop.db')
c = connection.cursor()

c.execute('''CREATE TABLE Tracker(Date DATE, Name TEXT, price REAL, Savings REAL)''')

urls = ["https://www.amazon.in/gp/product/B091HGK1B6/ref=ox_sc_act_title_1?smid=A372Y0DOIAPTGJ&psc=1","https://www.amazon.in/gp/product/B08D3T9CK3/ref=ox_sc_act_title_2?smid=A5QX138YR4YQ&psc=1","https://www.amazon.in/gp/product/B096W63DZV/ref=ox_sc_act_title_3?smid=A339C6POJNB9GM&psc=1","https://www.amazon.in/gp/product/B0928TPR8H/ref=ox_sc_act_title_4?smid=A2YBFAXWY0FFA4&psc=1"]

for url in urls:   

    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
    }

    session = HTMLSession()
    r = session.get(url, headers = header)
    r.html.render(timeout=120)

    current_date = datetime.datetime.now()
    title = r.html.find('#productTitle', first=True).text.strip()[:25]
    Cost = r.html.find('#priceblock_ourprice', first=True).text.strip()[1:]
    Savings = r.html.find('.priceBlockSavingsString', first=True).text.strip()[1:]

    c.execute('''INSERT INTO Tracker VALUES(?,?,?,?)''', (current_date, title, Cost, Savings))

connection.commit()
c.execute(''' SELECT price FROM Tracker''')
results = c.fetchall()
print(results)
connection.close()

I think I made some mistake with sqlite3 or requests_html. Please help me solve this.

Tags: python, sqlite, web-scraping, python-requests-html

Solution


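We should not call the render function when there is no data in JSON format. r.html.render() reloads the page in headless Chromium and replaces r.html with the JavaScript-rendered markup, and the AttributeError means that find('#priceblock_ourprice', first=True) returned None on that markup. Once I removed the r.html.render(timeout=120) line, the problem was solved.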
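
For illustration, here is a minimal sketch of the database script without the render call, with a guard so that a missing element no longer raises AttributeError. The helper names (element_text, parse_amount, create_table, save_row, track) are hypothetical and not part of the original code, and whether the selectors still match Amazon's current markup is an assumption that has not been verified here. It also converts the price text to a number before inserting it, since the price and Savings columns are declared REAL.

```python
import re
import sqlite3
from datetime import datetime

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
}


def element_text(element):
    """Return the stripped text of an element, or None if find() matched nothing."""
    return element.text.strip() if element is not None else None


def parse_amount(text):
    """Extract the first number from a string such as '₹54,990.00' as a float."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def create_table(connection):
    # IF NOT EXISTS lets the script run more than once against the same file.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS Tracker"
        "(Date DATE, Name TEXT, price REAL, Savings REAL)"
    )


def save_row(connection, title, cost, savings):
    # Store the timestamp as ISO text instead of a raw datetime object.
    stamp = datetime.now().isoformat(timespec="seconds")
    connection.execute(
        "INSERT INTO Tracker VALUES (?, ?, ?, ?)", (stamp, title, cost, savings)
    )


def track(urls, db_path="laptop.db"):
    """Scrape each URL without rendering and store the results (hypothetical helper)."""
    # Imported here so the helpers above work without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    connection = sqlite3.connect(db_path)
    create_table(connection)
    for url in urls:
        r = session.get(url, headers=HEADERS)  # no r.html.render() call
        title = element_text(r.html.find("#productTitle", first=True))
        cost = parse_amount(
            element_text(r.html.find("#priceblock_ourprice", first=True)))
        savings = parse_amount(
            element_text(r.html.find(".priceBlockSavingsString", first=True)))
        if title is None or cost is None:
            print(f"Skipping {url}: title or price not found")
            continue
        save_row(connection, title[:25], cost, savings)
    connection.commit()
    rows = connection.execute("SELECT price FROM Tracker").fetchall()
    connection.close()
    return rows
```

Calling track(urls) with the URL list from the question would return the stored prices. A URL whose title or price cannot be found is reported and skipped rather than stopping the whole run.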

