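python - How to get ratings from the books.toscrape website and insert them into a database
Problem description
As practice, I am trying to scrape the ratings from a site called books.toscrape.com and insert them into a database, but I get this error:
InterfaceError: Error binding parameter 1 - probably unsupported type.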
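The error can be reproduced without any scraping. The sketch below assumes an in-memory database and a hypothetical table `t`. It binds a list to one placeholder, which fails, and then binds a plain string, which succeeds. The exact exception class and wording depend on the Python version (older versions raise `InterfaceError` with the message above), so the sketch catches the common base class `sqlite3.Error`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")  # hypothetical table


def try_insert(params):
    """Return None on success, or the sqlite3 error message on failure."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", params)
        return None
    except sqlite3.Error as exc:  # InterfaceError or ProgrammingError, depending on version
        return str(exc)


# A list bound to a single placeholder is rejected.
list_error = try_insert((0, ["A Light in the Attic"]))
# A plain str is accepted.
scalar_error = try_insert((0, "A Light in the Attic"))

print(list_error)
print(scalar_error)  # None
```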
Here is my code:
from bs4 import BeautifulSoup
import requests
import sqlite3
import re
conn = sqlite3.connect('scraped.db')
curs = conn.cursor()
curs.execute(''' CREATE TABLE CATEGORY(Id INTEGER PRIMARY KEY,NAME TEXT)''')
curs.execute(''' CREATE TABLE BOOKS(Category_Id INTEGER, NAME TEXT,PRICE INTEGER,RATING TEXT)''')
html_content = requests.get('http://books.toscrape.com')
soup = BeautifulSoup(html_content.content)
url = "http://books.toscrape.com/"
def getURLs(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text, 'html.parser')
    return(soup)
def getBooks(url):
    soup = getURLs(url)
    # remove the index.html part of the base url before returning the results
    return(["/".join(url.split("/")[:-1]) + "/" + x.find("div").find("a").get('href') for x in soup.findAll("article", attrs = {"class":"product_pod"})])
pages_urls = []
new_page = "http://books.toscrape.com/catalogue/page-1.html"
while requests.get(new_page).status_code == 200:
    pages_urls.append(new_page)
    new_page = pages_urls[-1].split("-")[0] + "-" + str(int(pages_urls[-1].split("-")[1].split(".")[0]) + 1) + ".html"
booksURLs = []
for page in pages_urls:
    booksURLs.extend(getBooks(page))
names = []
prices = []
rate = []
for x in range(len(booksURLs)):
    soup = getURLs(url)
    all_articles = soup.find_all("article", attrs = {"class":"product_pod"})
    for article in all_articles:
        names.append(soup.find("article", class_ = ("product_pod")).find("h3").get_text())
        prices.append(soup.find("p", class_ = "price_color").text[2:]) # get rid of the pound sign
        rate.append(soup.find("article", class_ = ("product_pod")).find('p').get('class')[1])
        curs.execute("INSERT INTO BOOKS VALUES(?,?,?,?)",(x,names,prices,rate))
conn.commit()
conn.close()
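Apart from the list binding, the loop above produces identical data on every pass: it fetches the same `url` (the home page) each time, and it calls `soup.find(...)` on the whole page, which always returns the first matching element, instead of searching inside `article`. The sketch below reads the title, price and rating from each `article` separately; you would run it on each listing page in `pages_urls`. `parse_listing` is a hypothetical helper, and `SAMPLE_HTML` is a simplified, hypothetical stand-in for the listing markup (on the live site the `<h3>` link text can be truncated while the link's `title` attribute holds the full name, and the rating is the second class of `<p class="star-rating ...">`), so check it against the real page before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of two product_pod entries; the real page has more markup.
SAMPLE_HTML = """
<ol class="row">
  <li><article class="product_pod">
    <p class="star-rating Three"></p>
    <h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
    <div class="product_price"><p class="price_color">£51.77</p></div>
  </article></li>
  <li><article class="product_pod">
    <p class="star-rating One"></p>
    <h3><a href="tipping-the-velvet_999/index.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
    <div class="product_price"><p class="price_color">£53.74</p></div>
  </article></li>
</ol>
"""


def parse_listing(html):
    """Return one (name, price_text, rating_word) tuple per product_pod article."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="product_pod"):
        # Search inside *this* article, not the whole page.
        name = article.h3.a["title"]  # full title; the link text may be truncated
        price_text = article.find("p", class_="price_color").get_text(strip=True)
        # class is a multi-valued attribute, e.g. ["star-rating", "Three"]
        rating_word = article.find("p", class_="star-rating")["class"][1]
        rows.append((name, price_text, rating_word))
    return rows


for row in parse_listing(SAMPLE_HTML):
    print(row)
```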
Solution
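The names, prices and rate variables are lists, but by default sqlite3 can only bind one value of type None, int, float, str or bytes to each ? placeholder (the index in the error message is zero-based, so parameter 1 is names). So this line: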
curs.execute("INSERT INTO BOOKS VALUES(?,?,?,?)",(x,names,prices,rate))
must be changed to:
curs.execute("INSERT INTO BOOKS VALUES(?,?,?,?)",(x,names[0],prices[0],rate[0]))
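Indexing with `[0]` gives each placeholder a scalar, which removes the exception, but it binds the first element of each list every time, so every row repeats the first book. A minimal sketch of inserting one row per book instead, assuming the three lists are parallel (entry i of each belongs to the same book): it pairs them with `zip` and calls `executemany`. The `RATING_WORDS` mapping, the `parse_price` and `save_books` helpers, and the `REAL`/`INTEGER` column types are hypothetical choices, not from the question (whose schema declares `PRICE INTEGER` and `RATING TEXT`). An in-memory database and sample values stand in for scraped.db and the scraped data. `parse_price` extracts the number with a regular expression, which also copes with a mis-decoded `Â£` prefix; that prefix is presumably why the question slices `[2:]` to remove a single pound sign.

```python
import re
import sqlite3

# Hypothetical mapping from the star-rating class word to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Extract the numeric part of a price such as '£51.77' (or a mis-decoded 'Â£51.77')."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def save_books(conn, names, prices, rates, category_id=None):
    """Insert one row per book; the three lists must be parallel (same order, same length)."""
    if not (len(names) == len(prices) == len(rates)):
        raise ValueError("names, prices and rates must have the same length")
    rows = [
        (category_id, name, parse_price(price), RATING_WORDS[rate])
        for name, price, rate in zip(names, prices, rates)
    ]
    # Each tuple in rows supplies exactly four scalars for the four placeholders.
    conn.executemany("INSERT INTO BOOKS VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for scraped.db
conn.execute(
    "CREATE TABLE IF NOT EXISTS BOOKS "
    "(Category_Id INTEGER, NAME TEXT, PRICE REAL, RATING INTEGER)"
)
save_books(
    conn,
    names=["A Light in the Attic", "Tipping the Velvet"],  # sample values
    prices=["£51.77", "Â£53.74"],
    rates=["Three", "One"],
)
print(conn.execute("SELECT NAME, PRICE, RATING FROM BOOKS ORDER BY rowid").fetchall())
```

`executemany` binds each tuple in turn, so every `?` still receives exactly one scalar.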