python-3.x - Scrapy crawler cannot parse data from a MySQL database
Problem description
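I built a web crawler with Scrapy and stored the data in a MySQL database (I scrape the page source from a URL). Now I want to process that data offline, so I wrote a SQL query to export it with Python and tried to crawl from the exported data.
Can you suggest how to do this? So far I have not been able to get it working with Scrapy. Any suggestions, or pointers to a similar project, would help.
I tried querying the database from Scrapy and storing the data into: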
from scrapy.http import HtmlResponse
import mysql
from mysql.connector import Error
import scrapy
import re
import requests
from bs4 import BeautifulSoup
# Connects to the database and queries every URL that has already been crawled; the rows are returned as records.
class database:
    @staticmethod
    def query():
        records = []  # returned unchanged if the connection or the query fails
        try:
            connection = mysql.connector.connect(host='',
                                                 database='',
                                                 user='',
                                                 password='')
            if connection.is_connected():
                db_Info = connection.get_server_info()
                done = "Connected to MySQL database... MySQL Server version on "
                sql_select_Query = """ SELECT `job_url`, `job_description` FROM `store_all` WHERE job_url LIKE '%kariera.gr%' """
                cursor = connection.cursor()
                cursor.execute(sql_select_Query)
                # fetchall() returns a list of (job_url, job_description) rows
                records = cursor.fetchall()
        except mysql.connector.Error as error:
            not_done = "Failed to connect {}".format(error)
        return records

    @staticmethod
    def insert(job_url, metakey, metavalue):
        connection = None  # stays None if connect() itself fails
        try:
            connection = mysql.connector.connect(host='',
                                                 database='',
                                                 user='',
                                                 password='')
            cursor = connection.cursor(prepared=True)
            sql_insert_query = """ INSERT INTO `store` (`url`, `metakey`, `metavalue`) VALUES (%s,%s,%s)"""
            insert_tuple = (job_url, metakey, metavalue)
            cursor.execute(sql_insert_query, insert_tuple)
            connection.commit()
            return "Record inserted successfully into store table"
        except mysql.connector.Error as error:
            # Roll back only when a connection was actually opened
            if connection is not None:
                connection.rollback()
            return "Failed to insert into MySQL table {}".format(error)
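class Crawler(scrapy.Spider, database):
    records = database.query()
    # fetchall() returns a list of rows; records[0] is the first row, a (job_url, job_description) tuple.
    records = records[0]
    # The whole row tuple is passed as body here, although body is expected to be a string or bytes.
    response = HtmlResponse(url="Any String", body=records, encoding='utf-8')
    job = response.xpath('//ul[@class="tab_content"]/text()').extract()
    url = records
    metakey = "test"
    metavalue = "test"
    print(database.query())
    print(database.insert(url, metakey, metavalue))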
Solution
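This problem has actually been solved. The data is joined into a single string and passed to TextResponse (rather than HtmlResponse) as the body:
from scrapy.http import TextResponse
# body1 is not defined in the original post; it appears to be the string data fetched from the database.
b = ''.join(body1)
response = TextResponse(url="Any String", body=b, encoding='utf-8')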