scrapy crawler cannot parse data from mysql database

Problem Description

I built a web crawler with Scrapy and stored the data into a MySQL database (I scrape the page source code from a URL), and now I want to process it offline. So I wrote a SQL query to export the data with Python, and I am trying to crawl (parse) from that exported data.

Can you suggest how to do this? I have not been able to get it working with Scrapy. If anyone has a suggestion or a similar project, any help would be appreciated.

Here is what I tried: query the database and feed the stored source into a Scrapy response.

from scrapy.http import HtmlResponse
import mysql.connector
from mysql.connector import Error
import scrapy


# Connects to the database, queries all urls that have been crawled,
# and returns the result rows.
class database:

    @staticmethod
    def query():
        records = []
        try:
            connection = mysql.connector.connect(host='',
                                                 database='',
                                                 user='',
                                                 password='')
            if connection.is_connected():
                print("Connected to MySQL database, server version",
                      connection.get_server_info())
            # note the closing quote of the LIKE pattern, missing in the original
            sql_select_query = """SELECT `job_url`, `job_description`
                                  FROM `store_all`
                                  WHERE job_url LIKE '%kariera.gr%'"""
            cursor = connection.cursor()
            cursor.execute(sql_select_query)
            records = cursor.fetchall()
        except Error as error:
            print("Failed to connect: {}".format(error))
        return records

    @staticmethod
    def insert(job_url, metakey, metavalue):
        try:
            connection = mysql.connector.connect(host='',
                                                 database='',
                                                 user='',
                                                 password='')
            cursor = connection.cursor(prepared=True)
            # no trailing comma after the last column name
            sql_insert_query = """INSERT INTO `store` (`url`, `metakey`, `metavalue`)
                                  VALUES (%s, %s, %s)"""
            cursor.execute(sql_insert_query, (job_url, metakey, metavalue))
            connection.commit()
            return "Record inserted successfully into store table"
        except Error as error:
            connection.rollback()
            return "Failed to insert into MySQL table {}".format(error)


class Crawler(scrapy.Spider, database):
    records = database.query()
    records = records[0]  # a (job_url, job_description) tuple, not a string
    # this is where the attempt fails: body must be str or bytes
    response = HtmlResponse(url="Any String", body=records, encoding='utf-8')
    job = response.xpath('//ul[@class="tab_content"]/text()').extract()
    url = records
    metakey = "test"
    metavalue = "test"
    print(database.query())
    print(database.insert(url, metakey, metavalue))
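
The reason the attempt above fails is a type mismatch: cursor.fetchall() returns a list of row tuples, so records[0] is a (job_url, job_description) tuple, while the body argument of a Scrapy response must be a single str or bytes value. A minimal sketch of the fix, reusing the database class above:

from scrapy.http import HtmlResponse

# each row returned by query() is a (job_url, job_description) tuple
job_url, job_description = database.query()[0]

# body must be a single str or bytes value, not a tuple
response = HtmlResponse(url=job_url, body=job_description, encoding='utf-8')
job = response.xpath('//ul[@class="tab_content"]/text()').extract()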

Tags: python-3.x flask scrapy web-crawler mysql-python

Solution


This problem has actually been solved: join the fields of the database row into a single string before building the response.

from scrapy.http import TextResponse

# body1 holds the text fields fetched from the database row
b = ''.join(body1)

response = TextResponse(url="Any String", body=b, encoding='utf-8')
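
Putting the pieces together, here is a minimal end-to-end sketch of the offline pass, not a definitive implementation: it assumes the store_all table holds the raw page source in job_description, that the connection credentials are filled in, and that the metakey value "description" is purely illustrative. The table, column, and XPath names are taken from the question.

from scrapy.http import TextResponse
import mysql.connector

connection = mysql.connector.connect(host='', database='', user='', password='')
cursor = connection.cursor()
cursor.execute("SELECT `job_url`, `job_description` FROM `store_all` "
               "WHERE job_url LIKE '%kariera.gr%'")

for job_url, job_description in cursor.fetchall():
    # wrap the stored page source in an offline response object;
    # no network request is made
    response = TextResponse(url=job_url, body=job_description, encoding='utf-8')

    # run the same XPath a live spider would use
    job = response.xpath('//ul[@class="tab_content"]/text()').extract()

    # write the extracted values back into the `store` table
    insert_cursor = connection.cursor()
    insert_cursor.execute(
        "INSERT INTO `store` (`url`, `metakey`, `metavalue`) VALUES (%s, %s, %s)",
        (job_url, "description", ' '.join(job)))  # "description" is illustrative
    connection.commit()

connection.close()

The key point is that TextResponse (like HtmlResponse) can be built directly from a string, so stored page source can be re-parsed at any time without re-crawling.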
