Docker scrapy spider AttributeError: 'TutorialPipeline' object has no attribute 'cur'

Problem description

Hello. When I try to run my scrapy spider on a VPS server using Docker, it returns this error:

performing post-bootstrap initialization ... ok
web_1        | 2018-08-08 07:04:03 [scrapy.middleware] INFO: Enabled item pipelines:
web_1        | ['tutorial.pipelines.TutorialPipeline']
web_1        | 2018-08-08 07:04:03 [scrapy.core.engine] INFO: Spider opened
web_1        | 2018-08-08 07:04:03 [scrapy.core.engine] INFO: Closing spider (shutdown)
web_1        | 2018-08-08 07:04:03 [scrapy.core.engine] ERROR: Scraper close failure
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1        |     yield self.engine.open_spider(self.spider, start_requests)
web_1        | psycopg2.OperationalError: server closed the connection unexpectedly
web_1        |  This probably means the server terminated abnormally
web_1        |  before or while processing the request.
web_1        |
web_1        |
web_1        | During handling of the above exception, another exception occurred:
web_1        |
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
web_1        |     current.result = callback(current.result, *args, **kw)
web_1        |   File "/scrapy_estate/tutorial/pipelines.py", line 19, in close_spider
web_1        |     self.cur.close()
web_1        | AttributeError: 'TutorialPipeline' object has no attribute 'cur'
web_1        | 2018-08-08 07:04:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
web_1        | {'finish_reason': 'shutdown',
web_1        |  'finish_time': datetime.datetime(2018, 8, 8, 7, 4, 3, 806923),
web_1        |  'log_count/ERROR': 1,
web_1        |  'log_count/INFO': 6}
web_1        | 2018-08-08 07:04:03 [scrapy.core.engine] INFO: Spider closed (shutdown)
web_1        | Unhandled error in Deferred:
web_1        | 2018-08-08 07:04:03 [twisted] CRITICAL: Unhandled error in Deferred:
web_1        |
web_1        | 2018-08-08 07:04:03 [twisted] CRITICAL:
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
web_1        |     result = g.send(result)
web_1        |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1        |     yield self.engine.open_spider(self.spider, start_requests)
web_1        | psycopg2.OperationalError: server closed the connection unexpectedly
web_1        |  This probably means the server terminated abnormally
web_1        |  before or while processing the request.
web_1        |

My pipelines.py:

import psycopg2
class TutorialPipeline(object):
    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '123' # your password
        database = 'real_estate'
        self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute(
            """insert into estate(estate_title, estate_address, estate_area,
                   estate_description, estate_price, estate_type, estate_tag,
                   estate_date, estate_seller_name, estate_seller_address,
                   estate_seller_phone, estate_seller_mobile, estate_seller_email)
               values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)""",
            (item['estate_title'], item['estate_address'], item['estate_area'],
             item['estate_description'], item['estate_price'], item['estate_type'],
             item['estate_tag'], item['estate_date'], item['estate_seller_name'],
             item['estate_seller_address'], item['estate_seller_phone'],
             item['estate_seller_mobile'], item['estate_seller_email']))
        self.connection.commit()
        return item

Edit: my docker-compose.yml, so I can expose the ports on the VPS server:

version: "3"
services:
  interface:
    links:
      - postgres:postgres
    image: adminer
    ports:
      - "8080:8080"
    networks:
      - webnet
  postgres:
    image: postgres
    container_name: postgres
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: '123'
    volumes:
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
    - "5432:5432"
    expose:
    - "5432"
    networks:
      - webnet
  web:
    # replace username/repo:tag with your name and image details
    image: zerolin/scrapy_estate:latest
    build: ./tutorial
    ports:
      - "8081:8081"
    environment:
      DB_HOST: postgres
    networks:
      - webnet
  splash:
    image: scrapinghub/splash
    ports:
     - "8050:8050"
    expose:
     - "8050"
networks:
  webnet:

I ran the spider on my local machine and it worked fine, without this error.

It seems that self.cur, which is set in open_spider, can't be reached from the other methods of the same class :/

AttributeError: 'TutorialPipeline' object has no attribute 'cur'

But I hit this error when running it on the server. I'm confused; any help would be appreciated :)

The connection from the Docker side on the VPS to postgres seems to be the problem, even though I checked that the postgres username and password are the same.
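For context, the AttributeError is a follow-on failure, not the root cause: psycopg2.connect raises inside open_spider before self.cur is ever assigned, and Scrapy's shutdown then calls close_spider on the half-initialized pipeline. A minimal sketch of that failure pattern (plain Python, not Scrapy itself):

```python
class Pipeline:
    def open_spider(self):
        # Simulate psycopg2.connect failing before the cursor is created
        raise OSError("server closed the connection unexpectedly")
        self.cur = object()  # never reached

    def close_spider(self):
        self.cur.close()  # fails: 'cur' was never set

p = Pipeline()
try:
    p.open_spider()
except OSError:
    pass  # the engine logs this and begins shutdown

try:
    p.close_spider()
except AttributeError as e:
    print(e)  # 'Pipeline' object has no attribute 'cur'
```

So the AttributeError in the log is just the cleanup tripping over the earlier connection failure.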

Tags: python, docker, scrapy, vps

Solution


Your log says:

psycopg2.OperationalError: server closed the connection unexpectedly. This probably means the server terminated abnormally before or while processing the request.

Your pipeline fails to open the connection:

def open_spider(self, spider):
    # ...
    self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
    self.cur = self.connection.cursor()

Make sure you can actually establish a connection to your postgres server from where the spider runs. Perhaps the hostname or password is wrong?
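A likely culprit in this setup: inside the web container, localhost refers to the container itself, not the postgres service. A minimal sketch of the pipeline, assuming the DB_HOST variable already set in the compose file above (and guarding close_spider so a failed connection no longer cascades into the AttributeError):

```python
import os

class TutorialPipeline(object):
    def open_spider(self, spider):
        import psycopg2  # imported here so the sketch can be defined without the driver
        # Inside Docker, 'localhost' is this container; the compose file
        # injects the postgres service name via DB_HOST.
        hostname = os.environ.get('DB_HOST', 'localhost')
        self.connection = psycopg2.connect(
            host=hostname, user='postgres', password='123', dbname='real_estate')
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        # Only clean up what open_spider actually managed to create.
        if hasattr(self, 'cur'):
            self.cur.close()
            self.connection.close()
```

With host='postgres' (via DB_HOST) the connection goes over the webnet network to the database container instead of looping back into the spider's own container.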

