Handling the Scrapy CloseSpider exception

Problem description

import scrapy
from urllib.parse import urlparse
from tkinter import filedialog
import tkinter as tk
import csv
from scrapy.exceptions import CloseSpider
from scrapy.signalmanager import SignalManager
from scrapy import signals
from pydispatch import dispatcher

class GoogleSpider(scrapy.Spider):
    name = 'google'
    allowed_domains = ['google.com']

    start_urls = ['http://www.google.com/search?q=summer&hl=en&num=40']

    def __init__(self, stats):
        super().__init__()
        self.stats = stats
        # Connect handlers for the spider_opened and spider_closed signals
        SignalManager(dispatcher.Any).connect(receiver=self._close, signal=signals.spider_closed)
        SignalManager(dispatcher.Any).connect(receiver=self._open, signal=signals.spider_opened)

    def _open(self):
        # Ask for the keywords CSV file via a Tkinter file dialog
        root = tk.Tk()
        root.withdraw()
        self.input_file = filedialog.askopenfilename(title='Please Select Keywords File', filetypes=[('CSV files', '*.csv')])
        if not self.input_file:
            raise CloseSpider(reason='no_file')

    def _close(self):
        print("done")

If no file is given, I get the error below. What I want is to simply close the spider without showing an error. How do I handle this exception? From the documentation I read that this is the way to close a spider, but it doesn't mention how to handle it.

2019-08-10 14:55:56 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method GoogleSpider._open of <GoogleSpider 'google' at 0x7fd56b6eda58>>
Traceback (most recent call last):
  File "/home/timmy/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "/home/timmy/.local/lib/python3.6/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/home/timmy/spiders/google.py", line 61, in _open
    raise CloseSpider(reason='no_file')
scrapy.exceptions.CloseSpider

Tags: python, scrapy

Solution


I don't think this is currently supported. CloseSpider is only acted on when it is raised from a spider callback; raised inside a signal handler such as spider_opened, it is simply caught by the signal dispatcher and logged as the error you see.
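
For comparison, here is a minimal sketch (not from the original post; the spider name and the input_file check are only illustrative) of where CloseSpider is handled gracefully: raised inside a callback such as parse, Scrapy stops the spider with the given reason instead of logging a signal-handler error.

import scrapy
from scrapy.exceptions import CloseSpider

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://www.google.com/search?q=summer&hl=en&num=40']

    def parse(self, response):
        # Raised from a callback, CloseSpider is caught by the engine
        # and the spider shuts down cleanly with reason 'no_file'.
        if not getattr(self, 'input_file', None):
            raise CloseSpider(reason='no_file')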

Alternatively, you can set self.start_urls = [] instead and get a similar result, as sketched below.
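
A rough sketch of that alternative, reusing the _open handler from the question (signal wiring and the other imports omitted): when the file dialog is cancelled, the handler clears start_urls instead of raising, so no requests are generated and the spider finishes on its own without an error in the log.

import tkinter as tk
from tkinter import filedialog

import scrapy

class GoogleSpider(scrapy.Spider):
    name = 'google'
    start_urls = ['http://www.google.com/search?q=summer&hl=en&num=40']

    def _open(self):
        # spider_opened handler: instead of raising CloseSpider here,
        # clear start_urls so nothing is scheduled and the spider
        # closes by itself once it runs out of requests.
        root = tk.Tk()
        root.withdraw()
        self.input_file = filedialog.askopenfilename(
            title='Please Select Keywords File',
            filetypes=[('CSV files', '*.csv')])
        if not self.input_file:
            self.start_urls = []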

