python - Handling the Scrapy CloseSpider exception
Problem description
import scrapy
from urllib.parse import urlparse
from tkinter import filedialog
import tkinter as tk
import csv
from scrapy.exceptions import CloseSpider
# Imports required by the signal wiring below
from pydispatch import dispatcher
from scrapy import signals
from scrapy.signalmanager import SignalManager


class GoogleSpider(scrapy.Spider):
    name = 'google'
    allowed_domains = ['google.com']
    start_urls = ['http://www.google.com/search?q=summer&hl=en&num=40']

    def __init__(self, stats):
        self.stats = stats
        # Run _open when the spider opens and _close when it closes
        SignalManager(dispatcher.Any).connect(receiver=self._close, signal=signals.spider_closed)
        SignalManager(dispatcher.Any).connect(receiver=self._open, signal=signals.spider_opened)

    def _open(self):
        # os.system('cls' if os.name == 'nt' else 'clear')
        # Hide the root Tk window and show only the file dialog
        root = tk.Tk()
        root.withdraw()
        self.input_file = filedialog.askopenfilename(title='Please Select Keywords File', filetypes=[('CSV files', '.csv')])
        if not self.input_file:
            # This raise is what produces the error shown below
            raise CloseSpider(reason='no_file')

    def _close(self):
        print("done")
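If no file is selected, I get the error below. What I want is to simply close the spider without an error being shown. How do I handle this exception? I read in the documentation that this is how you close a spider, but it does not mention how to handle the exception.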
2019-08-10 14:55:56 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method GoogleSpider._open of <GoogleSpider 'google' at 0x7fd56b6eda58>>
Traceback (most recent call last):
  File "/home/timmy/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "/home/timmy/.local/lib/python3.6/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/home/timmy/spiders/google.py", line 61, in _open
    raise CloseSpider(reason='no_file')
scrapy.exceptions.CloseSpider
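Solution
I don't think this is currently supported. The Scrapy documentation describes CloseSpider as an exception to be raised from a spider callback; when it is raised from a signal handler, it is caught and logged as an error, as the traceback above shows.
Alternatively, you can set self.start_urls = [] instead of raising the exception, which gives a similar result.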