python - Sending a signal not to queue/process in Python's multiprocessing module
Problem description
I have this Python code that uses Scrapy:
import scrapy
import scrapy.crawler as crawler
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())

# the wrapper to make it run more times
def run_spider():
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(QuotesSpider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

print('first run:')
run_spider()

print('\nsecond run:')
run_spider()
At the moment, run_spider runs even when QuotesSpider returns nothing or raises an error.
How can I make run_spider() not execute/queue when QuotesSpider() fails or returns nothing?
Thanks
Solution
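One possible approach, sketched here without Scrapy so that it runs on its own: have the child process report a status and an item count through the queue, and let the parent decide whether to schedule the next run. Everything below is an assumption for illustration and not the original poster's code: `crawl_stub` is a hypothetical stand-in for the real crawl, and `worker`, `run_once` and `run_until_failure` are made-up names. In the Scrapy version, the child would have to report something comparable, such as the number of items the spider produced; how that count is collected is left open here.

```python
from multiprocessing import Process, Queue


def crawl_stub(items):
    """Hypothetical stand-in for the Scrapy crawl: returns the scraped items."""
    if items is None:
        # Simulates a crawl that fails outright.
        raise RuntimeError("crawl failed")
    return list(items)


def worker(q, items):
    """Runs in the child; reports ("ok", count) or ("error", message) on q."""
    try:
        scraped = crawl_stub(items)
        q.put(("ok", len(scraped)))
    except Exception as exc:
        # Send a string instead of the exception object so it always pickles.
        q.put(("error", repr(exc)))


def run_once(items):
    """Run one crawl in a fresh process; True only if it produced items."""
    q = Queue()
    p = Process(target=worker, args=(q, items))
    p.start()
    status, payload = q.get()  # read before join() so the child can exit
    p.join()
    return status == "ok" and payload > 0


def run_until_failure(batches, runner=run_once):
    """Run crawls in order; stop scheduling at the first empty or failed one."""
    completed = 0
    for items in batches:
        if not runner(items):
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # The second batch is empty, so the third crawl is never started.
    print(run_until_failure([["a", "b"], [], ["c"]]))
```

The point of the design is that the parent, not the child, owns the decision to continue: a falsy result from `run_once` simply ends the loop, so no further process is created or queued. The `runner` parameter exists only so the loop can be exercised without starting processes.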