
Problem description

I have this Python code using Scrapy:

import scrapy
import scrapy.crawler as crawler
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())


# wrapper so the spider can be run more than once per program
# (each run gets a fresh Twisted reactor in its own process)
def run_spider():
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(QuotesSpider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result


print('first run:')
run_spider()

print('\nsecond run:')
run_spider()

Right now run_spider() runs both times, even when QuotesSpider returns nothing or raises an error.

How can I make run_spider() not execute/queue the next run when QuotesSpider() errors or comes back empty?

Thanks

Tags: python, multithreading, scrapy, twisted, twisted.internet

Solution
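The wrapper in the question already ships the child process's exception back to the parent through the Queue; what is missing is (a) reporting an "empty" result as a failure too, and (b) having the caller skip the next run when the previous one failed. Below is a minimal, Scrapy-free sketch of that pattern. The names `_work` and `run_once` are hypothetical stand-ins for the crawl in `f()`: `_work` simulates a spider run that either "scrapes" some items or raises, and `run_once` mirrors the question's Process/Queue plumbing but re-raises in the parent on error, so the caller can decide whether to schedule another run.

```python
from multiprocessing import Process, Queue


def _work(q, should_fail):
    # Hypothetical stand-in for the spider run inside the child process.
    # A real version would run the Twisted reactor here, count scraped
    # items (e.g. via crawler stats), and treat zero items as a failure.
    try:
        if should_fail:
            raise RuntimeError("spider errored or scraped nothing")
        q.put(("ok", 5))  # pretend 5 items were scraped
    except Exception as e:
        q.put(("error", str(e)))


def run_once(should_fail=False):
    # Same Process/Queue pattern as run_spider() in the question,
    # but the child always reports (status, payload) and the parent
    # re-raises on "error" so failures are visible to the caller.
    q = Queue()
    p = Process(target=_work, args=(q, should_fail))
    p.start()
    status, payload = q.get()
    p.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def run_twice():
    # Only queue the second run if the first one succeeded.
    print("first run:")
    try:
        run_once()
    except RuntimeError as e:
        print("first run failed, skipping second run:", e)
        return
    print("\nsecond run:")
    run_once()
```

Adapting this to the question's code means having `f()` put an exception on the queue not only when `runner.crawl()` fails, but also when the run finishes with zero items (for example by checking the crawler's stats after the reactor stops), and then wrapping the second `run_spider()` call in a `try`/`except` that skips it on failure.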

