python - Scrapy - "scrapy crawl" catches exceptions internally, hiding them from the "catch" clause in Jenkins
Problem
I run scrapy in Jenkins every day, and I want exceptions to be emailed to me.
Here is a sample spider:
from scrapy import Spider

class ExceptionTestSpider(Spider):
    name = 'exception_test'
    start_urls = ['http://google.com']

    def parse(self, response):
        raise Exception
Here is the Jenkinsfile:
#!/usr/bin/env groovy
try {
    node('jenkins-small-py3.6') {
        ...
        stage('Execute Spider') {
            sh '''
                cd ...
                /usr/local/bin/scrapy crawl exception_test
            '''
        }
    }
} catch (exc) {
    echo "Caught: ${exc}"
    mail subject: "...",
         body: "The spider is failing",
         to: "...",
         from: "..."
    /* Rethrow to fail the Pipeline properly */
    throw exc
}
Here is the log:
...
INFO:scrapy.core.engine:Spider opened
2019-08-22 10:49:49 [scrapy.core.engine] INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-08-22 10:49:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.extensions.telnet:Telnet console listening on 127.0.0.1:6023
DEBUG:scrapy.downloadermiddlewares.redirect:Redirecting (301) to <GET http://www.google.com/> from <GET http://google.com>
DEBUG:scrapy.core.engine:Crawled (200) <GET http://www.google.com/> (referer: None)
ERROR:scrapy.core.scraper:Spider error processing <GET http://www.google.com/> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "...", line ..., in parse
raise Exception
Exception
2019-08-22 10:49:50 [scrapy.core.scraper] ERROR: Spider error processing <GET http://www.google.com/> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "...", line ..., in parse
raise Exception
Exception
INFO:scrapy.core.engine:Closing spider (finished)
2019-08-22 10:49:50 [scrapy.core.engine] INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{
...
}
INFO:scrapy.core.engine:Spider closed (finished)
2019-08-22 10:49:50 [scrapy.core.engine] INFO: Spider closed (finished)
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS
And no mail was sent. I believe Scrapy catches the exception internally, saves it to be logged later, and then exits without an error.
How can I get the exception through to Jenkins?
Solution
The problem is that scrapy does not use a non-zero exit code when a crawl fails (see https://github.com/scrapy/scrapy/issues/1231).
As commenters on that issue suggest, I recommend adding a custom command (http://doc.scrapy.org/en/master/topics/commands.html#custom-project-commands) that sets a non-zero exit code when the crawl fails.
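Until such a custom command is in place, one possible stopgap is a small wrapper script that runs the crawl, scans the combined output for ERROR-level log lines (Scrapy logs spider exceptions at ERROR level but still exits 0, as the log above shows), and exits non-zero so the Jenkins `sh` step fails and the `catch` block fires. This is only a sketch: the wrapper itself and the "ERROR" substring heuristic are my assumptions, not part of the original answer, and the heuristic can false-positive if crawled output happens to contain that string.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around `scrapy crawl` for the Jenkins `sh` step:
exit non-zero when the crawl produced ERROR-level log lines."""
import subprocess
import sys


def crawl_ok(argv):
    """Run `argv`; succeed only if it exited 0 AND wrote no ERROR lines.

    Uses stdout=/stderr=PIPE with universal_newlines (not capture_output/
    text), since the node runs Python 3.6 and those arguments are 3.7+.
    """
    proc = subprocess.run(argv,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True)
    if proc.returncode != 0:
        return False
    # Scrapy's log lines for spider exceptions look like
    # "[scrapy.core.scraper] ERROR: Spider error processing ..."
    return "ERROR" not in proc.stdout + proc.stderr


def main():
    # Invoke under the usual `if __name__ == "__main__":` guard, e.g.
    # python3 run_crawl.py exception_test
    cmd = ["/usr/local/bin/scrapy", "crawl"] + sys.argv[1:]
    sys.exit(0 if crawl_ok(cmd) else 1)
```

In the Jenkinsfile, the `sh` step would then call this wrapper instead of scrapy directly; a failing crawl makes the step exit non-zero, the Pipeline's `catch` runs, and the mail goes out. The custom-command route linked above remains the cleaner fix, since it reports failure from inside Scrapy rather than by parsing logs.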