Splash freezes with "Timing out client: IPv4Address"

Problem description

I am using scrapy-splash to scrape data from a website.

Periodically (at random), Splash freezes with the following log:

splash-service_1        | 2020-07-16 08:49:35.119333 [-] "172.31.0.4" - - [16/Jul/2020:08:49:34 +0000] "POST /execute HTTP/1.1" 200 266018 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
splash-service_1        | 2020-07-16 08:50:10.012973 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51970)
splash-service_1        | 2020-07-16 08:50:10.858080 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51978)
splash-service_1        | 2020-07-16 08:50:16.873014 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51974)
splash-service_1        | 2020-07-16 08:50:17.547947 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51966)
splash-service_1        | 2020-07-16 08:50:18.037436 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51976)
splash-service_1        | 2020-07-16 08:50:29.064655 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51932)
splash-service_1        | 2020-07-16 08:50:35.119997 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51968)

How can I find out the cause? Why does it hang?

PS: I run it with args={"lua_source": self.lua_script_navigate, "timeout": 60000}

Tags: scrapy-splash

Solution


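See the Splash HTTP API documentation for the timeout parameter:

timeout : float : optional

A timeout (in seconds) for the render (defaults to 30).

By default, the maximum allowed value for the timeout is 90 seconds. To override it, start Splash with the --max-timeout command line option. For example, here Splash is configured to allow timeouts of up to 5 minutes: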

$ docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 300

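If you have not started Splash with --max-timeout, your lua_script will be aborted after 30 seconds, even if a larger timeout is set in args. Note that timeout is given in seconds, not milliseconds, so the 60000 in the question is far above the default 90-second maximum.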
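A minimal sketch of building the request args with the timeout expressed in seconds, assuming the default 90-second ceiling quoted from the documentation above. `build_splash_args` and `DEFAULT_MAX_TIMEOUT_S` are hypothetical names used only for illustration; they are not part of Splash or scrapy-splash.

```python
# Assumed ceiling: the default maximum timeout quoted from the Splash docs.
# Raise it only if Splash was started with a matching --max-timeout value.
DEFAULT_MAX_TIMEOUT_S = 90.0


def build_splash_args(lua_source, timeout_s, max_timeout_s=DEFAULT_MAX_TIMEOUT_S):
    """Return an args dict for a Splash request (hypothetical helper).

    timeout_s is in seconds, as the Splash HTTP API expects.
    """
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    if timeout_s > max_timeout_s:
        # Fail early in the spider instead of sending a value
        # above the ceiling Splash is assumed to be running with.
        raise ValueError(
            f"timeout {timeout_s}s exceeds the assumed Splash maximum of "
            f"{max_timeout_s}s; start Splash with a larger --max-timeout"
        )
    return {"lua_source": lua_source, "timeout": float(timeout_s)}
```

In a spider, the result could be passed as SplashRequest(url, callback, endpoint="execute", args=build_splash_args(self.lua_script_navigate, 60)). If Splash was started with --max-timeout 300, pass max_timeout_s=300 so that longer timeouts are accepted by the helper as well.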

