redis - Celery worker is running, but suddenly the nodes no longer reply
Problem description
I have a Celery worker with a Redis backend that has been running for more than half a year, and so far I have had no problems with it.
Suddenly, I no longer get any reply from the nodes.
Celery starts successfully and the command reports no errors:
celery multi start myqueue -A myapp.celery -Ofair
celery multi v4.3.0 (rhubarb)
> Starting nodes...
> myqueue@myhost: OK
However, when I check the status of the Celery worker with
celery -A myapp.celery status
I get the message:
Error: No nodes replied within time constraint.
If I look at the processes, the Celery workers appear to be running:
/usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
And when I run
celery -A myapp.celery control shutdown
the processes above are removed as expected.
Starting the worker in the foreground does not give any hints either:
$ celery -A myapp.celery worker -l debug
Please specify a different user using the --uid option.
User information: uid=1000120000 euid=1000120000 gid=0 egid=0
uid=uid, euid=euid, gid=gid, egid=egid,
[2019-08-23 11:36:36,790: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2019-08-23 11:36:36,792: DEBUG/MainProcess] | Worker: Building graph...
[2019-08-23 11:36:36,793: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Building graph...
[2019-08-23 11:36:36,862: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Mingle, Tasks, Control, Heart, Gossip, Agent, event loop}
-------------- celery@myapp-163-m4hs9 v4.3.0 (rhubarb)
---- **** -----
--- * *** * -- Linux-3.10.0-862.3.2.el7.x86_64-x86_64-with-Ubuntu-16.04-xenial 2019-08-23 11:36:36
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: myapp:0x7f2094fcd978
- ** ---------- .> transport: redis://:**@${redis-host}:6379/0
- ** ---------- .> results: redis://:**@${redis-host}:6379/0
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> myqueue exchange=myqueue(direct) key=myqueue
[tasks]
. sometask1
. sometask2
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Hub
[2019-08-23 11:36:36,874: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Pool
[2019-08-23 11:36:37,278: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,279: DEBUG/MainProcess] | Worker: Starting Consumer
[2019-08-23 11:36:37,280: DEBUG/MainProcess] | Consumer: Starting Connection
[2019-08-23 11:36:37,299: INFO/MainProcess] Connected to redis://:**@${redis-host}:6379/0
[2019-08-23 11:36:37,299: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,299: DEBUG/MainProcess] | Consumer: Starting Events
[2019-08-23 11:36:37,311: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,312: DEBUG/MainProcess] | Consumer: Starting Mingle
[2019-08-23 11:36:37,312: INFO/MainProcess] mingle: searching for neighbors
[2019-08-23 11:36:38,343: INFO/MainProcess] mingle: all alone
[2019-08-23 11:36:38,343: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,343: DEBUG/MainProcess] | Consumer: Starting Tasks
[2019-08-23 11:36:38,350: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,350: DEBUG/MainProcess] | Consumer: Starting Control
[2019-08-23 11:36:38,359: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,359: DEBUG/MainProcess] | Consumer: Starting Heart
[2019-08-23 11:36:38,363: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,363: DEBUG/MainProcess] | Consumer: Starting Gossip
[2019-08-23 11:36:38,371: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,371: DEBUG/MainProcess] | Consumer: Starting event loop
[2019-08-23 11:36:38,372: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2019-08-23 11:36:38,373: INFO/MainProcess] celery@myapp-163-m4hs9 ready.
[2019-08-23 11:36:38,373: DEBUG/MainProcess] basic.qos: prefetch_count->16
[2019-08-23 11:36:38,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2019-08-23 11:36:38,839: INFO/MainProcess] Events of group {task} enabled by remote.
[2019-08-23 11:36:43,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
Redis is running:
redis-cli -h ${redis-host}
redis:6379> ping
PONG
The log files do not contain any hints.
As mentioned above, when I check the status of the Celery worker with
celery -A myapp.celery status
I get the message:
Error: No nodes replied within time constraint.
Instead, Celery should respond with
> myqueue@myhost: OK
or at least produce some error message.
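When `status` times out like this, it can help to probe the workers directly with a longer reply timeout before concluding they are unreachable. A hedged sketch (the `--timeout` flag and its default behaviour are as I understand them for Celery 4.3; the value of 10 seconds is arbitrary):

```shell
# Ping the workers over the broker, allowing more time for a reply.
celery -A myapp.celery inspect ping --timeout=10

# The status command accepts the same timeout option.
celery -A myapp.celery status --timeout=10
```

If the workers still do not answer with a generous timeout, the problem is in the control-message path over the broker rather than a slow reply.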
Temporary workaround and further investigation:
For now, the immediate measure was to switch the message queue to RabbitMQ, after which the worker was online and responding again. So the problem appears to be specific to using Redis as the message queue. Updating Celery and the Redis client to the latest versions (Celery 4.3.0, redis 3.3.8) did not help. The Python version is 3.5 (on OpenShift).
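Since the fix turned out to be version-specific (see below), it is worth recording exactly which Celery, Kombu, and redis-py versions are installed in the affected environment. A minimal sketch, assuming the distribution names on PyPI are `celery`, `kombu`, and `redis` (note that `importlib.metadata` needs Python 3.8+; on the Python 3.5 host, `pip freeze` gives the same information):

```python
from importlib import metadata

def installed_versions(packages):
    """Return a {package: version} map, with None for packages
    that are not installed in the current environment."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions

if __name__ == "__main__":
    for pkg, ver in installed_versions(["celery", "kombu", "redis"]).items():
        print(pkg, ver or "not installed")
```

Comparing this output between a working and a broken deployment narrows the suspect down quickly, since Kombu is pulled in implicitly as a Celery dependency and easily overlooked.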
Solution
There is a bug in the latest release (4.6.4) of the Kombu library (a Celery dependency) that causes this problem with Redis, as described in this GitHub issue.
The bug was recently fixed in a pull request to the Kombu repository, but the fix has not been released yet.
Downgrading Kombu to version 4.6.3 resolves the problem.
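Assuming pip is the installer in use, the downgrade is a one-liner; the pin should be removed once a Kombu release containing the fix is available:

```shell
# Force the known-good Kombu release (4.6.4 is the broken one).
pip install 'kombu==4.6.3'

# Confirm which version pip actually resolved.
pip show kombu
```

After downgrading, restart the workers so the running processes pick up the replaced library.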