Celery worker is running, but suddenly the nodes no longer reply

Problem description

I have a Celery worker with a Redis backend that has been running for more than half a year, and so far I haven't had any problems.

Suddenly, I am not getting any reply from the node.

Celery can be started successfully, and the command executes without errors:

celery multi start myqueue -A myapp.celery -Ofair
celery multi v4.3.0 (rhubarb)
> Starting nodes...
> myqueue@myhost: OK

But when I check the status of the Celery worker

celery -A myapp.celery status

I get the message:

Error: No nodes replied within time constraint.
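
To rule out a reply that is merely slow rather than missing entirely, the workers can also be pinged with a longer timeout; a minimal check, where the 10-second value is an arbitrary choice for illustration:

# Ping the workers directly, waiting up to 10 s for replies instead of
# the short default deadline; a healthy node answers with 'pong'.
celery -A myapp.celery inspect ping --timeout 10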

If I look at the processes, the Celery worker appears to be running:

/usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4

When I run

celery -A myapp.celery control shutdown

the processes above are removed as expected.

Starting it in the foreground doesn't give any hint either:

$ celery -A myapp.celery worker -l debug
Please specify a different user using the --uid option.

User information: uid=1000120000 euid=1000120000 gid=0 egid=0


uid=uid, euid=euid, gid=gid, egid=egid,
[2019-08-23 11:36:36,790: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2019-08-23 11:36:36,792: DEBUG/MainProcess] | Worker: Building graph...
[2019-08-23 11:36:36,793: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Building graph...
[2019-08-23 11:36:36,862: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Mingle, Tasks, Control, Heart, Gossip, Agent, event loop}

 -------------- celery@myapp-163-m4hs9 v4.3.0 (rhubarb)
---- **** ----- 
--- * ***  * -- Linux-3.10.0-862.3.2.el7.x86_64-x86_64-with-Ubuntu-16.04-xenial 2019-08-23 11:36:36
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         myapp:0x7f2094fcd978
- ** ---------- .> transport:   redis://:**@${redis-host}:6379/0
- ** ---------- .> results:     redis://:**@${redis-host}:6379/0
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> myqueue      exchange=myqueue(direct) key=myqueue


[tasks]
  . sometask1
  . sometask2
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Hub
[2019-08-23 11:36:36,874: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Pool
[2019-08-23 11:36:37,278: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,279: DEBUG/MainProcess] | Worker: Starting Consumer
[2019-08-23 11:36:37,280: DEBUG/MainProcess] | Consumer: Starting Connection
[2019-08-23 11:36:37,299: INFO/MainProcess] Connected to redis://:**@${redis-host}:6379/0
[2019-08-23 11:36:37,299: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,299: DEBUG/MainProcess] | Consumer: Starting Events
[2019-08-23 11:36:37,311: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,312: DEBUG/MainProcess] | Consumer: Starting Mingle
[2019-08-23 11:36:37,312: INFO/MainProcess] mingle: searching for neighbors
[2019-08-23 11:36:38,343: INFO/MainProcess] mingle: all alone
[2019-08-23 11:36:38,343: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,343: DEBUG/MainProcess] | Consumer: Starting Tasks
[2019-08-23 11:36:38,350: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,350: DEBUG/MainProcess] | Consumer: Starting Control
[2019-08-23 11:36:38,359: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,359: DEBUG/MainProcess] | Consumer: Starting Heart
[2019-08-23 11:36:38,363: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,363: DEBUG/MainProcess] | Consumer: Starting Gossip
[2019-08-23 11:36:38,371: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,371: DEBUG/MainProcess] | Consumer: Starting event loop
[2019-08-23 11:36:38,372: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2019-08-23 11:36:38,373: INFO/MainProcess] celery@myapp-163-m4hs9 ready.
[2019-08-23 11:36:38,373: DEBUG/MainProcess] basic.qos: prefetch_count->16
[2019-08-23 11:36:38,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2019-08-23 11:36:38,839: INFO/MainProcess] Events of group {task} enabled by remote.
[2019-08-23 11:36:43,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]

Redis is running:

redis-cli -h ${redis-host}
redis:6379> ping
PONG
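
Status and inspect replies travel over Celery's broadcast (pidbox) exchange, so a further check is whether Kombu's binding set for that exchange exists in Redis. The key name below follows Kombu's _kombu.binding.<exchange> convention and is an assumption about this particular setup:

# List the bindings Kombu keeps for the control exchange; an empty set
# would explain why broadcast commands never reach the worker.
redis-cli -h ${redis-host} smembers _kombu.binding.celery.pidbox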

The log files don't contain any hints.

As mentioned before, when I check the status of the Celery worker

celery -A myapp.celery status

I get the message:

Error: No nodes replied within time constraint.

Instead, Celery should reply with

> myqueue@myhost: OK

or at least give some error message.

Temporary workaround and further investigation:

For now, the immediate measure was to switch the message queue to RabbitMQ, after which the worker came online and responded again. The problem therefore seems specific to using Redis as the message queue. Updating Celery and the Redis client to the latest versions (Celery 4.3.0, redis 3.3.8) did not help. The Python version is 3.5 (on OpenShift).
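
For reference, switching the broker needs nothing more than pointing the worker at a RabbitMQ URL; a minimal sketch, where the host and credentials are placeholders:

# Start the same worker against RabbitMQ instead of Redis via -b/--broker.
celery -A myapp.celery worker -Ofair -b 'amqp://user:password@rabbitmq-host:5672//'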

Tags: redis, celery

Solution


There is a bug in the latest release (4.6.4) of the Kombu library (a Celery dependency) that causes this problem with Redis, as described in this GitHub issue.
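
To check whether an installation is on the affected release (assuming a pip-managed environment):

# Print the installed Kombu version; 4.6.4 is the affected release.
python3 -c 'import kombu; print(kombu.__version__)'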

The bug was recently fixed in a pull request on the Kombu repository, but the fix has not been released yet.

Downgrading Kombu to version 4.6.3 resolves the problem.
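
A minimal sketch of the downgrade with pip; pinning the version afterwards keeps a later install from pulling 4.6.4 back in:

# Downgrade to the last unaffected release ...
pip3 install 'kombu==4.6.3'

# ... and pin it, e.g. in requirements.txt:
# kombu==4.6.3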

