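docker - RabbitMQ node dies behind an Nginx proxy server
Problem description
We have a RabbitMQ cluster that was set up by someone who is no longer here, and I know too little about it to troubleshoot it. The only way I know it is failing is from the error alerts raised by the Upstart services that use the queues.
This is what I know so far:
On the proxy server (Nginx 1.10.3, Ubuntu 16.04):
In /etc/nginx/nginx.conf, I have: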
stream {
    # debug|info|notice|warn|error|crit|alert|emerg
    error_log /var/log/nginx/stream_error.log info;
    server {
        listen 192.168.70.11:5672 so_keepalive=on;
        proxy_pass rabbitmq_backend;
    }
    upstream rabbitmq_backend {
        server services-01:5672;
        #server services-00:5672; <--- Commented this out
    }
}
On a server that runs the Upstart services, in one of the logs, e.g. /var/log/upstart/<service-name>.log:
2021-08-21 05:33:10NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:33:11NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:33:11NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:33:11NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:12NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:12NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:13NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:13NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:13NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
2021-08-21 05:34:13NOT_FOUND - home node 'rabbit@rabbit-services-00' of durable queue 'cert_collect_info' in vhost 'certificate' is down or inaccessible
On the RabbitMQ dashboard, I see: (screenshot not included)
At the top of the dashboard, I see: (screenshot not included)
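The two cluster nodes run in Docker containers on two separate servers (services-00 and services-01). On the node reporting the problem, inside the container, I found the command rabbitmqctl cluster_status. It gave me: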
root@rabbit-services-00:/# rabbitmqctl cluster_status
Cluster status of node 'rabbit@rabbit-services-00' ...
[{nodes,[{disc,['rabbit@rabbit-services-00','rabbit@rabbit-services-01']}]},
{running_nodes,['rabbit@rabbit-services-00']},
{cluster_name,<<"rabbit@rabbit-services-01">>},
{partitions,[{'rabbit@rabbit-services-00',['rabbit@rabbit-services-01']}]},
{alarms,[{'rabbit@rabbit-services-00',[]}]}]
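But I am not sure how to interpret it, and I need to get this running again, at least for the short term. Any help would be greatly appreciated. I have been working on this since 1 a.m. and I have run out of options. I thought commenting out the problem node in the Nginx configuration would help, but it did not. The Upstart services keep dying whenever I restart them.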
Solution
I traced it to a brief blip on a network switch (thanks, Nagios). I ran the following against the container:
docker exec rabbit-services-00 rabbitmqctl stop_app
docker exec rabbit-services-00 rabbitmqctl start_app
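Then I rechecked the queues, and they were working again. It seems it was just a brief network outage, which is consistent with the non-empty partitions entry in the cluster_status output from the question.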
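The check-then-restart sequence could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes cluster_status prints the Erlang-term format shown in the question (other RabbitMQ versions may print a different layout), and the function names has_partition and heal_node are hypothetical, not part of rabbitmqctl or Docker.

```shell
#!/bin/sh
# Hypothetical helper: read cluster_status output (Erlang-term format, as in
# the question) from stdin.
# Returns 0 if the partitions list is non-empty, 1 if it is empty,
# and 2 if no partitions field was found at all.
has_partition() {
    # Strip whitespace so the match does not depend on indentation or wrapping.
    status=$(tr -d ' \n\t')
    case "$status" in
        *'{partitions,[]}'*) return 1 ;;   # empty list: no partition reported
        *'{partitions,['*)   return 0 ;;   # non-empty list: partition reported
        *)                   return 2 ;;   # unexpected output format
    esac
}

# Hypothetical helper: restart the RabbitMQ application inside the given
# container only when its cluster_status output reports a partition.
heal_node() {
    container=$1
    if docker exec "$container" rabbitmqctl cluster_status | has_partition; then
        docker exec "$container" rabbitmqctl stop_app
        docker exec "$container" rabbitmqctl start_app
    fi
}
```

With the container name from the answer, this would be called as heal_node rabbit-services-00. It only automates the manual steps above; it does not decide which side of a partition holds the data that should win, so that still needs a human check.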