Gluster cluster in Kubernetes: glusterd inactive (dead) after node reboot. How to debug?

Problem Description

I don't know what to do to debug this. I have one Kubernetes master node and three slave nodes. I deployed a Gluster cluster on the three slave nodes using this guide: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md

I created volumes and everything was working. But when I reboot a slave node and the node reconnects to the master, glusterd.service inside that node shows as dead, and nothing works after that.

[root@kubernetes-node-1 /]# systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

I don't know where to go from here. For example, /var/log/glusterfs/glusterd.log was last updated 3 days ago (no errors were logged after the reboot or after deleting and recreating the pod).

I just want to know where glusterd crashes so I can figure out why.

How can I debug this crash?
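
One way to start (a sketch, assuming shell access on the affected node; glusterd's --debug flag runs it in the foreground with console logging):

# Ask systemd/journald why glusterd.service ended up inactive (dead)
journalctl -u glusterd.service -b --no-pager | tail -n 50

# Run glusterd in the foreground with debug logging; --debug implies
# no daemonizing and logs to the console, so a crash is visible immediately
glusterd --debug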

All nodes (master + slaves) run as Ubuntu Desktop 18.04 LTS 64-bit VirtualBox VMs.

Requested output (kubectl get all --all-namespaces):

NAMESPACE     NAME                                                 READY   STATUS              RESTARTS   AGE
glusterfs     pod/glusterfs-7nl8l                                  0/1     Running             62         22h
glusterfs     pod/glusterfs-wjnzx                                  1/1     Running             62         2d21h
glusterfs     pod/glusterfs-wl4lx                                  1/1     Running             112        41h
glusterfs     pod/heketi-7495cdc5fd-hc42h                          1/1     Running             0          22h
kube-system   pod/coredns-86c58d9df4-n2hpk                         1/1     Running             0          6d12h
kube-system   pod/coredns-86c58d9df4-rbwjq                         1/1     Running             0          6d12h
kube-system   pod/etcd-kubernetes-master-work                      1/1     Running             0          6d12h
kube-system   pod/kube-apiserver-kubernetes-master-work            1/1     Running             0          6d12h
kube-system   pod/kube-controller-manager-kubernetes-master-work   1/1     Running             0          6d12h
kube-system   pod/kube-flannel-ds-amd64-785q8                      1/1     Running             5          3d19h
kube-system   pod/kube-flannel-ds-amd64-8sj2z                      1/1     Running             8          3d19h
kube-system   pod/kube-flannel-ds-amd64-v62xb                      1/1     Running             0          3d21h
kube-system   pod/kube-flannel-ds-amd64-wx4jl                      1/1     Running             7          3d21h
kube-system   pod/kube-proxy-7f6d9                                 1/1     Running             5          3d19h
kube-system   pod/kube-proxy-7sf9d                                 1/1     Running             0          6d12h
kube-system   pod/kube-proxy-n9qxq                                 1/1     Running             8          3d19h
kube-system   pod/kube-proxy-rwghw                                 1/1     Running             7          3d21h
kube-system   pod/kube-scheduler-kubernetes-master-work            1/1     Running             0          6d12h

NAMESPACE     NAME                                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes                                               ClusterIP   10.96.0.1        <none>        443/TCP         6d12h
elastic       service/glusterfs-dynamic-9ad03769-2bb5-11e9-8710-0800276a5a8e   ClusterIP   10.98.38.157     <none>        1/TCP           2d19h
elastic       service/glusterfs-dynamic-a77e02ca-2bb4-11e9-8710-0800276a5a8e   ClusterIP   10.97.203.225    <none>        1/TCP           2d19h
elastic       service/glusterfs-dynamic-ad16ed0b-2bb6-11e9-8710-0800276a5a8e   ClusterIP   10.105.149.142   <none>        1/TCP           2d19h
glusterfs     service/heketi                                                   ClusterIP   10.101.79.224    <none>        8080/TCP        2d20h
glusterfs     service/heketi-storage-endpoints                                 ClusterIP   10.99.199.190    <none>        1/TCP           2d20h
kube-system   service/kube-dns                                                 ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   6d12h

NAMESPACE     NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
glusterfs     daemonset.apps/glusterfs                 3         3         0       3            0           storagenode=glusterfs             2d21h
kube-system   daemonset.apps/kube-flannel-ds-amd64     4         4         4       4            4           beta.kubernetes.io/arch=amd64     3d21h
kube-system   daemonset.apps/kube-flannel-ds-arm       0         0         0       0            0           beta.kubernetes.io/arch=arm       3d21h
kube-system   daemonset.apps/kube-flannel-ds-arm64     0         0         0       0            0           beta.kubernetes.io/arch=arm64     3d21h
kube-system   daemonset.apps/kube-flannel-ds-ppc64le   0         0         0       0            0           beta.kubernetes.io/arch=ppc64le   3d21h
kube-system   daemonset.apps/kube-flannel-ds-s390x     0         0         0       0            0           beta.kubernetes.io/arch=s390x     3d21h
kube-system   daemonset.apps/kube-proxy                4         4         4       4            4           <none>                            6d12h

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
glusterfs     deployment.apps/heketi    1/1     1            0           2d20h
kube-system   deployment.apps/coredns   2/2     2            2           6d12h

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
glusterfs     replicaset.apps/heketi-7495cdc5fd    1         1         0       2d20h
kube-system   replicaset.apps/coredns-86c58d9df4   2         2         2       6d12h

As requested:

tasos@kubernetes-master-work:~$ kubectl logs -n glusterfs glusterfs-7nl8l
env variable is set. Update in gluster-blockd.service
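
Since that pod log is nearly empty, more detail can usually be pulled from inside the container with kubectl exec (a sketch, reusing the pod name from the listing above; log paths assume the stock gluster-kubernetes image):

# Check glusterd state and its log from inside the glusterfs pod
kubectl exec -n glusterfs glusterfs-7nl8l -- systemctl status glusterd.service
kubectl exec -n glusterfs glusterfs-7nl8l -- tail -n 50 /var/log/glusterfs/glusterd.log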

Tags: kubernetes, ubuntu-18.04, glusterfs

Solution


Please check these similar topics:

GlusterFS deployment on k8s cluster -- Readiness probe failed: /usr/local/bin/status-probe.sh

https://github.com/gluster/gluster-kubernetes/issues/539
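
The readiness-probe failure from the first link can be confirmed from the pod's events (a sketch, using the unready pod from the question):

# The Events section at the bottom should show lines like:
#   Readiness probe failed: /usr/local/bin/status-probe.sh ...
kubectl describe pod -n glusterfs glusterfs-7nl8l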

Check tcmu-runner.log to debug it.
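
A sketch of how to read it from inside the pod; the path below is an assumption about where the gluster container writes it and may differ per image:

# Assumed log location inside the glusterfs container; adjust if your
# image logs tcmu-runner elsewhere (e.g. /var/log/tcmu-runner.log)
kubectl exec -n glusterfs glusterfs-7nl8l -- tail -n 100 /var/log/glusterfs/gluster-block/tcmu-runner.log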

Update:

I think this is your problem: https://github.com/gluster/gluster-kubernetes/pull/557

The PR is ready, but it has not been merged yet.

Update 2:

https://github.com/gluster/glusterfs/issues/417

Make sure rpcbind is installed.
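
On the Ubuntu 18.04 nodes this can be checked and fixed with (a sketch):

# Install rpcbind if it is missing, then make sure it starts on boot
dpkg -s rpcbind >/dev/null 2>&1 || sudo apt-get install -y rpcbind
sudo systemctl enable --now rpcbind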

