In Kubernetes, are health checks defined per container or per Pod?

Problem description

The Google Cloud blog says that if the readiness probe fails, traffic will not be routed to the Pod, and if the liveness probe fails, the Pod will be restarted.

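The Kubernetes documentation says that the kubelet uses liveness probes to know when a container needs to be restarted, and readiness probes to know when a container is ready to start accepting requests from clients.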

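My current understanding is that a Pod is considered Ready and Alive only when all of its containers are ready. That in turn would mean that if 1 of the 3 containers in a Pod fails, the whole Pod is considered failed (not ready / not alive). And if 1 of the 3 containers is restarted, the whole Pod is restarted. Is this correct?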

Tags: kubernetes, kubernetes-health-check

Solution


A Pod is ready only when all of its containers are ready. When a Pod is ready, it should be added to the load balancing pools of all matching Services because it means that this Pod is able to serve requests.
As you can see in the Readiness Probe documentation:

The kubelet uses readiness probes to know when a container is ready to start accepting traffic.

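Using a readiness probe ensures that traffic does not reach a container that is not ready for it. Because readiness is aggregated per Pod, a single unready container marks the whole Pod as not ready, and the Pod is removed from the endpoints of every matching Service.
Using a liveness probe ensures that a container is restarted when it fails: the kubelet kills and restarts only that specific container (subject to the Pod's restartPolicy), not the whole Pod.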
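A minimal sketch of how these per-container probes fit together with a Service. The Pod name web-app-probes, the Service name web-svc, the ports and the probe settings are illustrative assumptions, not part of the original example. Each probe is declared on a container, not on the Pod: if either container's readiness probe fails, the Pod's Ready condition becomes false and the Pod drops out of web-svc's endpoints, while a failing liveness probe restarts only the web container.

```yaml
# Hypothetical example: names, ports and probe settings are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-app-probes
  labels:
    run: web-app-probes
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:             # per-container: gates only this container's Ready status
      httpGet:
        path: /                 # nginx serves its default page here
        port: 80
      periodSeconds: 5
    livenessProbe:              # per-container: a failure restarts only "web"
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
  - name: cache
    image: redis
    readinessProbe:             # if this fails, the whole Pod reports not ready
      exec:
        command: ["redis-cli", "ping"]
      periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    run: web-app-probes         # matches the Pod label above
  ports:
  - port: 80
    targetPort: 80
```

With this manifest applied, `kubectl get endpoints web-svc` should list the Pod's IP as ready only while both containers report Ready.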

To answer your last question, I will use an example:

And if 1 out of 3 containers was restarted, then it means that the entire pod was restarted. Is this correct?

Consider a simple Pod manifest in which one container has a livenessProbe that always fails:

---
# web-app.yml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: web-app
  name: web-app
spec:
  containers:
  - image: nginx
    name: web

  - image: redis
    name: failed-container
    livenessProbe:
      httpGet:
        path: /healthz # Nothing listens on port 8080 in this container, so this probe always fails.
        port: 8080

After creating the web-app Pod and waiting for a while, we can check how the livenessProbe behaved:

$ kubectl describe pod web-app
Name:         web-app
Namespace:    default
Containers:
  web:
    ...
    State:          Running
      Started:      Tue, 09 Mar 2021 09:56:59 +0000
    Ready:          True
    Restart Count:  0
    ...
  failed-container:
    ...
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
    Ready:          False
    Restart Count:  7
    ...
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  ...
  Normal   Killing    9m40s (x2 over 10m)   kubelet            Container failed-container failed liveness probe, will be restarted
  ...

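As you can see, only failed-container was restarted (Restart Count: 7), while the web container kept running with Restart Count: 0. The Pod itself was not recreated; a failed liveness probe restarts only the affected container.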
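For contrast, here is a hypothetical variant of the same manifest (the name web-app-readiness is an assumption) in which the always-failing check is a readinessProbe instead of a livenessProbe. In this case the kubelet would not restart failed-container at all; it would stay Ready: False, so the whole Pod would report not ready and be left out of the endpoints of any matching Service.

```yaml
# Hypothetical variant of web-app.yml: the failing check is a readinessProbe.
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: web-app-readiness
  name: web-app-readiness
spec:
  containers:
  - image: nginx
    name: web
  - image: redis
    name: failed-container
    readinessProbe:             # a failing readiness probe never restarts the container
      httpGet:
        path: /healthz          # nothing listens on 8080 in the redis container, so this always fails
        port: 8080
```

Comparing the two manifests shows the split: readiness is aggregated to the Pod level for traffic routing, while liveness acts on individual containers.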

More information can be found in the Liveness, Readiness and Startup Probes documentation.

