Stop Kubernetes from restarting the container when I shut down PostgreSQL

Problem Description

I am maintaining a Kubernetes cluster that runs two PostgreSQL servers in two different pods: a primary and a replica. The replica is kept in sync with the primary via log shipping.

A fault caused log shipping to start failing, so the replica is no longer in sync with the primary.

The procedure for resynchronizing the replica with the primary requires stopping the postgres service on the replica, and this is where I run into trouble.
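
For context, rebuilding a log-shipping replica on PostgreSQL 9.6 typically looks something like the sketch below. This is illustrative only: the data directory, primary host, and replication user are assumptions inferred from the pod environment shown further down, not the exact procedure from the question.

# Illustrative resync sketch. $PGDATA, pgset-primary, and primaryuser
# are assumptions taken from the pod environment printed below.
pg_ctl stop -D "$PGDATA"        # this is the step Kubernetes interrupts
mv "$PGDATA" "$PGDATA.old"      # keep the old data directory as a fallback
pg_basebackup -h pgset-primary -U primaryuser -D "$PGDATA" -X stream -P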

As soon as I shut down the postgres service, Kubernetes appears to restart the container, which immediately starts postgres again. I need the container to stay up with its postgres service stopped, so that I can carry out the subsequent steps to repair the broken replication.

How can I get Kubernetes to let me shut down the postgres service without restarting the container?

More details:

To stop the replica, I open a shell on the replica pod with kubectl exec -it <pod name> -- /bin/sh, and then run pg_ctl stop from that shell. I get the following response:

server shutting down
command terminated with exit code 137

I am then kicked out of the shell.
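
For what it's worth, exit code 137 conventionally means the process was killed with SIGKILL (128 + signal number 9), which is consistent with the container being torn down around the shell session:

# 137 - 128 = 9; look up the signal name for 9
kill -l $((137 - 128))    # prints: KILL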

When I run kubectl describe pod, I see the following:

Name:         pgset-primary-1
Namespace:    qa
Priority:     0
Node:         aks-nodepool1-95718424-0/10.240.0.4
Start Time:   Fri, 09 Jul 2021 13:48:06 +1200
Labels:       app=pgset-primary
              controller-revision-hash=pgset-primary-6d7d65c8c7
              name=pgset-replica
              statefulset.kubernetes.io/pod-name=pgset-primary-1
Annotations:  <none>
Status:       Running
IP:           10.244.1.42
IPs:
  IP:           10.244.1.42
Controlled By:  StatefulSet/pgset-primary
Containers:
  pgset-primary:
    Container ID:   containerd://bc00b4904ab683d9495ad020328b5033ecb00d19c9e5b12d22de18f828918455
    Image:          *****/crunchy-postgres:centos7-9.6.8-1.6.0
    Image ID:       docker.io/*****/crunchy-postgres@sha256:2850e00f9a619ff4bb6ff889df9bcb2529524ca8110607e4a7d9e36d00879057
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 06 Nov 2021 18:29:34 +1300
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Nov 2021 18:28:09 +1300
      Finished:     Sat, 06 Nov 2021 18:29:18 +1300
    Ready:          True
    Restart Count:  6
    Limits:
      cpu:     250m
      memory:  512Mi
    Requests:
      cpu:     10m
      memory:  256Mi
    Environment:
      PGHOST:                 /tmp
      PG_PRIMARY_USER:        primaryuser
      PG_MODE:                set
      PG_PRIMARY_HOST:        pgset-primary
      PG_REPLICA_HOST:        pgset-replica
      PG_PRIMARY_PORT:        5432
      [...]
      ARCHIVE_TIMEOUT:        60
      MAX_WAL_KEEP_SEGMENTS:  400
    Mounts:
      /backrestrepo from backrestrepo (rw)
      /pgconf from pgbackrestconf (rw)
      /pgdata from pgdata (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pgset-sa-token-nh6ng (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  pgdata:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgdata-pgset-primary-1
    ReadOnly:   false
  backrestrepo:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  backrestrepo-pgset-primary-1
    ReadOnly:   false
  pgbackrestconf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      pgbackrest-configmap
    Optional:  false
  pgset-sa-token-nh6ng:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pgset-sa-token-nh6ng
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                 From     Message
  ----     ------   ----                ----     -------
  Warning  BackOff  88m (x3 over 3h1m)  kubelet  Back-off restarting failed container
  Normal   Pulled   88m (x7 over 120d)  kubelet  Container image "*****/crunchy-postgres:centos7-9.6.8-1.6.0" already present on machine
  Normal   Created  88m (x7 over 120d)  kubelet  Created container pgset-primary
  Normal   Started  88m (x7 over 120d)  kubelet  Started container pgset-primary

The events show that the container is being restarted by Kubernetes.

The pod has no liveness or readiness probes, so I don't know what prompts Kubernetes to restart the container when I shut down the postgres service running inside it.

Tags: postgresql, kubernetes, database-replication, postgresql-9.6

Solution


This is caused by the restartPolicy. The container's lifecycle ended because its main process exited, and the restart policy then determines whether a new container is created. If you don't want a new container to be created, you need to change the restart policy of these pods.
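
As a minimal sketch of that change, assuming the database can be run as a standalone (unmanaged) Pod for the maintenance window: pod templates in a Deployment or StatefulSet only accept restartPolicy: Always, so this would have to be a one-off pod. The pod name is hypothetical, and ***** is the registry redacted in the question.

# Hypothetical one-off maintenance pod; with restartPolicy: Never the
# kubelet will not recreate the container after postgres exits.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pg-maintenance
spec:
  restartPolicy: Never
  containers:
  - name: postgres
    image: "*****/crunchy-postgres:centos7-9.6.8-1.6.0"
EOF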

If this pod is part of a Deployment, take a look at kubectl explain deployment.spec.template.spec.restartPolicy.
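
Note that the pod in the question is actually managed by a StatefulSet (see Controlled By: StatefulSet/pgset-primary above), so the analogous command would be:

kubectl explain statefulset.spec.template.spec.restartPolicy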

