Kubernetes - DaemonSet error reason

Problem description

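I have a k8s DaemonSet that is supposed to run sysctl -w vm.max_map_count=262144 on the host nodes where its pods are deployed. The DaemonSet works as expected the first time the resource is applied. However, if a k8s node running the DaemonSet is later rebooted, the DaemonSet pod does not update the host's vm.max_map_count to 262144. The DaemonSet pods enter the Running state, but when described they show: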

State:          Running
  Started:      Thu, 21 Jun 2018 12:01:51 +0100
Last State:     Terminated
  Reason:       Error
  Exit Code:    143

However, I can't figure out the reason for the error, and I don't know where to look to troubleshoot it.
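As a starting point for reading the status above: by common shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 143 corresponds to signal 15. That would be consistent with the container being stopped when the node went down rather than with the script itself failing, although the output alone does not prove it. A minimal sketch of the decoding, where signal_for_exit_code is a hypothetical helper name and not part of kubectl:

```shell
#!/bin/sh
# Decode a container exit code using the common shell convention:
# values above 128 mean "terminated by signal (code - 128)".
# signal_for_exit_code is a hypothetical helper, not a kubectl feature.
signal_for_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    # kill -l <number> prints the name of the signal with that number
    kill -l "$((code - 128))"
  else
    echo "none"
  fi
}

# Exit code taken from the "Last State" section of the describe output
signal_for_exit_code 143
```

The logs of the previous container instance may also help; kubectl logs ds-elk-5z5hs --previous asks for them, assuming they are still available on the node after the reboot.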

DaemonSet YAML:

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: ds-elk
  labels:
    app: elk
spec:
  template:
    metadata:
      labels:
        app: elk
    spec:
      hostPID: true
      containers:
        - name: startup-script
          image: gcr.io/google-containers/startup-script:v1
          imagePullPolicy: Always
          securityContext:
            privileged: true
          env:
          - name: STARTUP_SCRIPT
            value: |
              #! /bin/bash
              sysctl -w vm.max_map_count=262144
              echo done

The hosts run Red Hat EL 7.4. The Kubernetes server version is 1.8.6.

Output of kubectl describe pod ds-elk-5z5hs:

Name:           ds-elk-5z5hs
Namespace:      default
Node:           xxx-00-xxxx-01v.devxxx.xxxxxx.xx.xx/xx.xxx.xx.xx
Start Time:     Tue, 15 May 2018 14:03:14 +0100
Labels:         app=elk
                controller-revision-hash=2068481183
                pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"default","name":"ds-elk","uid":"54372241-5840-11e8-aaaa-005056b97218","apiVersion"...
Status:         Running
IP:             xx.xxx.x.xxx
Controlled By:  DaemonSet/ds-elk
Containers:
  startup-script:
    Container ID:   docker://eff849b842ed7b28dcf07578301a12068c998cb42b59a88b2bf2e8243b72f419
    Image:          gcr.io/google-containers/startup-script:v1
    Image ID:       docker-pullable://gcr.io/google-containers/startup-script@sha256:be96df6845a2af0eb61b17817ed085ce41048e4044c541da7580570b61beff3e
    Port:           <none>
    State:          Running
      Started:      Thu, 21 Jun 2018 11:40:50 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 21 Jun 2018 07:24:56 +0100
      Finished:     Thu, 21 Jun 2018 11:39:22 +0100
    Ready:          True
    Restart Count:  2
    Environment:
      STARTUP_SCRIPT:  #! /bin/bash
sysctl -w vm.max_map_count=262144
echo done

    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ld98j (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-ld98j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ld98j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:          <none>

Tags: kubernetes, elastic-stack

Solution


I ended up getting rid of the DaemonSet entirely and instead set vm.max_map_count in the initContainers section of the pod spec:

  initContainers:
  - name: "sysctl"
    image: "busybox"
    imagePullPolicy: "Always"
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true
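To check whether the setting actually took effect after a reboot, the current value can be compared against the required one. The sketch below is only an illustration: meets_minimum is a hypothetical helper name, and the script assumes a Linux system where /proc/sys/vm/max_map_count is readable (it falls back to 0 otherwise).

```shell
#!/bin/sh
# Compare a vm.max_map_count value against a required minimum.
# meets_minimum is a hypothetical helper used only for illustration.
meets_minimum() {
  current="$1"
  required="$2"
  [ "$current" -ge "$required" ]
}

required=262144
# /proc/sys/vm/max_map_count exposes the value that sysctl reports (Linux only);
# fall back to 0 if the file cannot be read.
current="$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)"

if meets_minimum "$current" "$required"; then
  echo "ok: vm.max_map_count=$current"
else
  echo "too low: vm.max_map_count=$current (need at least $required)"
fi
```

Running this on the node, or in the application container once the init container has finished, should report ok if the sysctl call succeeded.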
