amazon-web-services - elastic/elasticsearch：在 AWS 集群上使用 volumeClaimTemplate 卡在挂起状态的 Pod

问题描述

我在 aws 上设置了 kops 集群，并且 helm 安装在集群中。

我正在尝试使用图表安装easltic/elasticsearch 。我需要修改我在下面创建的values.yml文件的默认卷大小。

# Allocate smaller chunks of memory per pod
resources:
    requests:
      cpu: "100m"
      memory: "512M"
    limits:
      cpu: "1000m"
      memory: "512M"


# Request smaller persistent volume
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: default
  resources:
    requests:
      storage: 10Gi

我是这样安装的

helm install elasticsearch -n logging elastic/elasticsearch -f values.yml

安装成功，但现在 Pod 卡在挂起状态

[ec2-user@ip-my elastic-search]$ kubectl get pods -n logging
NAME                     READY   STATUS    RESTARTS   AGE
elasticsearch-master-0   0/1     Pending   0          6m35s
elasticsearch-master-1   0/1     Pending   0          6m35s
elasticsearch-master-2   0/1     Pending   0          6m35s

更新：

[ec2-user@my-ip elastic-search]$ kubectl describe pods -n logging elasticsearch-master-0

Name:           elasticsearch-master-0
Namespace:      logging
Priority:       0
Node:           <none>
Labels:         app=elasticsearch-master
                chart=elasticsearch
                controller-revision-hash=elasticsearch-master-697ffb4548
                release=elasticsearch
                statefulset.kubernetes.io/pod-name=elasticsearch-master-0
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/elasticsearch-master
Init Containers:
  configure-sysctl:
    Image:      docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    Port:       <none>
    Host Port:  <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tmqp9 (ro)
Containers:
  elasticsearch:
    Image:       docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    Ports:       9200/TCP, 9300/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      1
      memory:   2Gi
    Readiness:  exec [sh -c #!/usr/bin/env bash -e
# If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file

# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Elasticsearch Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no

http () {
  local path="${1}"
  local args="${2}"
  set -- -XGET -s

  if [ "$args" != "" ]; then
    set -- "$@" $args
  fi

  if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
    set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
  fi

  curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
}

if [ -f "${START_FILE}" ]; then
  echo 'Elasticsearch is already running, lets check the node is healthy'
  HTTP_CODE=$(http "/" "-w %{http_code}")
  RC=$?
  if [[ ${RC} -ne 0 ]]; then
    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
    exit ${RC}
  fi
  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
  if [[ ${HTTP_CODE} == "200" ]]; then
    exit 0
  elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then
    exit 0
  else
    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
    exit 1
  fi

else
  echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
  if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
    touch ${START_FILE}
    exit 0
  else
    echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
    exit 1
  fi
fi
] delay=10s timeout=5s period=10s #success=3 #failure=3
    Environment:
      node.name:                     elasticsearch-master-0 (v1:metadata.name)
      cluster.initial_master_nodes:  elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,
      discovery.seed_hosts:          elasticsearch-master-headless
      cluster.name:                  elasticsearch
      network.host:                  0.0.0.0
      ES_JAVA_OPTS:                  -Xmx1g -Xms1g
      node.data:                     true
      node.ingest:                   true
      node.master:                   true
      node.ml:                       true
      node.remote_cluster_client:    true
    Mounts:
      /usr/share/elasticsearch/data from elasticsearch-master (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tmqp9 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  elasticsearch-master:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  elasticsearch-master-elasticsearch-master-0
    ReadOnly:   false
  default-token-tmqp9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tmqp9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  23s (x2 over 24s)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.

[ec2-user@my-ip elastic-search]$ kubectl describe statefulset -n logging elasticsearch-master
Name:               elasticsearch-master
Namespace:          logging
CreationTimestamp:  Mon, 19 Apr 2021 03:51:58 +0000
Selector:           app=elasticsearch-master
Labels:             app=elasticsearch-master
                    app.kubernetes.io/managed-by=Helm
                    chart=elasticsearch
                    heritage=Helm
                    release=elasticsearch
Annotations:        esMajorVersion: 7
                    meta.helm.sh/release-name: elasticsearch
                    meta.helm.sh/release-namespace: logging
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
Pods Status:        0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=elasticsearch-master
           chart=elasticsearch
           release=elasticsearch
  Init Containers:
   configure-sysctl:
    Image:      docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    Port:       <none>
    Host Port:  <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    Environment:  <none>
    Mounts:       <none>
  Containers:
   elasticsearch:
    Image:       docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    Ports:       9200/TCP, 9300/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      1
      memory:   2Gi
    Readiness:  exec [sh -c #!/usr/bin/env bash -e
# If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file

# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Elasticsearch Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no

http () {
  local path="${1}"
  local args="${2}"
  set -- -XGET -s

  if [ "$args" != "" ]; then
    set -- "$@" $args
  fi

  if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
    set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
  fi

  curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
}

if [ -f "${START_FILE}" ]; then
  echo 'Elasticsearch is already running, lets check the node is healthy'
  HTTP_CODE=$(http "/" "-w %{http_code}")
  RC=$?
  if [[ ${RC} -ne 0 ]]; then
    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
    exit ${RC}
  fi
  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
  if [[ ${HTTP_CODE} == "200" ]]; then
    exit 0
  elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then
    exit 0
  else
    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
    exit 1
  fi

else
  echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
  if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
    touch ${START_FILE}
    exit 0
  else
    echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
    exit 1
  fi
fi
] delay=10s timeout=5s period=10s #success=3 #failure=3
    Environment:
      node.name:                      (v1:metadata.name)
      cluster.initial_master_nodes:  elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,
      discovery.seed_hosts:          elasticsearch-master-headless
      cluster.name:                  elasticsearch
      network.host:                  0.0.0.0
      ES_JAVA_OPTS:                  -Xmx1g -Xms1g
      node.data:                     true
      node.ingest:                   true
      node.master:                   true
      node.ml:                       true
      node.remote_cluster_client:    true
    Mounts:
      /usr/share/elasticsearch/data from elasticsearch-master (rw)
  Volumes:  <none>
Volume Claims:
  Name:          elasticsearch-master
  StorageClass:  standard
  Labels:        <none>
  Annotations:   <none>
  Capacity:      10Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age    From                    Message
  ----    ------            ----   ----                    -------
  Normal  SuccessfulCreate  2m50s  statefulset-controller  create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master successful
  Normal  SuccessfulCreate  2m50s  statefulset-controller  create Pod elasticsearch-master-1 in StatefulSet elasticsearch-master successful
  Normal  SuccessfulCreate  2m50s  statefulset-controller  create Pod elasticsearch-master-2 in StatefulSet elasticsearch-master successful

[ec2-user@~]$ kubectl describe pvc -n logging elasticsearch-master-0
Error from server (NotFound): persistentvolumeclaims "elasticsearch-master-0" not found
[ec2-user@ip~]$ kubectl get storageclass
NAME                      PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
default                   kubernetes.io/aws-ebs   Delete          Immediate              false                  38d
gp2                       kubernetes.io/aws-ebs   Delete          Immediate              false                  38d
kops-ssd-1-17 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   38d

标签： amazon-web-servicesdockerelasticsearchkubernetes

amazon-web-services - elastic/elasticsearch：在 AWS 集群上使用 volumeClaimTemplate 卡在挂起状态的 Pod

问题描述

解决方案

推荐阅读