首页 > 解决方案 > 为什么 ES 显示错误日志 `readiness probe failed`?

问题描述

我正在 AWS EKS 上部署 Elasticsearch 集群。下面是 k8s 规范 yml 文件。

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: datasource
spec:
  version: 7.14.0
  nodeSets:
  - name: node
    count: 3
    config:
      node.store.allow_mmap: true
      xpack.security.http.ssl.enabled: false
      xpack.security.transport.ssl.enabled: false
      xpack.security.enabled: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          readinessProbe:
              exec:
                command:
                - bash
                - -c
                - /mnt/elastic-internal/scripts/readiness-probe-script.sh
              failureThreshold: 3
              initialDelaySeconds: 10
              periodSeconds: 12
              successThreshold: 1
              timeoutSeconds: 12
          env:
          - name: READINESS_PROBE_TIMEOUT
            value: "30"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: ebs-sc
        resources:
          requests:
            storage: 1024Gi

部署后,我看到所有三个 pod 都有错误:

{"type": "server", "timestamp": "2021-10-05T05:19:37,041Z", "level": "INFO", "component": "o.e.c.m.MetadataMappingService", "cluster.name": "datasource", "node.name": "datasource-es-node-0", "message": "[.kibana/g5_90XpHSI-y-I7MJfBZhQ] update_mapping [_doc]", "cluster.uuid": "xJ00drroT_CbJPfzi8jSAg", "node.id": "qmtgUZHbR4aTWsYaoIEDEA"  }
{"type": "server", "timestamp": "2021-10-05T05:19:37,622Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "datasource", "node.name": "datasource-es-node-0", "message": "Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana][0]]]).", "cluster.uuid": "xJ00drroT_CbJPfzi8jSAg", "node.id": "qmtgUZHbR4aTWsYaoIEDEA"  }
{"timestamp": "2021-10-05T05:19:40+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:19:45+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:19:50+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:19:55+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:20:00+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:20:05+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:20:10+00:00", "message": "readiness probe failed", "curl_rc": "35"}
{"timestamp": "2021-10-05T05:20:15+00:00", "message": "readiness probe failed", "curl_rc": "35"}

从上面的日志中,它Cluster health status changed from [YELLOW] to [GREEN] 首先显示然后出现此错误readiness probe failed。我想知道如何解决这个问题。是与 Elasticsearch 相关的错误还是与 k8s 相关?

标签: elasticsearchkubernetes

解决方案


您可以像这样在规范中声明 READINESS_PROBE_TIMEOUT 。

...
env:
- name: READINESS_PROBE_TIMEOUT
  value: "30"

如有必要,您可以自定义就绪探针,最新elasticsearch.k8s.elastic.co/v1 API规范在这里,它与您可以在规范中使用的 K8s PodTemplateSpec 相同Elasticsearch

更新: curl 错误代码 35 是指 SSL 错误。这是关于脚本的帖子。您可以从规范中删除以下设置并重新运行:

xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false
xpack.security.enabled: false

推荐阅读