Kibana / Elasticsearch: connecting from Kibana to ES fails with an unavailable shards exception

Problem description

I am trying to connect Kibana to an ES cluster (1 master node, 1 data node).

The Kibana front end returns a 504 error.

There are no errors in my Kibana logs.

But in ES:

[2019-02-22T11:39:33,764][WARN ][r.suppressed             ] path: /.kibana/doc/config%3A6.4.2/_update, params: {refresh=wait_for, index=.kibana, id=config:6.4.2, type=doc}
org.elasticsearch.action.UnavailableShardsException: [.kibana][0] [1] shardIt, [0] active : Timeout waiting for [1m], request: indices:data/write/update
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.retry(TransportInstanceSingleOperationAction.java:211) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.doStart(TransportInstanceSingleOperationAction.java:166) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$2.onTimeout(TransportInstanceSingleOperationAction.java:232) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:573) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
[2019-02-22T11:39:33,768][WARN ][r.suppressed             ] path: /.kibana/doc/config%3A6.4.2/_update, params: {refresh=wait_for, index=.kibana, id=config:6.4.2, type=doc}
org.elasticsearch.action.UnavailableShardsException: [.kibana][0] [1] shardIt, [0] active : Timeout waiting for [1m], request: indices:data/write/update
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.retry(TransportInstanceSingleOperationAction.java:211) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.doStart(TransportInstanceSingleOperationAction.java:166) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$2.onTimeout(TransportInstanceSingleOperationAction.java:232) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:573) [elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

I tried deleting the .kibana index and restarting all the services, without success.

curl -XGET http://master01-elastic:9200
{
  "name" : "master01",
  "cluster_name" : "local-stg-cluster",
  "cluster_uuid" : "K3zb-E6xRle7MWjYrag4nA",
  "version" : {
    "number" : "6.4.2",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "04711c2",
    "build_date" : "2018-09-26T13:34:09.098244Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}


curl -XGET http://master01-elastic/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
     2                                                                   UNASSIGNED

 curl -XGET http://master01-elastic.dev.encode.local:9200/_cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana K3NoYmDaRnGk9vem8oUFlQ   1   1   


curl -XGET http://master01-elastic:9200/_cat/shards?v
index   shard prirep state      docs store ip node
.kibana 0     p      UNASSIGNED               
.kibana 0     r      UNASSIGNED  


curl -XGET 'http://master01-elastic.dev.encode.local:9200/_recovery?human&detailed=true&active_only=true'
{}

$ curl -XGET 'http://master01-elastic.dev.encode.local:9200/_cluster/allocation/explain'
{"index":".kibana","shard":0,"primary":true,"current_state":"unassigned","unassigned_info":{"reason":"INDEX_CREATED","at":"2019-02-22T16:36:40.852Z","last_allocation_status":"no_attempt"},"can_allocate":"no","allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes"}


Solution


When troubleshooting this kind of issue (i.e. unassigned shards), it first helps to find out whether allocation failed for some reason during node recovery, by running the following command:

curl -XGET 'localhost:9200/_recovery?human&detailed=true&active_only=true'

In your case the response is empty, which means this is not a recovery issue.

Sometimes, if shard allocation has failed too many times, a shard stays unassigned until you run the following command:

curl -XPOST http://master01-elastic/_cluster/reroute?retry_failed=true
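After a reroute attempt, a quick way to see whether anything changed is to count the shards still marked UNASSIGNED. As a sketch, the following filters the _cat/shards output pasted in the question (saved to a local file; against a live cluster you would pipe the curl output into the same awk filter):

```shell
#!/bin/sh
# Reproduce the _cat/shards output shown in the question.
cat <<'EOF' > shards.txt
index   shard prirep state      docs store ip node
.kibana 0     p      UNASSIGNED
.kibana 0     r      UNASSIGNED
EOF
# Column 4 is the shard state; skip the header line and count UNASSIGNED rows.
awk 'NR > 1 && $4 == "UNASSIGNED" { n++ } END { print n+0 }' shards.txt
```

Here both the primary and the replica are unassigned, so the count is 2; after a successful allocation it should drop to 0.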

If that does not help, the next step is to inspect the allocation decisions and see whether anything is wrong, by running the following command:

curl -XGET http://master01-elastic/_cluster/allocation/explain

In your case, this yields:

{
  "index": ".kibana",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2019-02-22T16:36:40.852Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}

This can happen if your data node is down, or if there are cluster-level or index-level shard allocation filtering rules (e.g. rules preventing the shards of a given index from being allocated to a given node). You can check whether that is the case by inspecting your cluster and index settings:

curl -XGET http://master01-elastic/.kibana/_settings
curl -XGET http://master01-elastic/_cluster/settings

Check whether you have any index.routing.allocation.* content in that section (for index-level rules)...

"settings": {
  "index": {
    "routing": {
      "allocation": {
        "include": {
          "_name": "NODE1,NODE2"
        },
        "exclude": {                        <--- this might be the issue
          "_name": "NODE3,NODE4"
        }
      }
    },

...or in the cluster.routing.allocation.* section (for cluster-level rules):

"cluster": {
  "routing": {
    "allocation": {
      "enable": "none"                      <--- this might be the issue
    }
  }

If that is the case, you will probably have to adjust your rules.
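For example, either of the two hypothetical rules flagged above could be reset as follows (a sketch assuming the same host as in the question; setting a value to null restores its default):

```shell
# Clear a hypothetical index-level exclude rule on the .kibana index.
curl -XPUT 'http://master01-elastic:9200/.kibana/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index.routing.allocation.exclude._name": null }'

# Re-enable cluster-wide shard allocation if it had been set to "none".
curl -XPUT 'http://master01-elastic:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "transient": { "cluster.routing.allocation.enable": null } }'
```

After the rules are adjusted, the shards should be allocated automatically; if they are not, rerun the reroute command with retry_failed=true shown earlier.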

