elasticsearch - Elasticsearch 7.2 cluster has unassigned shards

Problem description

I tried to set up a three-node Elasticsearch cluster on version 7.2, but it did not behave as I expected.

I have three virtual machines: 192.168.7.2, 192.168.7.3, and 192.168.7.4. Their main settings in config/elasticsearch.yml are:

192.168.7.2:

cluster.name: ucas
node.name: node-2
network.host: 192.168.7.2
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
http.cors.enabled: true
http.cors.allow-origin: "*"

192.168.7.3:

cluster.name: ucas
node.name: node-3
network.host: 192.168.7.3
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]

192.168.7.4:

cluster.name: ucas
node.name: node-4
network.host: 192.168.7.4
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]

After starting all three nodes, I created an index named moive with 3 shards and 0 replicas and wrote some documents to it. The cluster looked healthy:

PUT moive
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}


PUT moive/_doc/3
{
  "title":"title 3"
}

Then I set the number of replicas for moive to 1:

PUT moive/_settings
{
  "number_of_replicas": 1
}

That went fine, but when I set the number of replicas for moive to 2:

PUT moive/_settings
{
  "number_of_replicas": 2
}

The new replicas could not be allocated to node-2.

I don't know which step went wrong. Any help would be appreciated.

Tags: elasticsearch, cluster-computing, sharding

Solution

First, use the cluster allocation explain API to find out why the shard cannot be allocated:


GET _cluster/allocation/explain?pretty



{
  "index" : "moive",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-07-19T06:47:29.704Z",
    "details" : "node_left [tIm8GrisRya8jl_n9lc3MQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "kQ0Noq8LSpyEcVDF1POfJw",
      "node_name" : "node-3",
      "transport_address" : "192.168.7.3:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[kQ0Noq8LSpyEcVDF1POfJw], [R], s[STARTED], a[id=Ul73SPyaTSyGah7Yl3k2zA]]"
        }
      ]
    },
    {
      "node_id" : "mNpqD9WPRrKsyntk2GKHMQ",
      "node_name" : "node-4",
      "transport_address" : "192.168.7.4:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[mNpqD9WPRrKsyntk2GKHMQ], [P], s[STARTED], a[id=yQo1HUqoSdecD-SZyYMYfg]]"
        }
      ]
    },
    {
      "node_id" : "tIm8GrisRya8jl_n9lc3MQ",
      "node_name" : "node-2",
      "transport_address" : "192.168.7.2:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [2.2790256709451573E-4%]"
        }
      ]
    }
  ]
}
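The response is long, so it can help to reduce it to the deciders that returned NO for each node. The following is a minimal sketch, not part of the original answer: the function name `blocking_deciders` is hypothetical, and `sample` is a hand-trimmed copy of the response above with the explanations shortened.

```python
def blocking_deciders(explain):
    """Map each node name to the (decider, explanation) pairs that said NO."""
    result = {}
    for node in explain.get("node_allocation_decisions", []):
        reasons = [
            (d["decider"], d["explanation"])
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if reasons:
            result[node["node_name"]] = reasons
    return result


# Trimmed copy of the explain response above; explanations are shortened.
sample = {
    "index": "moive",
    "shard": 2,
    "primary": False,
    "can_allocate": "no",
    "node_allocation_decisions": [
        {"node_name": "node-3", "deciders": [
            {"decider": "same_shard", "decision": "NO",
             "explanation": "a copy of the shard already exists on this node"}]},
        {"node_name": "node-4", "deciders": [
            {"decider": "same_shard", "decision": "NO",
             "explanation": "a copy of the shard already exists on this node"}]},
        {"node_name": "node-2", "deciders": [
            {"decider": "disk_threshold", "decision": "NO",
             "explanation": "the node is above the low watermark [85%]"}]},
    ],
}

for name, reasons in blocking_deciders(sample).items():
    for decider, explanation in reasons:
        print(f"{name}: {decider}: {explanation}")
```

Read this way, the only decider that is not `same_shard` is `disk_threshold` on node-2, which is the one worth investigating.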

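node-3 and node-4 each already hold a copy of shard 2, so node-2 is the only node that could take the new replica, and its disk is almost full: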

[vagrant@node2 ~]$ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  8.4G  8.0G  480M  95% /
devtmpfs                 2.4G     0  2.4G   0% /dev
tmpfs                    2.4G     0  2.4G   0% /dev/shm
tmpfs                    2.4G  8.4M  2.4G   1% /run
tmpfs                    2.4G     0  2.4G   0% /sys/fs/cgroup
/dev/sda1                497M  118M  379M  24% /boot
none                     234G  149G   86G  64% /vagrant
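The `df` figures for `/` can be compared with the 85% low watermark reported by the `disk_threshold` decider. This is only a rough sketch of that comparison, not Elasticsearch's own logic: the function names are hypothetical, and checking `/` assumes the Elasticsearch data directory lives on the root filesystem, which the original does not state.

```python
import shutil


def exceeds_low_watermark(total, free, low_watermark_pct=85.0):
    """Return True if disk usage is above the given low watermark percentage."""
    used_pct = 100.0 * (total - free) / total
    return used_pct > low_watermark_pct


def path_exceeds_low_watermark(path="/", low_watermark_pct=85.0):
    """Run the same check against a mounted filesystem (path is an assumption)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return exceeds_low_watermark(usage.total, usage.free, low_watermark_pct)


# Figures from the df output above, in MiB: 8.4G total, 480M available.
print(exceeds_low_watermark(total=8.4 * 1024, free=480))
```

With those figures the root filesystem is above the watermark, which matches the decision reported for node-2.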

After I freed up disk space on node-2, everything returned to normal.
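The original does not show how the recovery was verified. One possible check is the cluster health API, sketched below. The base URL is taken from the node-2 configuration in the question, the helper names are hypothetical, and the sketch assumes the cluster is reachable over plain HTTP without authentication.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://192.168.7.2:9200"):
    """Call GET _cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def is_fully_allocated(health):
    """True when the cluster is green and no shard copy is unassigned."""
    return health["status"] == "green" and health["unassigned_shards"] == 0


# Usage against a live cluster (not executed here):
#   print(is_fully_allocated(fetch_cluster_health()))
```

While a replica is still unassigned the status is `yellow`. Once all three copies of every shard are placed it should turn `green`.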
