elasticsearch - A specific Elasticsearch shard keeps initializing on different data nodes
Problem description
I received an ElasticsearchStatusWarning saying the cluster status is yellow. After running the cluster health API, I see the following:
curl -X GET http://localhost:9200/_cluster/health/
{"cluster_name":"my-elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":8,"number_of_data_nodes":3,"active_primary_shards":220,"active_shards":438,"relocating_shards":0,"initializing_shards":2,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":99.54545454545455}
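Yellow status means every primary shard is active but at least one replica is not. The health numbers above are internally consistent; a quick sanity check (a sketch, using the figures from the response):

```shell
# 438 active shards plus 2 initializing shards = 440 total.
# The active fraction should match active_shards_percent_as_number
# reported in the health response (99.5454...).
active=438
initializing=2
awk -v a="$active" -v i="$initializing" \
    'BEGIN { printf "%.2f%%\n", a / (a + i) * 100 }'
# → 99.55%
```

So the yellow status is driven entirely by those 2 initializing replica shards.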
initializing_shards is 2, so I dug further with the following call:
curl -X GET http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep INIT
graph_vertex_24_18549 0 r INITIALIZING ALLOCATION_FAILED
curl -X GET http://localhost:9200/_cat/shards/graph_vertex_24_18549
graph_vertex_24_18549 0 p STARTED 8373375 8.4gb IP1 elasticsearch-data-1
graph_vertex_24_18549 0 r INITIALIZING IP2 elasticsearch-data-2
Re-running the same command a few minutes later shows the replica now initializing on elasticsearch-data-0. See below:
graph_vertex_24_18549 0 p STARTED 8373375 8.4gb IP1 elasticsearch-data-1
graph_vertex_24_18549 0 r INITIALIZING IP0 elasticsearch-data-0
Re-running it again a few minutes later shows it initializing on elasticsearch-data-2 once more. It cycles between nodes but never reaches STARTED.
curl -X GET http://localhost:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
147 162.2gb 183.8gb 308.1gb 492gb 37 IP1 IP1 elasticsearch-data-2
146 217.3gb 234.2gb 257.7gb 492gb 47 IP2 IP2 elasticsearch-data-1
147 216.6gb 231.2gb 260.7gb 492gb 47 IP3 IP3 elasticsearch-data-0
curl -X GET http://localhost:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
IP1 7 77 20 4.17 4.57 4.88 mi - elasticsearch-master-2
IP2 72 59 7 2.59 2.38 2.19 i - elasticsearch-5f4bd5b88f-4lvxz
IP3 57 49 3 0.75 1.13 1.09 di - elasticsearch-data-2
IP4 63 57 21 2.69 3.58 4.11 di - elasticsearch-data-0
IP5 5 59 7 2.59 2.38 2.19 mi - elasticsearch-master-0
IP6 69 53 13 4.67 4.60 4.66 di - elasticsearch-data-1
IP7 8 70 14 2.86 3.20 3.09 mi * elasticsearch-master-1
IP8 30 77 20 4.17 4.57 4.88 i - elasticsearch-5f4bd5b88f-wnrl4
curl -s -XGET http://localhost:9200/_cluster/allocation/explain -d '{ "index": "graph_vertex_24_18549", "shard": 0, "primary": false }' -H 'Content-Type: application/json'
{"index":"graph_vertex_24_18549","shard":0,"primary":false,"current_state":"initializing","unassigned_info":{"reason":"ALLOCATION_FAILED","at":"2020-11-04T08:21:45.756Z","failed_allocation_attempts":1,"details":"failed shard on node [1XEXS92jTK-wwanNgQrxsA]: failed to perform indices:data/write/bulk[s] on replica [graph_vertex_24_18549][0], node[1XEXS92jTK-wwanNgQrxsA], [R], s[STARTED], a[id=RnTOlfQuQkOumVuw_NeuTw], failure RemoteTransportException[[elasticsearch-data-2][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [4322682690/4gb], which is larger than the limit of [4005632409/3.7gb], real usage: [3646987112/3.3gb], new bytes reserved: [675695578/644.3mb]]; ","last_allocation_status":"no_attempt"},"current_node":{"id":"o_9jyrmOSca9T12J4bY0Nw","name":"elasticsearch-data-0","transport_address":"IP:9300"},"explanation":"the shard is in the process of initializing on node [elasticsearch-data-0], wait until initialization has completed"}
The thing is, I was alerted earlier about unassigned shards with the same exception: "CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [4322682690/4gb], which is larger than the limit of [4005632409/3.7gb]"
But back then the heap was only 2 GB, so I increased it to 4 GB. Now I see the same error again, but this time against an initializing shard rather than an unassigned one.
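The numbers in the breaker exception add up exactly: the parent circuit breaker takes the current real heap usage, adds the bytes the incoming transport request would reserve, and rejects the request when that sum exceeds the limit. Reproducing the arithmetic from the message above:

```shell
real_usage=3646987112   # "real usage: [3.3gb]" in the exception
new_bytes=675695578     # "new bytes reserved: [644.3mb]"
limit=4005632409        # "the limit of [3.7gb]"

would_be=$((real_usage + new_bytes))
echo "$would_be"        # → 4322682690, the "[4gb]" figure in the message

if [ "$would_be" -gt "$limit" ]; then
  echo "parent breaker trips: replica write rejected"
fi
```

This is why the replica keeps bouncing: each recovery attempt sends bulk replication traffic to the target node, the parent breaker on that node rejects it, and the master reassigns the replica elsewhere.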
How do I remediate this?
Solution
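One plausible remediation path, given that the replica fails on the parent circuit breaker during recovery: raise the data nodes' heap further, since the 4 GB heap still leaves the 3.7 GB parent limit below what bulk replication peaks need. A sketch, assuming `config/jvm.options` is editable and the nodes have RAM to spare (the 8g figure is an assumption, not a measured requirement):

```
# config/jvm.options on each data node: raise the heap.
# Keep -Xms equal to -Xmx, and the heap at or below ~50% of node RAM.
-Xms8g
-Xmx8g
```

After restarting the data nodes, `POST /_cluster/reroute?retry_failed=true` asks the master to retry allocations that previously failed, instead of waiting for the retry cycle. If the heap cannot be raised, reducing the size of bulk write requests hitting the index is the other lever, since it shrinks the `new bytes reserved` term that trips the breaker.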