I set up an Elasticsearch cluster on 3 Azure VMs with only one master node; after a while the cluster keeps crashing

Problem description

When I check the cluster health, it reports 3 nodes; after a while the count drops to 1, and some time later it goes back to 3.
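One way to record this flapping over time is to poll the cluster health API (`GET /_cluster/health`) and note every change in its `number_of_nodes` field. Below is a minimal sketch using only the Python standard library. It assumes the HTTP endpoint is reachable at the hypothetical address `http://localhost:9200` without authentication; if X-Pack security is enabled, credentials would have to be added. The names `fetch_node_count`, `find_node_count_changes` and `watch` are illustrative and not part of any Elasticsearch client.

```python
import json
import time
import urllib.request

# Hypothetical address of one node's HTTP endpoint; adjust host, port and scheme.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_node_count(url=HEALTH_URL, timeout=5.0):
    """Call GET /_cluster/health once and return its number_of_nodes field."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return int(json.load(resp)["number_of_nodes"])


def find_node_count_changes(samples):
    """Given (timestamp, node_count) pairs in time order, return
    (timestamp, old_count, new_count) for every sample where the count changed."""
    changes = []
    previous = None
    for ts, count in samples:
        if previous is not None and count != previous:
            changes.append((ts, previous, count))
        previous = count
    return changes


def watch(url=HEALTH_URL, interval=10.0, rounds=60):
    """Poll the health API `rounds` times, `interval` seconds apart,
    printing each change in the reported node count as it is seen."""
    samples = []
    for _ in range(rounds):
        now = time.strftime("%H:%M:%S")
        try:
            samples.append((now, fetch_node_count(url)))
        except OSError as exc:  # URLError, HTTPError and socket timeouts subclass OSError
            print(f"{now}: health check failed: {exc}")
        else:
            # Compare only the two most recent successful samples.
            for ts, old, new in find_node_count_changes(samples[-2:]):
                print(f"{ts}: number_of_nodes {old} -> {new}")
        time.sleep(interval)
    return samples
```

Running `watch()` from any machine that can reach the cluster prints one line per change, which can then be lined up against the timestamps in the master's log.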

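(In addition, my second data node logs the error "can not be imported as a dangling index, as index with same name already exists in cluster metadata". I'm not sure whether this is related to the problem or is a separate issue.)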

The following errors appear in my ES log:

[2020-10-10T10:34:02,441][WARN ][o.e.d.z.PublishClusterStateAction] [company_master] publishing cluster state with version [99] failed for the following nodes: [[{company_node1}{ilBojvb2QNOFkCxQsz22GQ}{oXPhSf5gT6mQjMMaF2v99Q}{10.10.2.9}{10.10.2.9:9300}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]
[2020-10-10T10:34:02,443][INFO ][o.e.c.s.ClusterApplierService] [company_master] removed {{company_node2}{4OoGXUfnReOuM-cQkJEhkw}{zlQ36kBfRzm7irMnUYnoOw}{10.10.2.7}{10.10.2.7:9300}{ml.machine_memory=4294496256, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {company_master}{kwsp-zEESeGYcdF55HGXzQ}{ZZjAUs11RwWksQCBfh8Xpw}{10.10.2.4}{10.10.2.4:9300}{ml.machine_memory=8589463552, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [99] source [zen-disco-node-failed({company_node2}{4OoGXUfnReOuM-cQkJEhkw}{zlQ36kBfRzm7irMnUYnoOw}{10.10.2.7}{10.10.2.7:9300}{ml.machine_memory=4294496256, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}), reason(transport disconnected)[{company_node2}{4OoGXUfnReOuM-cQkJEhkw}{zlQ36kBfRzm7irMnUYnoOw}{10.10.2.7}{10.10.2.7:9300}{ml.machine_memory=4294496256, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} transport disconnected]]])
[2020-10-10T10:34:02,436][WARN ][o.e.g.G.InternalPrimaryShardAllocator] [company_master] [telemetry-2020.09.25][0]: failed to list shard for shard_started on node [ilBojvb2QNOFkCxQsz22GQ]
org.elasticsearch.action.FailedNodeException: Failed node [ilBojvb2QNOFkCxQsz22GQ]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:236) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:151) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:210) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:534) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:194) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:91) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:54) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:121) ~[?:?]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.list(TransportNodesListGatewayStartedShards.java:95) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.AsyncShardFetch.asyncFetch(AsyncShardFetch.java:283) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.AsyncShardFetch.fetchData(AsyncShardFetch.java:126) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.GatewayAllocator$InternalPrimaryShardAllocator.fetchData(GatewayAllocator.java:170) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.PrimaryShardAllocator.makeAllocationDecision(PrimaryShardAllocator.java:86) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.BaseGatewayShardAllocator.allocateUnassigned(BaseGatewayShardAllocator.java:59) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.GatewayAllocator.innerAllocatedUnassigned(GatewayAllocator.java:125) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:115) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:410) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:378) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:361) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.routing.allocation.AllocationService.disassociateDeadNodes(AllocationService.java:233) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.discovery.zen.ZenDiscovery$NodeRemovalClusterStateTaskExecutor.execute(ZenDiscovery.java:636) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:643) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:200) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:135) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-6.8.12.jar:6.8.12]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [company_node1][10.10.2.9:9300] Node not connected
        at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:576) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:531) ~[elasticsearch-6.8.12.jar:6.8.12]
        ... 33 more
[2020-10-10T10:34:02,462][WARN ][o.e.g.G.InternalPrimaryShardAllocator] [company_master] [telemetry-2020.09.13][0]: failed to list shard for shard_started on node [ilBojvb2QNOFkCxQsz22GQ]
org.elasticsearch.action.FailedNodeException: Failed node [ilBojvb2QNOFkCxQsz22GQ]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:236) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:151) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:210) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:534) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:194) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:91) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:54) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:121) ~[?:?]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.12.jar:6.8.12]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.8.12.jar:6.8.12]
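Both stack traces are raised while the master reroutes shards after removing a node (`disassociateDeadNodes` in the trace), and both end in `NodeNotConnectedException` for `company_node1` (`10.10.2.9`); the `INFO` line shows `company_node2` (`10.10.2.7`) being removed with reason `transport disconnected`. In a long log, a small script can pull out just the removal events and map node IDs such as `ilBojvb2QNOFkCxQsz22GQ` back to node names. This is a minimal Python sketch, assuming the default 6.x log line layout seen above; the regular expressions and the names `node_names` and `summarize_removals` are illustrative, not an Elasticsearch tool.

```python
import re

# Header of a default-format log line as seen above:
# [timestamp][LEVEL ][logger] [local node name] message
LOG_HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\] "
    r"\[(?P<local>[^\]]+)\] (?P<msg>.*)$"
)
# Node descriptor as printed in these logs: {name}{id}{ephemeralId}{host}{host:port}...
NODE = re.compile(r"\{([^{}]+)\}\{([^{}]+)\}\{[^{}]+\}\{[^{}]+\}\{[^{}]+:\d+\}")
# Removal reason as printed by zen discovery, e.g. "reason(transport disconnected)".
REASON = re.compile(r"reason\(([^)]+)\)")


def node_names(lines):
    """Map node IDs to node names using every descriptor found in the log."""
    names = {}
    for line in lines:
        for name, node_id in NODE.findall(line):
            names[node_id] = name
    return names


def summarize_removals(lines):
    """Return (timestamp, removed node name, reason) for each
    ClusterApplierService 'removed' line; other lines are skipped."""
    removals = []
    for line in lines:
        header = LOG_HEADER.match(line)
        if not header or not header["logger"].endswith("ClusterApplierService"):
            continue  # stack-trace lines and other loggers
        msg = header["msg"]
        if not msg.startswith("removed "):
            continue
        node = NODE.search(msg)  # first descriptor is the removed node
        reason = REASON.search(msg)
        removals.append((
            header["ts"],
            node.group(1) if node else "?",
            reason.group(1) if reason else "?",
        ))
    return removals
```

Passing the lines of the master's log file to `summarize_removals` (the file location depends on the installation) lists each removal with its reason; for the excerpt above it yields `company_node2` removed at `2020-10-10T10:34:02,443` with reason `transport disconnected`.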

Tags: docker, elasticsearch, cluster-computing, docker-swarm
