elasticsearch - Why is this index state red: opendistro-ism-config
Problem description
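I don't think I have ever touched this index, yet it puts my whole cluster into red state. I don't know what it is or how to fix it. In the index management view I can see that it is the only red index. The problematic index is opendistro-ism-config. I tried changing the index's replica count, adding another node, and so on, but nothing helped.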
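Edit
As @Val asked, I ran the query below and am adding its output. My index stays in red state, which spams me with alerts on AWS, where my cluster is deployed. My other indices are already allocated, so I removed them from shard_sizes in the output and left only the one problematic index. I have 4 x t2.small nodes with 35 GiB SSD each, so there is enough spare space in the cluster. This is not my production cluster, so it is not that bad, but it is annoying.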
https://{{ES_DOMAIN}}/_cluster/allocation/explain?include_disk_info&include_yes_decisions
{
  "index": ".opendistro-ism-config",
  "shard": 1,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-08-01T09:18:40.288Z",
    "failed_allocation_attempts": 5,
    "details": "failed shard on node [ex3PL3THRHmAxkvMjOwrQQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[.opendistro-ism-config][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "cluster_info": {
    "nodes": {
      "KnCBTiL1TZCGz1DNYfm9_A": {
        "node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
        "least_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2143232000,
          "free_bytes": 34579505152,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        },
        "most_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2143232000,
          "free_bytes": 34579505152,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        }
      },
      "90rKZw_SSOSlOGWv_WyQQQ": {
        "node_name": "45cfd2c275112972c5e68e7e00295d45",
        "least_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2144980992,
          "free_bytes": 34577756160,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        },
        "most_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2144980992,
          "free_bytes": 34577756160,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        }
      },
      "2F_QTYueTs69Q7KhCped9w": {
        "node_name": "a8314d5f13c0043f8454997d973e8c03",
        "least_available": {
          "total_bytes": 36722737152,
          "used_bytes": 1957380096,
          "free_bytes": 34765357056,
          "free_disk_percent": 94.7,
          "used_disk_percent": 5.3
        },
        "most_available": {
          "total_bytes": 36722737152,
          "used_bytes": 1957380096,
          "free_bytes": 34765357056,
          "free_disk_percent": 94.7,
          "used_disk_percent": 5.3
        }
      },
      "8-oMtA69QvO3bKTAAUPeBw": {
        "node_name": "9c042bb3814270c16b4fba03ff85208d",
        "least_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2140692480,
          "free_bytes": 34582044672,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        },
        "most_available": {
          "total_bytes": 36722737152,
          "used_bytes": 2140692480,
          "free_bytes": 34582044672,
          "free_disk_percent": 94.2,
          "used_disk_percent": 5.8
        }
      }
    },
    "shard_sizes": {
      "[.opendistro-ism-config][2][r]_bytes": 56497,
      "[.opendistro-ism-config][0][p]_bytes": 53651,
      "[.opendistro-ism-config][0][r]_bytes": 53651,
      "[.opendistro-ism-config][4][p]_bytes": 33157,
      "[.opendistro-ism-config][2][p]_bytes": 56497
    }
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "2F_QTYueTs69Q7KhCped9w",
      "node_name": "a8314d5f13c0043f8454997d973e8c03",
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "8-oMtA69QvO3bKTAAUPeBw",
      "node_name": "9c042bb3814270c16b4fba03ff85208d",
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "90rKZw_SSOSlOGWv_WyQQQ",
      "node_name": "45cfd2c275112972c5e68e7e00295d45",
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "KnCBTiL1TZCGz1DNYfm9_A",
      "node_name": "ef9116cc46563e2c73d12eb7a8887f4c",
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
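Solution
A workaround to get your cluster working again is to manually trigger a retry of the failed shard allocation.
Cause of the problem: this usually happens when a node gets disconnected from the master node while it holds a replica of some primary shard. When the node rejoins the cluster, the shard copy that was allocated locally on it cannot release the resources it was using before (here, the in-memory shard lock named in the ShardLockObtainFailedException), so every attempt by the master to allocate the shard to that node fails.
After 5 unsuccessful allocation attempts (the failed_allocation_attempts value in the output), the master gives up, and another allocation attempt has to be triggered manually.
Solution: run the following command to retry the failed allocations: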
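To check whether a shard is in this "master gave up" state, the fields of the allocation-explain output can be inspected programmatically. The following is only a minimal sketch: the function name needs_manual_retry and the constant DEFAULT_MAX_RETRIES are hypothetical, the limit of 5 is assumed to be unchanged from the index.allocation.max_retries default, and the sample dict is a trimmed copy of the output in the question.

```python
from typing import Any, Dict

# Assumption: the cluster still uses the default retry limit of 5
# (the index.allocation.max_retries index setting).
DEFAULT_MAX_RETRIES = 5


def needs_manual_retry(explain: Dict[str, Any],
                       max_retries: int = DEFAULT_MAX_RETRIES) -> bool:
    """Return True if the shard is unassigned because allocation
    failed at least max_retries times, so the master has stopped retrying."""
    info = explain.get("unassigned_info", {})
    return (
        explain.get("current_state") == "unassigned"
        and info.get("reason") == "ALLOCATION_FAILED"
        and info.get("failed_allocation_attempts", 0) >= max_retries
    )


# Trimmed copy of the allocation-explain output shown in the question.
sample = {
    "index": ".opendistro-ism-config",
    "shard": 1,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
    },
}

print(needs_manual_retry(sample))  # prints True
```

A shard that failed fewer times than the limit would return False here, because the master is still expected to retry it on its own.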
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed'
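If curl is not convenient, the same request can be sent from a script. This is a rough Python equivalent using only the standard library, not a definitive client: the function names build_retry_request and retry_failed_shards are hypothetical, and the default base URL is the localhost:9200 address from the command above. On a managed AWS domain the base URL would be the domain endpoint, and any authentication or request signing that the domain requires is not covered by this sketch.

```python
import json
import urllib.request


def build_retry_request(
    base_url: str = "http://localhost:9200",
) -> urllib.request.Request:
    """Build the POST request that asks the master to retry
    allocations that previously failed too many times."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


def retry_failed_shards(base_url: str = "http://localhost:9200") -> dict:
    """Send the reroute request and return the parsed JSON response.
    Needs a reachable cluster; raises urllib.error.URLError otherwise."""
    request = build_retry_request(base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling retry_failed_shards() against a reachable cluster returns the reroute response as a dict. Afterwards, the allocation-explain query from the question can be run again to see whether the shard was assigned.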