Deleting an EKS cluster with eksctl does not work properly and requires manual deletion of resources such as ManagedNodeGroups

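Problem description

I'm running a cluster on EKS, and I followed the tutorial to deploy one with this command:

eksctl create cluster --name prod --version 1.17 --region eu-west-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --ssh-access --ssh-public-key public-key.pub --managed

Once I'm done with my tests (mainly installing and then uninstalling helm charts) and I have a clean cluster with no jobs running, I try to delete it with eksctl delete cluster --name prod, which produces these errors.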

[ℹ]  eksctl version 0.25.0
[ℹ]  using region eu-west-1
[ℹ]  deleting EKS cluster "test"
[ℹ]  deleted 0 Fargate profile(s)
[✔]  kubeconfig has been updated
[ℹ]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress
[ℹ]  2 sequential tasks: { delete nodegroup "standard-workers", delete cluster control plane "test" [async] }
[ℹ]  will delete stack "eksctl-test-nodegroup-standard-workers"
[ℹ]  waiting for stack "eksctl-test-nodegroup-standard-workers" to get deleted
[✖]  unexpected status "DELETE_FAILED" while waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::CloudFormation::Stack/eksctl-test-nodegroup-standard-workers: DELETE_FAILED – "The following resource(s) failed to delete: [ManagedNodeGroup]. "
[✖]  AWS::EKS::Nodegroup/ManagedNodeGroup: DELETE_FAILED – "Nodegroup standard-workers failed to stabilize: [{Code: Ec2SecurityGroupDeletionFailure,Message: DependencyViolation - resource has a dependent object,ResourceIds: [[REDACTED]]}]"
[ℹ]  1 error(s) occurred while deleting cluster with nodegroup(s)
[✖]  waiting for CloudFormation stack "eksctl-test-nodegroup-standard-workers": ResourceNotReady: failed waiting for successful resource state

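To fix them I had to manually delete the AWS VPC, then the ManagedNodeGroups, and then delete everything again.

I tried the steps above again (creating and deleting with the commands provided in the official getting started documentation), but I get the same errors on deletion.

It seems extremely odd that I have to delete resources manually when doing something like this. Is there a fix for this problem, am I doing something wrong, or is this standard procedure?

All commands are run through the official eksctl CLI, and I'm following the official eksctl deployment guide.

Tags: amazon-web-services, kubernetes, amazon-eks, eksctl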

Solution


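If we try to delete the security group that the node group's EC2 instances are attached to, we will find the root cause.

Most of the time, the attempt fails with a message saying that a network interface is still attached to the security group. This matches the DependencyViolation reported under Ec2SecurityGroupDeletionFailure in the log above.

So the solution is to delete the linked network interface manually. After that, the node group can be deleted without any errors.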
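
The manual step can also be done from the AWS CLI instead of the console. The following is only a minimal sketch, not part of the original answer: SG_ID is a placeholder for the security group ID that is redacted in the error message, the helper names list_enis and delete_enis are made up for illustration, and it assumes the AWS CLI is already configured for the cluster's region. By default it only prints what it would delete.

```shell
#!/usr/bin/env bash
# Sketch: find the network interfaces (ENIs) that still reference a security
# group and optionally delete them. SG_ID is a placeholder; take the real ID
# from the ResourceIds field of the Ec2SecurityGroupDeletionFailure error.

# Print the IDs of all ENIs that reference the given security group, one per line.
list_enis() {
  aws ec2 describe-network-interfaces \
    --filters "Name=group-id,Values=$1" \
    --query 'NetworkInterfaces[].NetworkInterfaceId' \
    --output text | tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Delete each ENI found. With DRY_RUN=1 (the default) nothing is deleted and
# the script only prints what it would do.
delete_enis() {
  local eni
  for eni in $(list_enis "$1"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete $eni"
    else
      aws ec2 delete-network-interface --network-interface-id "$eni"
    fi
  done
}

# Only act when a security group ID has been supplied in the environment.
if [ -n "${SG_ID:-}" ]; then
  delete_enis "$SG_ID"
fi
```

Running it as SG_ID=<security group id> bash script.sh lists the candidates; setting DRY_RUN=0 as well performs the deletion. An interface that is still attached to something may have to be detached before it can be deleted, so it is worth checking the dry-run output first. Once the interfaces are gone, eksctl delete cluster --name prod can be run again.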

