EKS: Created a new cluster with eksctl using a cluster config YAML file, but the nodes fail to join the cluster

Problem description

I am new to EKS. I used this cluster config YAML file to create a new cluster:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
 
metadata:
  name: h2-dev-cluster
  region: us-west-2
 
nodeGroups:
  - name: h2-dev-ng-1
    instanceType: t2.small
    desiredCapacity: 2
    ssh: # use existing EC2 key
      publicKeyName: dev-eks-node

but eksctl gets stuck at

waiting for at least 1 node(s) to become ready in "h2-dev-ng-1"

and then times out.

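I have checked every point in this AWS document: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html

All points check out except "The ClusterName in your worker node AWS CloudFormation template", which I cannot verify because the UserData has been encrypted by CloudFormation.

I logged in to one of the nodes and ran journalctl -u kubelet, which showed these errors: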
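One way to inspect the UserData might look like the sketch below. It assumes the value is base64-encoded (and possibly gzip-compressed) rather than truly encrypted, which this post does not confirm. The function name decode_user_data is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the UserData is base64 text, optionally wrapping gzip data.
# decode_user_data is a hypothetical helper name, not part of any AWS tool.
decode_user_data() {
  local tmp
  tmp="$(mktemp)"
  # Decode the base64 text from stdin into a temporary file.
  base64 -d > "$tmp"
  # If the decoded payload is valid gzip, decompress it; otherwise print it as-is.
  if gzip -t < "$tmp" 2>/dev/null; then
    gzip -dc < "$tmp"
  else
    cat "$tmp"
  fi
  rm -f "$tmp"
}
```

If that assumption holds, the output of aws ec2 describe-instance-attribute --instance-id INSTANCE_ID --attribute userData --query 'UserData.Value' --output text (INSTANCE_ID is a placeholder) could be piped into decode_user_data and searched for the cluster name.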

Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.007677 4541 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.391913 4541 kubelet.go:2272] node "ip-192-168-53-151.us-west-2.compute.internal" not found
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.434158 4541 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.492746 4541 kubelet.go:2272] node "ip-192-168-53-151.us-west-2.compute.internal" not found

Then I ran cat /var/lib/kubelet/kubeconfig and saw the following:

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: MASTER_ENDPOINT
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet
  name: kubelet
current-context: kubelet
users:
- name: kubelet
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: /usr/bin/aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "CLUSTER_NAME"
        - --region
        - "AWS_REGION"

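I noticed that the server parameter was still the placeholder MASTER_ENDPOINT, so I ran /etc/eks/bootstrap.sh h2-dev-cluster to set the cluster name. Afterwards the parameters were correct, as follows (I have masked the URL):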

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://XXXXXXXX.gr7.us-west-2.eks.amazonaws.com
  name: kubernetes
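The before-and-after comparison above can also be checked mechanically. The following is a minimal sketch that reports whether a kubeconfig still contains the template placeholders seen earlier (MASTER_ENDPOINT, CLUSTER_NAME, AWS_REGION). The function name kubeconfig_has_placeholders is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: kubeconfig_has_placeholders is a hypothetical helper name.
# Prints every line (with its line number) that still contains one of the
# placeholders observed in the unconfigured kubeconfig, and returns 0 if any
# were found, 1 if none were found.
kubeconfig_has_placeholders() {
  grep -nE 'MASTER_ENDPOINT|CLUSTER_NAME|AWS_REGION' "$1"
}
```

On a node it could be called as kubeconfig_has_placeholders /var/lib/kubelet/kubeconfig. Any output would suggest the bootstrap script has not filled in the file.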

After running sudo service restart kubectl, journalctl -u kubelet still shows the same errors, and the nodes still cannot join the cluster.

How can I fix this?

eksctl: 0.23.0 rc1 (also tested with 0.20.0; same error)
kubectl: 1.18.5
os: ubuntu 18.04 (on a new EC2 instance)

Tags: kubernetes, amazon-eks

Solution
