首页 > 解决方案 > GKE's cluster autoscaler got stucked in initializing status

问题描述

I was optimizing cluster (GKE) utilization recently and 2 days ago I've noticed that my nodes are not scaling up or down. Autoscaling config map is in initialization mode:

kubectl describe -n kube-system configmap cluster-autoscaler-status
Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2020-04-29 14:44:54.363091383 +0000 UTC

Data
====
status:
----
Cluster-autoscaler status at 2020-04-29 14:44:54.363091383 +0000 UTC:
Initializing
Events:  <none>

The other clusters contain proper autoscaling events. I think that I could overload cluster with the number of pods. It contains ~100 pods / node.

Update 1:

  1. What GKE version running on master?: 1.14.10-gke.27, but I thought the upgrade to 1.15.11-gke.9 would help (and will master somehow). It didn't help. We have other clusters with those same versions and pools.
  2. Does it happen to any node pools or is it occurring to a specific one?: Autoscaling config map is kind of "global level", so all node pools are being affected.
  3. Could you provide the pool sizes, gke-versions and autoscaling settings?
default  OK 1.14.10-gke.27  4 (2 per zone) custom-8-45056   Container-Optimized OS (cos)    0 - 8 nodes per zone    
preemptible8-2   OK 1.14.10-gke.27  10 (5 per zone) n1-standard-8   Container-Optimized OS (cos)    0 - 20 nodes per zone   
scalability-stable-2-cpu     OK 1.14.10-gke.27 1 (0 - 1 per zone) n1-standard-2 Container-Optimized OS (cos)    0 - 4 nodes per zone

Additional information:

  1. When it turned off autoscaling and turned on in every node pool, the output of kubectl describe -n kube-system configmap cluster-autoscaler-status has changed.
  2. I thought it might happen when I was changing the settings of the: scalability-stable-2-cpu.

标签: kubernetesgoogle-kubernetes-engineautoscaling

解决方案


3天后恢复正常。


推荐阅读