GKE very slow to start with nodepool - cluster and k8s/gcloud API not available

Problem description

We currently have a GKE cluster consisting of 7 nodes and 9 microservices. By default, we also add 2 node pools with 2 nodes. We use istio for load balancing between the microservices.

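Our CI environment creates everything with a script. The problem is that it takes several minutes before the cluster is usable together with the node pools.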

My main question is: why is the API unavailable during this time?

There are also many errors in the kube-system logs. Here is a short excerpt:

k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.925080 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.873176 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/heapster/metrics/heapster.go:331: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/processors/namespace_based_enricher.go:90: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
Error while getting cluster status: Get https://10.0.0.1:443/api/v1/nodes: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/heapster.go:254: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/processors/namespace_based_enricher.go:85: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:41.824128 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
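The excerpt is long, but every entry appears to report the same failure: a refused TCP connection to the API server address 10.0.0.1:443. A minimal Python sketch for checking that is shown here. It assumes the log is available as plain text; the names `DIAL_ERROR`, `summarize`, and `SAMPLE` are illustrative only, and the list of failure reasons in the pattern is an assumption that covers the reason seen in this excerpt plus two other common ones.

```python
import re
from collections import Counter

# Matches the tail of each entry, e.g.
#   "dial tcp 10.0.0.1:443: getsockopt: connection refused"
# The syscall name (connect/getsockopt) is optional and ignored.
DIAL_ERROR = re.compile(
    r"dial tcp ([\d.]+:\d+): (?:\w+: )?"
    r"(connection refused|i/o timeout|no route to host)"
)


def summarize(log_text):
    """Count dial failures per (address, reason) pair."""
    return Counter(DIAL_ERROR.findall(log_text))


# Two entries copied from the excerpt; paste the full log to analyse all of it.
SAMPLE = (
    "k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get "
    "https://10.0.0.1:443/api/v1/services?resourceVersion=0: "
    "dial tcp 10.0.0.1:443: connect: connection refused\n"
    "k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get "
    "https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: "
    "dial tcp 10.0.0.1:443: getsockopt: connection refused\n"
)

for (address, reason), count in summarize(SAMPLE).items():
    print(f"{count} x {reason} -> {address}")
```

If the summary contains a single (address, reason) pair, the different components (DNS, heapster, metrics-server, and so on) are not failing independently; they are all waiting for the same API server endpoint.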

Tags: kubernetes, google-kubernetes-engine

Solution


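Creating GCE resources takes time. In any environment, provisioning one or more VMs usually takes a while. The endpoint is unavailable because the master is not ready yet. Once the cluster has been created, you can add the 2 additional node pools without disrupting the master.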
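Since the master needs time to become ready, a CI script can poll for readiness instead of failing on the first refused connection. The following Python sketch is one way to do that under stated assumptions: it assumes the `gcloud` CLI is installed and authenticated, it reads the cluster `status` field through `gcloud container clusters describe`, and the helper names `cluster_status` and `wait_until` as well as the cluster name and zone in the usage comment are hypothetical.

```python
import subprocess
import time


def cluster_status(name, zone):
    """Return the status reported by gcloud (e.g. RUNNING), or "" on error."""
    result = subprocess.run(
        ["gcloud", "container", "clusters", "describe", name,
         "--zone", zone, "--format=value(status)"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def wait_until(check, timeout=900.0, interval=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Returns True as soon as check() succeeds, or False once `timeout`
    seconds have passed without success.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example use in a CI step (hypothetical cluster name and zone):
#
#   ready = wait_until(
#       lambda: cluster_status("ci-cluster", "europe-west1-b") == "RUNNING")
#   if not ready:
#       raise SystemExit("cluster did not become ready in time")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with fakes; in a real pipeline the defaults are enough. The same `wait_until` helper could wrap any other readiness check, such as a `kubectl get nodes` call that exits with status 0.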

