kubernetes - GKE is very slow to start with node pools - cluster and k8s/gcloud API unavailable
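Problem description
We currently have a GKE cluster with 7 nodes and 9 microservices. On top of the default node pool, we add 2 additional node pools with 2 nodes. We use istio for load balancing between the microservices.
Our CI environment creates everything with a script. The problem is that the cluster takes several minutes to become usable when it is created together with the node pools.
My main question is: why is the API unavailable during this time?
There are also many errors in the kube-system logs; here is a short excerpt: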
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.925080 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.873176 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/heapster/metrics/heapster.go:331: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/processors/namespace_based_enricher.go:90: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
Error while getting cluster status: Get https://10.0.0.1:443/api/v1/nodes: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/heapster.go:254: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/processors/namespace_based_enricher.go:85: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:41.824128 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
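Solution
Provisioning GCE resources takes time. In any environment, bringing up one or more VMs usually takes a while. The endpoint is unavailable because the master is not ready yet; the "connection refused" errors in the kube-system logs are the system components failing to reach the API server at 10.0.0.1:443 during that window. You can instead add the 2 extra node pools after the cluster has been created, which does not interrupt the master.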
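The ordering suggested above (cluster first, node pools afterwards) could be scripted in CI roughly as follows. This is only a sketch under assumptions: the cluster name, zone, and pool names are hypothetical, the helper names (`wait_until`, `api_server_ready`, `provision`) are made up for illustration, and the readiness check after each step is an extra precaution rather than something the answer requires. It assumes `gcloud` and `kubectl` are installed and authenticated on the CI runner.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def wait_until(check: Callable[[], bool],
               timeout_s: float = 900.0,
               interval_s: float = 10.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def api_server_ready() -> bool:
    """True when `kubectl get nodes` succeeds, i.e. the API server answers."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "--request-timeout=10s"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def create_cluster_cmd(cluster: str, zone: str, num_nodes: int) -> List[str]:
    """Build the gcloud command that creates the cluster with its default pool."""
    return ["gcloud", "container", "clusters", "create", cluster,
            "--zone", zone, "--num-nodes", str(num_nodes)]


def add_node_pool_cmd(cluster: str, zone: str, pool: str, num_nodes: int) -> List[str]:
    """Build the gcloud command that adds one node pool to an existing cluster."""
    return ["gcloud", "container", "node-pools", "create", pool,
            "--cluster", cluster, "--zone", zone, "--num-nodes", str(num_nodes)]


def provision(cluster: str, zone: str, pools: Sequence[str],
              run: Callable[[List[str]], object] = subprocess.check_call,
              ready: Callable[[], bool] = api_server_ready) -> None:
    """Create the cluster, wait for the API, then add the node pools one by one."""
    run(create_cluster_cmd(cluster, zone, 7))  # 7 nodes, as in the question
    if not wait_until(ready):
        raise TimeoutError("API server not reachable after cluster creation")
    for pool in pools:
        run(add_node_pool_cmd(cluster, zone, pool, 2))  # 2 nodes per extra pool
        if not wait_until(ready):
            raise TimeoutError(f"API server not reachable after adding {pool}")
```

A CI job might call `provision("ci-cluster", "europe-west1-b", ["pool-a", "pool-b"])` with its own names. Passing `run` and `ready` as parameters keeps the ordering logic testable without touching real GCP resources.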