runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Problem description

The problem you encountered:

"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

What you expected to happen:

Steps to reproduce:

Running GKE
Master version 1.14.8-gke.12
Node version: 1.14.8-gke.2
Machine type n1-standard-8

It was running perfectly before this upgrade issue:

1) gcloud beta container node-pools update k-cpu-pool-v1 --cluster=k --workload-metadata-from-node=GKE_METADATA_SERVER --zone=us-central1-a # fails with 2nd node

   gcloud beta container node-pools rollback k-cpu-pool-v1 --cluster=k3 --zone=us-central1-a # also fails with 2nd node, and many deployments won't come up

2) Trying to "Enable metadata server" per the instructions at
https://medium.com/@louisvernon/mapping-kubernetes-service-accounts-to-gcp-iams-using-workload-identity-b53496d543e0
but blocked by the failure of the previous deployment.

Other information (workarounds you tried, documentation you consulted, etc.):

I looked through the Google forum issues but found nothing. This looks like a GKE
issue with rollback when an upgrade fails, so a double issue. Should the master and
the nodes be upgraded to the same version?
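
On the version question, a minimal sketch for comparing the two version strings reported above. The helper name `compare_gke_versions` is hypothetical, and the split on `-gke.` is an assumption about how these strings are laid out. It only reports how the strings differ; it says nothing about whether the difference is related to the failure.

```shell
# Hypothetical helper: compare a master and a node version string.
# Assumes the form <kubernetes-version>-gke.<patch>, e.g. 1.14.8-gke.12.
compare_gke_versions() {
  local master="$1" node="$2"
  # Strip the "-gke.<patch>" suffix to get the Kubernetes part.
  local master_k8s="${master%%-gke.*}"
  local node_k8s="${node%%-gke.*}"

  if [ "$master" = "$node" ]; then
    echo "identical"
  elif [ "$master_k8s" = "$node_k8s" ]; then
    echo "same Kubernetes version, different GKE patch"
  else
    echo "different Kubernetes versions"
  fi
}

# Versions taken from the question.
compare_gke_versions "1.14.8-gke.12" "1.14.8-gke.2"
```

For the versions in the question this reports the same Kubernetes version with a different GKE patch suffix.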

It doesn't seem to be this issue, because in GKE one node came up but the second does not: https://stackoverflow.com/questions/52675934/network-plugin-is-not-ready-cni-config-uninitialized

Tags: google-kubernetes-engine, rollback

Solution


I tried to reproduce your issue:

  1. Create the cluster and node pool:

    gcloud container clusters create test-cluster --zone us-central1-a --cluster-version 1.14.8-gke.12 --node-version 1.14.8-gke.2 --num-nodes=2
    
    WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
    WARNING: Newly created clusters and node-pools will have node auto-upgrade enabled by default. This can be disabled using the `--no-enable-autoupgrade` flag.
    WARNING: Starting in 1.12, default node pools in new clusters will have their legacy Compute Engine instance metadata endpoints disabled by default. To create a cluster with legacy instance metadata endpoints disabled in the default node pool, run `clusters create` with the flag `--metadata disable-legacy-endpoints=true`.
    WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s). 
    This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
    Creating cluster test-cluster in us-central1-a... Cluster is being health-checked (master is healthy)...done.              
    Created [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1-a/test-cluster?project=test-prj
    
    NAME          LOCATION       MASTER_VERSION  MASTER_IP     MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
    test-cluster  us-central1-a  1.14.8-gke.12   XX.XX.75.247  n1-standard-1  1.14.8-gke.2  2          RUNNING
    
  2. Enable Workload Identity (beta) via the UI

Workload Identity Enabled

  3. Scale up to 3 nodes

    gcloud container clusters resize test-cluster --node-pool default-pool --num-nodes=3 --zone=us-central1-a
    
    Pool [default-pool] for [test-cluster] will be resized to 3.
    Do you want to continue (Y/n)?  y
    Resizing test-cluster...done.                                                                                              
    Updated [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    
  4. Update the nodes

    gcloud beta container node-pools update default-pool --cluster=test-cluster --workload-metadata-from-node=GKE_METADATA_SERVER --zone=us-central1-a
    
    Updating node pool default-pool... Done with 3 out of 3 nodes (100.0%): 3 succeeded...done.                                       
    Updated [https://container.googleapis.com/v1beta1/projects/test-prj/zones/us-central1-a/clusters/test-cluster/nodePools/default-pool].
    
  5. Scale down to 2 nodes

    gcloud container clusters resize test-cluster --node-pool default-pool --num-nodes=2 --zone=us-central1-a
    
    Pool [default-pool] for [test-cluster] will be resized to 2.
    Do you want to continue (Y/n)?  y
    Resizing test-cluster...done.                                                                                              
    Updated [https://container.googleapis.com/v1/projects/test-prj/zones/us-central1-a/clusters/test-cluster].
    
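  6. Disable Workload Identity (beta):

  6.1. First, go to Kubernetes clusters and click your cluster. Under Clusters > Node pools, click default-pool, then Edit node pool -> Edit default-pool -> go to Security and uncheck Enable GKE Metadata Server (beta).

  6.2. Then, in Kubernetes clusters, click your cluster -> Clusters, click Edit, and set Workload Identity (beta) to Disabled.

I ran all of these commands on a test cluster and found no errors or network issues. After that, I repeated steps 2-5 and then ran the rollback: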

    gcloud beta container node-pools rollback default-pool --cluster=test-cluster --zone=us-central1-a

    Node Pool: [default-pool], of Cluster: [test-cluster] will be
    rolled back to previous configuration. This operation is long-running
    and will block other operations on the cluster (including delete)
    until it has run to completion.

    Do you want to continue (Y/n)?  y

    Rolling back default-pool... Done with 1 out of 2 nodes (50.0%): 1 being processed, 1 succeeded...done.
    Updated [https://container.googleapis.com/v1beta1/projects/test-prj/zones/us-central1-a/clusters/test-cluster/nodePools/default-pool].
    operationId: operation-1577965484794-e4b2b2a6
    projectId: test-prj
    zone: us-central1-a

Again, there were no errors or network issues. I was then able to disable Workload Identity (beta) via the UI, as described in step 6.
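
If a rollback does fail, the operation ID printed at the end of the output is the handle for looking at what happened. A minimal sketch that pulls it out of saved command output. The helper name `extract_operation_id` is hypothetical, and it assumes the output contains an `operationId:` line like the one above.

```shell
# Hypothetical helper: read gcloud rollback output on stdin and print the
# value of the "operationId:" line. Default awk field splitting ignores
# any leading indentation.
extract_operation_id() {
  awk '$1 == "operationId:" { print $2 }'
}

# Usage against a real project (requires gcloud access):
#   gcloud container operations list --zone=us-central1-a
#   gcloud container operations describe "$(extract_operation_id < rollback.log)" --zone=us-central1-a
```

Here `rollback.log` is a placeholder for wherever the rollback output was saved.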

Everything appears to work as expected, so the problem is likely specific to your configuration.
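
One way to start narrowing that down is to find which node is not Ready and read its conditions. A minimal sketch, assuming `kubectl` is already configured for the cluster. The helper name `not_ready_nodes` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are printed as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl describe node <node-name>          # see the Conditions section
#   kubectl get pods -n kube-system -o wide    # system pods per node
```

The `kubectl describe node` output is where a message such as "network plugin is not ready: cni config uninitialized" would appear for the affected node.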

