Impaired/delayed Cluster-IP connectivity from the k8s master node

Problem description

I am running Kubernetes 1.17 with flannel:v0.11.0 on CentOS 7, and I have a problem with the reachability of CLUSTER-IPs from the control plane.

I installed and set up the cluster manually with kubeadm.

This is basically my cluster:

k8s-master-01 10.0.0.50/24
k8s-worker-01 10.0.0.60/24 
k8s-worker-02 10.0.0.61/24

Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12

Note: each node has two NICs (eth0: uplink, eth1: private). The IPs listed above are assigned to eth1, and kubelet, kube-proxy, and flannel are configured to send/receive their traffic over the private network on eth1.
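
For reference, binding flannel to the private NIC is usually done with flanneld's --iface argument in the kube-flannel DaemonSet. A sketch of the relevant container spec, assuming the stock flannel v0.11.0 manifest:

      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.11.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1    # send the overlay traffic over the private network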

I first ran into this problem when trying to serve the metrics-server API through the kube-apiserver. I followed the instructions here. The control plane seems unable to communicate properly with the service network.

These are my kube-system namespace pods:

$ kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
coredns-6955765f44-jrbs6                0/1     Running   9          24d     10.244.0.30   k8s-master-01   <none>           <none>
coredns-6955765f44-mwn2l                1/1     Running   8          24d     10.244.1.37   k8s-worker-01   <none>           <none>
etcd-k8s-master-01                      1/1     Running   9          24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-apiserver-k8s-master-01            1/1     Running   0          2m26s   10.0.0.50     k8s-master-01   <none>           <none>
kube-controller-manager-k8s-master-01   1/1     Running   15         24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-flannel-ds-amd64-7d6jq             1/1     Running   11         26d     10.0.0.60     k8s-worker-01   <none>           <none>
kube-flannel-ds-amd64-c5rj2             1/1     Running   11         26d     10.0.0.50     k8s-master-01   <none>           <none>
kube-flannel-ds-amd64-dsg6l             1/1     Running   11         26d     10.0.0.61     k8s-worker-02   <none>           <none>
kube-proxy-mrz9v                        1/1     Running   10         24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-proxy-slt95                        1/1     Running   9          24d     10.0.0.61     k8s-worker-02   <none>           <none>
kube-proxy-txlrp                        1/1     Running   9          24d     10.0.0.60     k8s-worker-01   <none>           <none>
kube-scheduler-k8s-master-01            1/1     Running   14         24d     10.0.0.50     k8s-master-01   <none>           <none>
metrics-server-67684d476-mrvj2          1/1     Running   2          7d23h   10.244.2.43   k8s-worker-02   <none>           <none>

And these are my services:

$ kubectl get services --all-namespaces -o wide
NAMESPACE              NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE    SELECTOR
default                kubernetes                  ClusterIP   10.96.0.1       <none>        443/TCP                  26d    <none>
default                phpdemo                     ClusterIP   10.96.52.157    <none>        80/TCP                   11d    app=phpdemo
kube-system            kube-dns                    ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   26d    k8s-app=kube-dns
kube-system            metrics-server              ClusterIP   10.96.71.138    <none>        443/TCP                  5d3h   k8s-app=metrics-server
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.99.136.237   <none>        8000/TCP                 23d    k8s-app=dashboard-metrics-scraper
kubernetes-dashboard   kubernetes-dashboard        ClusterIP   10.97.209.113   <none>        443/TCP                  23d    k8s-app=kubernetes-dashboard

The Metrics API does not work because the connectivity check fails:

$ kubectl describe apiservice v1beta1.metrics.k8s.io
...
Status:
  Conditions:
    Last Transition Time:  2019-12-27T21:25:01Z
    Message:               failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:  
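
Before digging into the network, it is worth confirming that the Service actually has an endpoint behind it (a quick diagnostic, not from the original session; the ENDPOINTS column should show the metrics-server pod IP):

$ kubectl -n kube-system get endpoints metrics-server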

The kube-apiserver gets no connection:

$ kubectl logs --tail=20 kube-apiserver-k8s-master-01 -n kube-system
...
I0101 22:27:00.712413       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W0101 22:27:00.712514       1 handler_proxy.go:97] no RequestInfo found in the context
E0101 22:27:00.712559       1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0101 22:27:00.712591       1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0101 22:27:04.712991       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:09.714801       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:34.709557       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:39.714173       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I tried to figure out what is going on from inside the kube-apiserver pod and could finally confirm the problem: I get a delayed response after more than 60s (unfortunately, time is not installed in the container):

$ kubectl exec -it kube-apiserver-k8s-master-01 -n kube-system -- /bin/sh
# echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
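
Since time is missing there, a rough way to measure the delay is to bracket the request with date (a sketch, assuming date is available in the image):

# date +%s; echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet >/dev/null; date +%s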

The same command succeeds from two test pods of my own (one on each of the two worker nodes). So the service IP is reachable from the pod network on the worker nodes:

$ kubectl exec -it phpdemo-55858f97c4-fjc6q -- /bin/sh
/usr/local/bin # echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 01 Jan 2020 22:53:44 GMT
Content-Length: 212

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"","reason":"Forbidden","details":{},"code":403}

And from a worker node itself:

[root@k8s-worker-02 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
real    0m0.146s
user    0m0.048s
sys 0m0.089s

This does not work on my master node; I only get a delayed response after roughly 60 seconds:

[root@k8s-master-01 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
real    1m3.248s
user    0m0.061s
sys 0m0.079s

From the master node I can see a lot of unreplied SYN_SENT conntrack entries:

[root@k8s-master-01 ~ ] conntrack -L -d 10.96.71.138
tcp      6 75 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48550 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=19813 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48287 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=23710 mark=0 use=1
tcp      6 40 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48422 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=24286 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48286 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=35030 mark=0 use=1
tcp      6 80 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48574 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=40636 mark=0 use=1
tcp      6 50 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48464 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=65512 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48290 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=47617 mark=0 use=1
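
Note the source address 10.0.2.15 in these entries: it is not on the private eth1 network (where the master is 10.0.0.50), so the kernel apparently picks another interface's address for traffic to the service subnet. A quick way to check which source address and device the kernel selects (a diagnostic sketch; output will vary):

[root@k8s-master-01 ~ ] ip route get 10.96.71.138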

The iptables setup:

[root@k8s-master-01 ~ ] iptables-save | grep 10.96.71.138
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ
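
To see where the service chain actually DNATs to, the KUBE-SVC chain from above can be listed, and then the KUBE-SEP chain it jumps to (a diagnostic sketch; the KUBE-SEP name below is a placeholder, use the one printed by the first command):

[root@k8s-master-01 ~ ] iptables -t nat -L KUBE-SVC-LC5QY66VUV2HJ6WZ -n
[root@k8s-master-01 ~ ] iptables -t nat -L KUBE-SEP-XXXXXXXXXXXXXXXX -n    # shows the DNAT to the metrics-server endpoint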

kube-proxy is up and running on every node, without errors:

$ kubectl get pods -A -o wide
...
kube-system            kube-proxy-mrz9v                             1/1     Running   10         21d    10.0.0.50     k8s-master-01   <none>           <none>
kube-system            kube-proxy-slt95                             1/1     Running   9          21d    10.0.0.61     k8s-worker-02   <none>           <none>
kube-system            kube-proxy-txlrp                             1/1     Running   9          21d    10.0.0.60     k8s-worker-01   <none>           <none>
$ kubectl -n kube-system logs kube-proxy-mrz9v
W0101 21:31:14.268698       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:14.283958       1 node.go:135] Successfully retrieved node IP: 10.0.0.50
I0101 21:31:14.284034       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:14.284624       1 server.go:571] Version: v1.17.0
I0101 21:31:14.286031       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:14.286093       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:14.287207       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:14.298760       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:14.298984       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:14.300618       1 config.go:313] Starting service config controller
I0101 21:31:14.300665       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:14.300720       1 config.go:131] Starting endpoints config controller
I0101 21:31:14.300740       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.400864       1 shared_informer.go:204] Caches are synced for service config 
I0101 21:31:14.401021       1 shared_informer.go:204] Caches are synced for endpoints config 

$ kubectl -n kube-system logs kube-proxy-slt95
W0101 21:31:13.856897       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.905653       1 node.go:135] Successfully retrieved node IP: 10.0.0.61
I0101 21:31:13.905704       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.906370       1 server.go:571] Version: v1.17.0
I0101 21:31:13.906983       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.907032       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.907413       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.912221       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.912321       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.915322       1 config.go:313] Starting service config controller
I0101 21:31:13.915353       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.915755       1 config.go:131] Starting endpoints config controller
I0101 21:31:13.915779       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.016995       1 shared_informer.go:204] Caches are synced for endpoints config 
I0101 21:31:14.017115       1 shared_informer.go:204] Caches are synced for service config 

$ kubectl -n kube-system logs kube-proxy-txlrp
W0101 21:31:13.552518       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.696793       1 node.go:135] Successfully retrieved node IP: 10.0.0.60
I0101 21:31:13.696846       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.697396       1 server.go:571] Version: v1.17.0
I0101 21:31:13.698000       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.698101       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.698509       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.704280       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.704467       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.704888       1 config.go:131] Starting endpoints config controller
I0101 21:31:13.704935       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:13.705046       1 config.go:313] Starting service config controller
I0101 21:31:13.705059       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.806299       1 shared_informer.go:204] Caches are synced for endpoints config 
I0101 21:31:13.806430       1 shared_informer.go:204] Caches are synced for service config 

And this is my (default) kube-proxy configuration:

$ kubectl -n kube-system get configmap kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.0.50:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-06T22:07:40Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "185"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: bac4a8df-e318-4c91-a6ed-9305e58ac6d9
$ kubectl -n kube-system get daemonset kube-proxy -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "2"
  creationTimestamp: "2019-12-06T22:07:40Z"
  generation: 2
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "115436"
  selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy
  uid: 64a53d29-1eaa-424f-9ebd-606bcdb3169c
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-proxy
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: k8s.gcr.io/kube-proxy:v1.17.0
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kube-proxy
          name: kube-proxy
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-proxy
      serviceAccountName: kube-proxy
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-proxy
        name: kube-proxy
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 2
  updatedNumberScheduled: 3

Is this just the result of a misconfiguration, or is it a bug? Any help is appreciated.

Tags: api, kubernetes, timeout, metrics, flannel

Solution


Here is what I did to make it work:

1. Set the --enable-aggregator-routing=true flag on the kube-apiserver (see the manifest sketch below).

2. Set the following flags in metrics-server-deployment.yaml:

- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

3. Set hostNetwork: true in metrics-server-deployment.yaml (steps 2 and 3 are combined in the sketch below).
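
On a kubeadm cluster, step 1 means editing the apiserver's static pod manifest; the kubelet restarts the pod automatically once the file is saved (a sketch, assuming the default kubeadm manifest path):

[root@k8s-master-01 ~ ] vi /etc/kubernetes/manifests/kube-apiserver.yaml

spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true    # route aggregated API requests to endpoint IPs instead of the ClusterIP
    # (existing flags unchanged)

For steps 2 and 3, the relevant part of metrics-server-deployment.yaml ends up looking roughly like this (a sketch, assuming the stock metrics-server v0.3.x manifest):

    spec:
      hostNetwork: true                                    # step 3: run metrics-server on the host network
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls                           # step 2: skip kubelet cert verification
        - --kubelet-preferred-address-types=InternalIP     # step 2: reach kubelets via their node IP

After re-applying the deployment, kubectl describe apiservice v1beta1.metrics.k8s.io should report the service as Available.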

