api - Impaired/delayed Cluster-IP connectivity from the k8s master node
Problem description
I am running Kubernetes 1.17 with flannel:v0.11.0 on CentOS 7, and I have a problem with CLUSTER-IP reachability from the control plane.
I installed and set up the cluster manually with kubeadm. This is basically my cluster:
k8s-master-01 10.0.0.50/24
k8s-worker-01 10.0.0.60/24
k8s-worker-02 10.0.0.61/24
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
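For reference, a cluster with this layout is typically created with a kubeadm call along the following lines (only a sketch of the kind of command used, not the literal invocation):
$ kubeadm init --apiserver-advertise-address=10.0.0.50 \
      --pod-network-cidr=10.244.0.0/16 \
      --service-cidr=10.96.0.0/12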
Note: every node has two NICs (eth0: uplink, eth1: private); the IPs listed above are assigned to eth1. kubelet, kube-proxy and flannel are configured to send/receive their traffic over the private network on eth1.
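Pinning flannel and the kubelet to the private NIC is usually done along these lines (a sketch of the kind of configuration I mean; the exact files on my nodes may differ slightly):
# kube-flannel DaemonSet container args: select eth1 explicitly
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth1
# /etc/sysconfig/kubelet on k8s-master-01: advertise the private IP
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.50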
I first ran into this problem when trying metrics-server, which serves its API through the kube-apiserver. I followed the instructions here. The control plane does not seem to be able to communicate properly with the service network.
So these are my kube-system namespace pods:
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6955765f44-jrbs6 0/1 Running 9 24d 10.244.0.30 k8s-master-01 <none> <none>
coredns-6955765f44-mwn2l 1/1 Running 8 24d 10.244.1.37 k8s-worker-01 <none> <none>
etcd-k8s-master-01 1/1 Running 9 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-apiserver-k8s-master-01 1/1 Running 0 2m26s 10.0.0.50 k8s-master-01 <none> <none>
kube-controller-manager-k8s-master-01 1/1 Running 15 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-7d6jq 1/1 Running 11 26d 10.0.0.60 k8s-worker-01 <none> <none>
kube-flannel-ds-amd64-c5rj2 1/1 Running 11 26d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-dsg6l 1/1 Running 11 26d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-mrz9v 1/1 Running 10 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-proxy-slt95 1/1 Running 9 24d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-txlrp 1/1 Running 9 24d 10.0.0.60 k8s-worker-01 <none> <none>
kube-scheduler-k8s-master-01 1/1 Running 14 24d 10.0.0.50 k8s-master-01 <none> <none>
metrics-server-67684d476-mrvj2 1/1 Running 2 7d23h 10.244.2.43 k8s-worker-02 <none> <none>
And these are my services:
$ kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26d <none>
default phpdemo ClusterIP 10.96.52.157 <none> 80/TCP 11d app=phpdemo
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 26d k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.96.71.138 <none> 443/TCP 5d3h k8s-app=metrics-server
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.99.136.237 <none> 8000/TCP 23d k8s-app=dashboard-metrics-scraper
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.97.209.113 <none> 443/TCP 23d k8s-app=kubernetes-dashboard
The metrics API does not work because the connectivity check fails:
$ kubectl describe apiservice v1beta1.metrics.k8s.io
...
Status:
Conditions:
Last Transition Time: 2019-12-27T21:25:01Z
Message: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Reason: FailedDiscoveryCheck
Status: False
Type:
The kube-apiserver has no connectivity:
$ kubectl logs --tail=20 kube-apiserver-k8s-master-01 -n kube-system
...
I0101 22:27:00.712413 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W0101 22:27:00.712514 1 handler_proxy.go:97] no RequestInfo found in the context
E0101 22:27:00.712559 1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0101 22:27:00.712591 1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0101 22:27:04.712991 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:09.714801 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:34.709557 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:39.714173 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I tried to figure out what is going on inside kube-apiserver and could finally confirm the problem: I only get a (delayed) response after > 60 s (unfortunately time is not installed in that container):
$ kubectl exec -it kube-apiserver-k8s-master-01 -n kube-system -- /bin/sh
# echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
The same command succeeds from my own test pods (one on each of the two worker nodes), so the service IP is reachable from the pod network on the worker nodes:
$ kubectl exec -it phpdemo-55858f97c4-fjc6q -- /bin/sh
/usr/local/bin # echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 01 Jan 2020 22:53:44 GMT
Content-Length: 212
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"","reason":"Forbidden","details":{},"code":403}
And from a worker node:
[root@k8s-worker-02 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
real 0m0.146s
user 0m0.048s
sys 0m0.089s
This does not work from my master node; I only get a delayed response after 60 seconds:
[root@k8s-master-01 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
real 1m3.248s
user 0m0.061s
sys 0m0.079s
From the master node I can see a lot of connections stuck in SYN_SENT that never get a reply:
[root@k8s-master-01 ~ ] conntrack -L -d 10.96.71.138
tcp 6 75 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48550 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=19813 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48287 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=23710 mark=0 use=1
tcp 6 40 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48422 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=24286 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48286 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=35030 mark=0 use=1
tcp 6 80 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48574 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=40636 mark=0 use=1
tcp 6 50 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48464 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=65512 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48290 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=47617 mark=0 use=1
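Note that the source address recorded in these entries is 10.0.2.15 rather than the master's private IP 10.0.0.50. A quick way to check which interface and source address the master actually picks towards the service and pod networks is something like this (a diagnostic sketch, output omitted):
[root@k8s-master-01 ~ ] ip route get 10.96.71.138
[root@k8s-master-01 ~ ] ip route get 10.244.2.43
[root@k8s-master-01 ~ ] ip -4 addr show eth0
[root@k8s-master-01 ~ ] ip -4 addr show eth1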
iptables setup:
[root@k8s-master-01 ~ ] iptables-save | grep 10.96.71.138
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ
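To see where this traffic is supposed to be DNATed, the service chain from the output above can be followed down to its endpoint chains (a diagnostic sketch; the KUBE-SEP chain name below is a placeholder to be replaced with whatever the first command prints):
[root@k8s-master-01 ~ ] iptables -t nat -L KUBE-SVC-LC5QY66VUV2HJ6WZ -n
[root@k8s-master-01 ~ ] iptables -t nat -L KUBE-SEP-XXXXXXXXXXXX -n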
kube-proxy is up and running on every node without errors:
$ kubectl get pods -A -o wide
...
kube-system kube-proxy-mrz9v 1/1 Running 10 21d 10.0.0.50 k8s-master-01 <none> <none>
kube-system kube-proxy-slt95 1/1 Running 9 21d 10.0.0.61 k8s-worker-02 <none> <none>
kube-system kube-proxy-txlrp 1/1 Running 9 21d 10.0.0.60 k8s-worker-01 <none> <none>
$ kubectl -n kube-system logs kube-proxy-mrz9v
W0101 21:31:14.268698 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:14.283958 1 node.go:135] Successfully retrieved node IP: 10.0.0.50
I0101 21:31:14.284034 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:14.284624 1 server.go:571] Version: v1.17.0
I0101 21:31:14.286031 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:14.286093 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:14.287207 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:14.298760 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:14.298984 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:14.300618 1 config.go:313] Starting service config controller
I0101 21:31:14.300665 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:14.300720 1 config.go:131] Starting endpoints config controller
I0101 21:31:14.300740 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.400864 1 shared_informer.go:204] Caches are synced for service config
I0101 21:31:14.401021 1 shared_informer.go:204] Caches are synced for endpoints config
> kubectl -n kube-system logs kube-proxy-slt95
W0101 21:31:13.856897 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.905653 1 node.go:135] Successfully retrieved node IP: 10.0.0.61
I0101 21:31:13.905704 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.906370 1 server.go:571] Version: v1.17.0
I0101 21:31:13.906983 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.907032 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.907413 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.912221 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.912321 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.915322 1 config.go:313] Starting service config controller
I0101 21:31:13.915353 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.915755 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.915779 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.016995 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:14.017115 1 shared_informer.go:204] Caches are synced for service config
> kubectl -n kube-system logs kube-proxy-txlrp
W0101 21:31:13.552518 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.696793 1 node.go:135] Successfully retrieved node IP: 10.0.0.60
I0101 21:31:13.696846 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.697396 1 server.go:571] Version: v1.17.0
I0101 21:31:13.698000 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.698101 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.698509 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.704280 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.704467 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.704888 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.704935 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:13.705046 1 config.go:313] Starting service config controller
I0101 21:31:13.705059 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.806299 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:13.806430 1 shared_informer.go:204] Caches are synced for service config
These are my (default) kube-proxy settings:
$ kubectl -n kube-system get configmap kube-proxy -o yaml
apiVersion: v1
data:
config.conf: |-
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 5
clusterCIDR: 10.244.0.0/16
configSyncPeriod: 15m0s
conntrack:
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: null
minSyncPeriod: 0s
scheduler: ""
strictARP: false
syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: ""
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
udpIdleTimeout: 250ms
winkernel:
enableDSR: false
networkName: ""
sourceVip: ""
kubeconfig.conf: |-
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: https://10.0.0.50:6443
name: default
contexts:
- context:
cluster: default
namespace: default
user: default
name: default
current-context: default
users:
- name: default
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
creationTimestamp: "2019-12-06T22:07:40Z"
labels:
app: kube-proxy
name: kube-proxy
namespace: kube-system
resourceVersion: "185"
selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
uid: bac4a8df-e318-4c91-a6ed-9305e58ac6d9
$ kubectl -n kube-system get daemonset kube-proxy -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "2"
creationTimestamp: "2019-12-06T22:07:40Z"
generation: 2
labels:
k8s-app: kube-proxy
name: kube-proxy
namespace: kube-system
resourceVersion: "115436"
selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy
uid: 64a53d29-1eaa-424f-9ebd-606bcdb3169c
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kube-proxy
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-proxy
spec:
containers:
- command:
- /usr/local/bin/kube-proxy
- --config=/var/lib/kube-proxy/config.conf
- --hostname-override=$(NODE_NAME)
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: k8s.gcr.io/kube-proxy:v1.17.0
imagePullPolicy: IfNotPresent
name: kube-proxy
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/kube-proxy
name: kube-proxy
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /lib/modules
name: lib-modules
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: true
nodeSelector:
beta.kubernetes.io/os: linux
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-proxy
serviceAccountName: kube-proxy
terminationGracePeriodSeconds: 30
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- operator: Exists
volumes:
- configMap:
defaultMode: 420
name: kube-proxy
name: kube-proxy
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 3
numberMisscheduled: 0
numberReady: 3
observedGeneration: 2
updatedNumberScheduled: 3
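As one more sanity check, kube-proxy exposes a health endpoint on the address configured above (healthzBindAddress: 0.0.0.0:10256); querying it on the master confirms the proxier is alive and syncing (a sketch, output omitted):
[root@k8s-master-01 ~ ] curl -s http://127.0.0.1:10256/healthz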
Is this just the result of a misconfiguration, or is it a bug? Any help is appreciated.
Solution
This is what I did to make it work (a sketch of what these changes look like in the manifests follows the list):
1. Set the --enable-aggregator-routing=true flag on the kube-apiserver.
2. Set the following flags in metrics-server-deployment.yaml:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
3. Set hostNetwork: true in metrics-server-deployment.yaml.
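Roughly, the changes look like this, assuming a standard kubeadm layout where the API server runs as a static pod defined in /etc/kubernetes/manifests/kube-apiserver.yaml (excerpts only, surrounding fields unchanged):
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true
    # ... existing flags kept as they were
# metrics-server-deployment.yaml (excerpt)
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        # ... existing args kept as they were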