docker - Kubernetes - NodePort service only reachable on the node where the pod is deployed
Problem description
I have set up a Kubernetes cluster on three CentOS 8 virtual machines and deployed a pod running nginx.
IP addresses of the VMs:
kubemaster 192.168.56.20
kubenode1 192.168.56.21
kubenode2 192.168.56.22
On each VM, the interfaces and routes are defined as follows:
ip addr:
lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:d2:1b:97 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp0s3
valid_lft 75806sec preferred_lft 75806sec
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:df:77:05 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.22/24 brd 192.168.56.255 scope global noprefixroute enp0s8
valid_lft forever preferred_lft forever
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:ff:47:9a brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:ff:47:9a brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:19:52:19:b1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 22:b8:b4:5a:5a:26 brd ff:ff:ff:ff:ff:ff
inet 10.244.2.0/32 brd 10.244.2.0 scope global flannel.1
valid_lft forever preferred_lft forever
ip route:
default via 10.0.2.2 dev enp0s3 proto dhcp metric 100
default via 192.168.56.1 dev enp0s8 proto static metric 101
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.22 metric 101
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Each VM has two network adapters: a NAT adapter (enp0s3) for Internet access and a host-only adapter (enp0s8) over which the three VMs communicate with each other (that part works; I verified it with ping).
On each VM, I applied the following firewall rules:
firewall-cmd --permanent --add-port=6443/tcp # Kubernetes API server
firewall-cmd --permanent --add-port=2379-2380/tcp # etcd server client API
firewall-cmd --permanent --add-port=10250/tcp # Kubelet API
firewall-cmd --permanent --add-port=10251/tcp # kube-scheduler
firewall-cmd --permanent --add-port=10252/tcp # kube-controller-manager
firewall-cmd --permanent --add-port=8285/udp # Flannel
firewall-cmd --permanent --add-port=8472/udp # Flannel
firewall-cmd --add-masquerade --permanent
firewall-cmd --reload
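One detail worth noting: the rules above open the control-plane and Flannel ports but not the NodePort range itself. A hedged sketch of opening it (assuming the kubeadm default range of 30000-32767 and firewalld as the active firewall; whether this was the missing piece here is not confirmed):

```shell
# Open the default Kubernetes NodePort range (30000-32767) on every node,
# so NodePort services such as 30086 are reachable from outside the node.
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --reload
```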
Finally, I deployed the cluster and nginx with the following commands:
sudo kubeadm init --apiserver-advertise-address=192.168.56.20 --pod-network-cidr=10.244.0.0/16  # pod CIDR expected by the Flannel CNI
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl create deployment nginx --image=nginx
kubectl create service nodeport nginx --tcp=80:80
More general information about my cluster:
kubectl get nodes -o wide:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubemaster Ready master 3h8m v1.19.2 192.168.56.20 <none> CentOS Linux 8 (Core) 4.18.0-193.19.1.el8_2.x86_64 docker://19.3.13
kubenode1 Ready <none> 3h6m v1.19.2 192.168.56.21 <none> CentOS Linux 8 (Core) 4.18.0-193.19.1.el8_2.x86_64 docker://19.3.13
kubenode2 Ready <none> 165m v1.19.2 192.168.56.22 <none> CentOS Linux 8 (Core) 4.18.0-193.19.1.el8_2.x86_64 docker://19.3.13
kubectl get pods --all-namespaces -o wide:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default nginx-6799fc88d8-mrvsg 1/1 Running 0 3h 10.244.1.3 kubenode1 <none> <none>
kube-system coredns-f9fd979d6-6qxk9 1/1 Running 0 3h9m 10.244.1.2 kubenode1 <none> <none>
kube-system coredns-f9fd979d6-bj2fd 1/1 Running 0 3h9m 10.244.0.2 kubemaster <none> <none>
kube-system etcd-kubemaster 1/1 Running 0 3h9m 192.168.56.20 kubemaster <none> <none>
kube-system kube-apiserver-kubemaster 1/1 Running 0 3h9m 192.168.56.20 kubemaster <none> <none>
kube-system kube-controller-manager-kubemaster 1/1 Running 0 3h9m 192.168.56.20 kubemaster <none> <none>
kube-system kube-flannel-ds-fdv4p 1/1 Running 0 166m 192.168.56.22 kubenode2 <none> <none>
kube-system kube-flannel-ds-vvhsz 1/1 Running 0 3h6m 192.168.56.21 kubenode1 <none> <none>
kube-system kube-flannel-ds-vznl5 1/1 Running 0 3h6m 192.168.56.20 kubemaster <none> <none>
kube-system kube-proxy-45tmz 1/1 Running 0 3h9m 192.168.56.20 kubemaster <none> <none>
kube-system kube-proxy-nb7jt 1/1 Running 0 3h7m 192.168.56.21 kubenode1 <none> <none>
kube-system kube-proxy-tl9n5 1/1 Running 0 166m 192.168.56.22 kubenode2 <none> <none>
kube-system kube-scheduler-kubemaster 1/1 Running 0 3h9m 192.168.56.20 kubemaster <none> <none>
kubectl get service -o wide:
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE     SELECTOR
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        3h10m   <none>
nginx        NodePort    10.102.152.25   <none>        80:30086/TCP   179m    app=nginx
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:32:58Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
iptables version:
iptables v1.8.4 (nf_tables)
Results and problem:
- If I run curl 192.168.56.21:30086 from any VM -> OK, I get the nginx HTML.
- If I try any other node IP (e.g. curl 192.168.56.22:30086), it fails... (curl: (7) Failed to connect to 192.168.56.22 port 30086: Connection timed out)
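The failure pattern can be checked systematically by probing the NodePort on every node (a small diagnostic sketch using this cluster's IPs; in the broken state only kubenode1, which hosts the pod, should answer):

```shell
# Probe the nginx NodePort on each node and report which ones respond.
for ip in 192.168.56.20 192.168.56.21 192.168.56.22; do
  if curl -s --connect-timeout 3 "http://$ip:30086/" >/dev/null; then
    echo "$ip: OK"
  else
    echo "$ip: FAILED"
  fi
done
```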
What I tried in order to debug:
sudo netstat -antup | grep kube-proxy
tcp 0 0 0.0.0.0:30086 0.0.0.0:* LISTEN 4116/kube-proxy
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 4116/kube-proxy
tcp 0 0 192.168.56.20:49812 192.168.56.20:6443 ESTABLISHED 4116/kube-proxy
tcp6 0 0 :::10256 :::* LISTEN 4116/kube-proxy
So kube-proxy appears to be listening on port 30086 on every VM, which looks fine.
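The listening socket alone does not prove much, though: in iptables mode the actual forwarding for a NodePort is done by NAT rules, which can be inspected directly (a diagnostic sketch; KUBE-NODEPORTS is the chain kube-proxy creates in that mode):

```shell
# List the rules kube-proxy programmed for NodePort services
iptables -t nat -L KUBE-NODEPORTS -n --line-numbers
# Show every NAT rule mentioning the nginx NodePort
iptables -t nat -S | grep 30086
```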
I tried applying the following rule on each node (found in another ticket), without success:
iptables -A FORWARD -j ACCEPT
Do you know why I cannot reach the service through the master node and node 2?
First update:
- It seems CentOS 8 is not compatible with kubeadm. I switched to CentOS 7 but still had the problem;
- The Flannel pods were created using the wrong interface (enp0s3) instead of enp0s8. I edited the kube-flannel.yml file and added the argument --iface=enp0s8. My pods now use the correct interface.
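For reference, that edit can be scripted before applying the manifest (a hedged sketch: it assumes the upstream manifest's container args contain the literal `- --kube-subnet-mgr` entry at 8-space indentation, which may differ between Flannel versions, and GNU sed for `\n` in the replacement):

```shell
# Insert "--iface=enp0s8" after the existing --kube-subnet-mgr argument
# of the flannel container, then apply the modified manifest.
sed -i 's/- --kube-subnet-mgr/- --kube-subnet-mgr\n        - --iface=enp0s8/' kube-flannel.yml
kubectl apply -f kube-flannel.yml
```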
kubectl logs kube-flannel-ds-nn6v4 -n kube-system:
I0929 06:19:36.842149 1 main.go:531] Using interface with name enp0s8 and address 192.168.56.22
I0929 06:19:36.842243 1 main.go:548] Defaulting external address to interface address (192.168.56.22)
Even with these two things fixed, I still had the same problem...
Second update:
The final solution was to flush iptables on each VM with the following commands:
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -t nat --flush
systemctl start kubelet
systemctl start docker
Now it works fine :)
Solution
I finally found the solution after switching to CentOS 7 and correcting the Flannel configuration (see the updates above). I also noticed issues in the coredns pods. Here is an example of what happens inside one of them:
kubectl logs coredns-f9fd979d6-8gtlp -n kube-system:
E0929 07:09:40.200413 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: no route to host
[INFO] plugin/ready: Still waiting on: "kubernetes"
The final solution was to flush iptables on each VM with the following commands:
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -t nat --flush
systemctl start kubelet
systemctl start docker
Then I can access the deployed service from each VM :)
I am still not sure I fully understand what the issue was. Here is some related information:
- https://github.com/kubernetes/kubeadm/issues/193
- https://www.developertyrone.com/blog/kubernetes-administrator-notes-coredns-fix-on-centos-no-route-to-host-networking-issues/
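One plausible explanation (an assumption on my part, not something verified here): CentOS installs blanket REJECT rules with `--reject-with icmp-host-prohibited`, and a stale copy of them sitting in front of kube-proxy's chains would surface as exactly the "no route to host" and connection-timeout errors seen above; flushing iptables removes them until firewalld or kubelet rebuilds the rules. They can be spotted with:

```shell
# Look for blanket REJECT rules ahead of the Kubernetes chains;
# "icmp-host-prohibited" is what curl reports as "no route to host".
iptables -S INPUT | grep -i reject
iptables -S FORWARD | grep -i reject
```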
I will keep investigating and post more information here.