Setting up a swarm with Docker 1.12 on CentOS 7 (Part 1)

sfissw 2017-03-08 13:55

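A multi-host container networking setup with Docker 1.12 on CentOS 7

My virtual machines are 192.168.2.108 through 192.168.2.116, nine in total.

192.168.2.200 is the registry host.

On the registry host, initialize the swarm:

docker swarm init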

[root@localhost ~]# docker swarm init
Swarm initialized: current node (ado6uyaldy5ovvi7fwkvvuoh4) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-2mnw1bzecz6tqz94bia3ok5rp \
192.168.2.200:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

[root@localhost ~]# netstat -lan|grep 2377
tcp 0 0 192.168.2.200:49598 192.168.2.200:2377 ESTABLISHED
tcp 0 0 127.0.0.1:42714 127.0.0.1:2377 ESTABLISHED
tcp6 0 0 :::2377 :::* LISTEN
tcp6 0 0 127.0.0.1:2377 127.0.0.1:42714 ESTABLISHED
tcp6 0 0 192.168.2.200:2377 192.168.2.200:49598 ESTABLISHED
[root@localhost ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
ado6uyaldy5ovvi7fwkvvuoh4 * localhost.localdomain Ready Active Leader

The basic swarm cluster on the registry host now exists.
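The netstat output above is what confirms that the manager is accepting connections on 2377. As a small sketch, the same check can be scripted. The function name is_listening is made up for this example, and it assumes the Linux `netstat -lan` column layout shown in the transcript.

```shell
# Succeeds if the netstat output on stdin contains a socket in LISTEN state
# whose local address ends in ":<port>".
# Assumed column layout (as in the transcript above):
#   proto recv-q send-q local-address foreign-address state
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

On the manager this would be used as `netstat -lan | is_listening 2377 && echo "swarm port is listening"`.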

On 108, run:

docker swarm join \
--token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-2mnw1bzecz6tqz94bia3ok5rp \
192.168.2.200:2377

After a long wait it fails with:

Error response from daemon: Timeout was reached before node was joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.
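The error message points to docker info for the node's current swarm status. A minimal sketch for pulling out just that status follows. It assumes docker info prints an unindented line of the form "Swarm: <state>" (for example active, inactive, or pending), which is how I understand Docker 1.12 to report it. The function name swarm_state is made up for this example.

```shell
# Prints the value of the top-level "Swarm:" line from `docker info` output
# read on stdin. Indented detail lines (NodeID and so on) are ignored because
# their first field starts with a space.
swarm_state() {
    awk -F': *' '$1 == "Swarm" { print $2 }'
}
```

Typical use would be `docker info 2>/dev/null | swarm_state` on the node that timed out.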

My first suspicion was the firewall.

On 200:

firewall-cmd --permanent --zone=public --add-port=2377/tcp

firewall-cmd --reload
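Only 2377/tcp is opened above, and only the manager needs it. Docker's swarm documentation also lists 7946/tcp and 7946/udp for node-to-node communication and 4789/udp for overlay network traffic, which matters for a multi-host container network. The sketch below only prints the corresponding firewall-cmd lines so they can be reviewed first. The function name is made up, and whether every port is needed on every node in this particular setup is an assumption.

```shell
# Prints the firewall-cmd invocations for the ports swarm mode uses.
# 2377/tcp comes from the steps above; the remaining ports are taken from
# Docker's swarm documentation and are an assumption for this cluster.
swarm_firewall_cmds() {
    for port in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
        echo "firewall-cmd --permanent --zone=public --add-port=$port"
    done
    echo "firewall-cmd --reload"
}
```

After checking the output, it can be applied as root with `swarm_firewall_cmds | sh`.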

My next suspicion was clock skew between the machines.

Install the NTP time synchronization service on every node:

yum -y install ntp

systemctl enable ntpd

systemctl start ntpd

ntpdate -u cn.pool.ntp.org

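I also suspected the hostnames, since every machine was still called localhost.localdomain:

hostnamectl set-hostname ip<last octet of the IP>.sfimc.com

For example, 192.168.2.108 becomes ip108.sfimc.com.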
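The naming rule can be written as a one-line helper, which avoids typos when repeating it on every machine. This is only a sketch; the function name hostname_for_ip is made up, and the sfimc.com domain is the one used in this article.

```shell
# Builds the hostname used in this article from an IPv4 address:
# "ip" + last octet + ".sfimc.com".
hostname_for_ip() {
    # ${1##*.} strips everything up to and including the last dot.
    echo "ip${1##*.}.sfimc.com"
}
```

On each node it would be used as `hostnamectl set-hostname "$(hostname_for_ip 192.168.2.108)"`, with that node's own address.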

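On 108:

Because this node has already attempted a join, joining again would fail, so leave the swarm first:

docker swarm leave

Be careful: on a node that is actively in use, this is a dangerous command.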

[root@localhost ~]# docker swarm leave
Node left the swarm.
[root@localhost ~]# docker swarm join --token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-2mnw1bzecz6tqz94bia3ok5rp 192.168.2.200:2377
This node joined a swarm as a worker.

Success!

Then run the same command on machines 109 through 115:

 docker swarm join --token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-2mnw1bzecz6tqz94bia3ok5rp 192.168.2.200:2377

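This joins the remaining worker nodes to the cluster; they should all succeed.

Then, on 200:

docker swarm join-token manager

This prints the join command for manager nodes. It is the same command as for workers; only the token differs. It took me a long time to find this, because my English is not great.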

docker swarm join --token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-3uyi9txwapgnfe2gxbs6nkv6f 192.168.2.200:2377
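Comparing the worker token from earlier with this manager token shows that only the last dash-separated field changes. The sketch below splits a token into its fields. The function names, and the reading of the third field as identifying the swarm and the fourth as the per-role secret, are my assumptions based on the two tokens shown here.

```shell
# Splits a join token of the form SWMTKN-1-<swarm part>-<role secret>.
# The field meanings are inferred from the tokens in this article.
token_swarm_part()  { echo "$1" | cut -d- -f3; }
token_role_secret() { echo "$1" | cut -d- -f4; }

# Succeeds if two tokens belong to the same swarm (same third field).
same_swarm() {
    [ "$(token_swarm_part "$1")" = "$(token_swarm_part "$2")" ]
}
```

Calling same_swarm with the worker and manager tokens above succeeds, while their role secrets differ.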

On 116, run:

docker swarm join --token SWMTKN-1-5i9rc8jlypt8ngy137asbi5qhwenuze9ez1o19f40jxftnq4nj-3uyi9txwapgnfe2gxbs6nkv6f 192.168.2.200:2377

This node joined a swarm as a manager.

Now list the nodes on 200:

[root@ip200 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
4zgqf8j8q9zi5nw2czdx59a7g ip108.sfimc.com Ready Active
ado6uyaldy5ovvi7fwkvvuoh4 * ip200.sfimc.com Ready Active Leader
e55zsu88tf1ggj824ezutmjww ip116.sfimc.com Ready Active Reachable

 

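116 is now a standby manager (Reachable). The idea is that it takes over as leader if the current leader fails or is shut down. With only two managers, however, losing either one also loses the Raft majority, so real failover needs at least three managers.

At this point the swarm cluster is essentially built.

There is one serious pitfall here: taking a manager node out of the swarm.

First, on 116: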

[root@ip116 ~]# docker swarm leave
Error response from daemon: You are attempting to leave the swarm on a node that is participating as a manager. Removing this node leaves 1 managers out of 2. Without a Raft quorum your swarm will be inaccessible. The only way to restore a swarm that has lost consensus is to reinitialize it with `--force-new-cluster`. Use `--force` to suppress this message.
[root@ip116 ~]# docker swarm leave
Error response from daemon: You are attempting to leave the swarm on a node that is participating as a manager. Removing this node leaves 1 managers out of 2. Without a Raft quorum your swarm will be inaccessible. The only way to restore a swarm that has lost consensus is to reinitialize it with `--force-new-cluster`. Use `--force` to suppress this message.
[root@ip116 ~]# docker swarm leave --force
Node left the swarm.
[root@ip116 ~]# ^C
[root@ip116 ~]# docker swarm leave --force
Error response from daemon: This node is not part of a swarm
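The warning in the transcript is about Raft quorum: a strict majority of the managers must be reachable. The arithmetic is simple enough to sketch. The function names are made up for this example, and safe_to_remove_one models a manager disappearing abruptly (as with leave --force), not an orderly demotion.

```shell
# Number of managers that must be reachable for a swarm of $1 managers.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of managers that can be lost while a majority still remains.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

# Succeeds if a swarm of $1 managers still has a majority after one
# manager vanishes without being demoted first.
safe_to_remove_one() {
    [ $(( $1 - 1 )) -ge "$(quorum "$1")" ]
}
```

With the two managers here, quorum 2 prints 2 and fault_tolerance 2 prints 0, so forcing one manager out leaves the other without a majority. A gentler route, as far as I know, is to turn the manager into a worker with docker node demote from a manager before it leaves.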

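I wanted to take the manager 116 out of the cluster and replace it with the newly created machines 201 and 202. Not having read the English warning carefully, I tried the --force flag on 116, and this was the result.

On 200: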

[root@ip200 ~]# docker node ls
Error response from daemon: rpc error: code = 2 desc = raft: no elected cluster leader

[root@ip200 ~]# docker node update e55    (note: e55 is the first three characters of the node ID that 116 had in the cluster)
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded

 

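I searched online and found no fix at the time. The warning printed on 116 names --force-new-cluster as the way to restore a swarm that has lost consensus; the steps below rebuild the swarm from scratch instead.

So I ended up running docker swarm leave --force on 200 as well. Note that this leaves the cluster without a single manager, which means the cluster is effectively destroyed.

Reconfigure from scratch.

On 200: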

[root@ip200 ~]# docker swarm init
Swarm initialized: current node (c0fqga97cqoghgn5h8rqn39yc) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-2djciyagpvqpvs0r770pu5fgcith6yc1uhsev2g1e0riprt1qy-3789ph5zmo6tx703c77zno6kf \
192.168.2.200:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

[root@ip200 ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-2djciyagpvqpvs0r770pu5fgcith6yc1uhsev2g1e0riprt1qy-aookkrj5i8ggkfn8r2er5hrt3 \
192.168.2.200:2377

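This gives the new join tokens for both node types.

Run the manager join command on 201 and 202, and the worker join command on 110-116. Nodes that had joined the previous cluster must run docker swarm leave first, and the manager nodes must have port 2377 open. All of this is shown in the examples above, so I will not repeat it.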
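Repeating the leave and join steps on seven machines by hand is tedious. As a sketch, the commands can be generated in a loop and reviewed before running them. The function name join_cmds and the use of ssh as root are assumptions; in this article each command was typed on the node itself.

```shell
# Prints one ssh command per host 192.168.2.<first> .. 192.168.2.<last>.
# Each command leaves any previous swarm (an error here is harmless if the
# node was not in one) and then joins the manager at 192.168.2.200:2377.
# Arguments: <join token> <first last octet> <last last octet>
join_cmds() {
    token=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        echo "ssh root@192.168.2.$i 'docker swarm leave; docker swarm join --token $token 192.168.2.200:2377'"
        i=$(( i + 1 ))
    done
}
```

For the workers this would be `join_cmds <worker token> 110 116`, and for the managers `join_cmds <manager token> 201 202`, piping the output to sh once it looks right.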

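Something odd happened with 202. After it joined the cluster, its console printed nothing, and 200 and 201 listed it as a worker node. I shut 202 down, after which docker node ls showed it as Down, and docker node rm on 200 removed it from the cluster successfully.

But when I tried to join it again, 202 misbehaved: it apparently still believed it was a manager of some cluster.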

docker swarm leave --force
Error response from daemon: context deadline exceeded

Even a forced leave failed with this error, so I deleted the swarm state by hand:

[root@ip202 ~]# cd /var/lib/docker/swarm/
[root@ip202 swarm]# ls
certificates docker-state.json raft state.json worker
[root@ip202 swarm]# rm -rf *

Restart the Docker service:

[root@ip202 swarm]# service docker restart
Redirecting to /bin/systemctl restart docker.service
[root@ip202 swarm]# docker swarm leave
Error response from daemon: This node is not part of a swarm

 

The same cleanup as a one-liner (the && ensures rm only runs if the cd succeeded):

systemctl stop docker.service && cd /var/lib/docker/swarm/ && rm -rf ./* && service docker restart; docker swarm leave --force

Redirecting to /bin/systemctl restart docker.service
Error response from daemon: This node is not part of a swarm

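Finally back to normal. The last resort, it seems, is to delete the swarm state files and restart the service, but this should only be done when nothing else works.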
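Since rm -rf * in the wrong directory is unforgiving, the deletion step can be wrapped in a few guards. This is a sketch under assumptions: the function name is made up, the default path is the one seen on 202 above, and find with -mindepth and -delete is assumed to be available (it is in GNU find on CentOS 7).

```shell
# Deletes everything inside a swarm state directory, refusing to act on
# anything that is not an existing directory named "swarm".
# Argument: directory to clear (defaults to /var/lib/docker/swarm).
clear_swarm_state() {
    dir=${1:-/var/lib/docker/swarm}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    [ "$(basename "$dir")" = "swarm" ] || { echo "refusing to clear: $dir" >&2; return 1; }
    # Removes all entries below $dir, including dotfiles, but keeps $dir itself.
    find "$dir" -mindepth 1 -delete
}
```

The full reset would then be `systemctl stop docker.service && clear_swarm_state && systemctl start docker.service`, run as root on the broken node only.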

Then join 202 to the cluster again as a manager.

Finally, run the following on any one of 200, 201, or 202:

docker node ls

[root@ip200 swarm]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
08kaohdh5jl17toyiqe08rxuv ip110.sfimc.com Ready Active
20tv0r7vw3xtxyez8mx7o5g5d * ip200.sfimc.com Ready Active Leader
2hslpef7idwka474cok6kach8 ip111.sfimc.com Ready Active
2rkf3xqjpq89tjoqj073hht14 ip114.sfimc.com Ready Active
52ued4l33njka5ua4wjoi8epk ip112.sfimc.com Ready Active
5kclmdzx3mxafqi3achr3h47a ip202.sfimc.com Ready Active Reachable
5osxnk5asjxi3d3vzdsy9lvbv ip201.sfimc.com Ready Active Reachable
6qrxcpsiqpmec4regf5cmyszc ip116.sfimc.com Ready Active
ai77no9444jka1t8srjwk8bzk ip108.sfimc.com Ready Active
bduvkn5xczeo9ax2ydyvzmvbo ip115.sfimc.com Ready Active
da0e74wzdxgi83f7m89r51jil ip113.sfimc.com Ready Active
dlfy5dwho3b6k0db3q9za5pov ip109.sfimc.com Ready Active
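With this many rows it is easy to miscount. A short sketch for tallying managers and workers from docker node ls output follows. It relies on the Docker 1.12 layout shown above, where the MANAGER STATUS column is the last field and is empty for workers. The function names are made up for this example.

```shell
# Counts rows whose last field is a manager status.
count_managers() {
    awk '$NF == "Leader" || $NF == "Reachable" || $NF == "Unreachable" { n++ }
         END { print n + 0 }'
}

# Counts the remaining non-empty rows after the header line.
count_workers() {
    awk 'NR > 1 && NF > 0 && $NF != "Leader" && $NF != "Reachable" && $NF != "Unreachable" { n++ }
         END { print n + 0 }'
}
```

On a manager, `docker node ls | count_managers` and `docker node ls | count_workers` give the two totals; the manager count is the one that determines the Raft majority.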

 

At last, every node has joined.

Management operations can now be run from any of the manager nodes.
