Topic: 『Channel Bonding/team』 (EXPERIMENTAL!!!)

hadex 2015-09-06 14:03

NIC aggregation methods supported by the Linux kernel: bond and team

bond

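  • Pros: proven by long practical use and highly stable; widely supported by kernel 2.4 and later
  • Cons: must be controlled through sysfs or distribution-specific network configuration files, so usability is poor; no runtime-efficiency advantage over team; only a single form of monitor is allowed at a time, so arp and mii cannot be combined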

team

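  • Pros: a newer entrant; runs somewhat more efficiently than bond, currently by about 5%; provides a userspace library (libteam) and a configuration file (teamd.conf), which improves usability; provides a mode API so users can write their own modes; several monitor methods can be combined at the same time, which helps achieve HA
  • Cons: works only on Linux systems with kernel 3.3 or later; older distribution releases such as Debian 7 (kernel 3.2) and RHEL 6 (kernel 2.6) cannot use it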
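The kernel requirement above can be checked before choosing between bond and team. The following is a minimal sketch, assuming the 3.3 threshold stated above; the function name kernel_supports_team is hypothetical and not part of libteam.

```shell
#!/usr/bin/env bash
# Return 0 if the given kernel release (default: the running kernel) is
# at least 3.3, the minimum this article states for the team driver.
# kernel_supports_team is a hypothetical helper name.
kernel_supports_team() {
    local release="${1:-$(uname -r)}"
    local major minor
    # Keep only the leading MAJOR and MINOR numbers,
    # e.g. "3.2.0-4-amd64" gives major=3 and minor=2.
    major="${release%%.*}"
    minor="${release#*.}"
    minor="${minor%%[!0-9]*}"
    # Anything that does not look like a version is reported as status 2.
    [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]] || return 2
    (( major > 3 || (major == 3 && minor >= 3) ))
}

if kernel_supports_team; then
    echo "kernel $(uname -r): team can be used"
else
    echo "kernel $(uname -r): team not available, consider bond"
fi
```

A release string can also be passed explicitly, for example kernel_supports_team "2.6.32", to check a remote host's version.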

- Deploying team -

『Reference』https://github.com/jpirko/libteam/wiki/Infrastructure-Specification


1. Make sure libteam is installed; the latest version can be fetched as follows

$ git clone https://github.com/jpirko/libteam.git

2. Create a team device, attach or detach slave ports, and view the team-to-slave mapping

  Deployment usually combines "ip link" with "teamd"

  • Create a team
    • ip link add dev team0 type team
  • Delete a team
    • ip link del team0
  • Attach eth0 to the team
    • ip link set eth0 master team0
  • Detach eth0
    • ip link set eth0 nomaster
  • List all slave ports of the team device
    • ip link | grep -P 'master\s*team0'
    • - OR -
    • teamnl team0 ports
  • Show detailed information about the team device
    • teamdctl team0 state -v
  • Run teamd
    • teamd -f FILENAME -d    (long options: --config-file FILENAME, --daemonize)

Note: as with bond, before a slave is attached to a team, its IP and route information must be cleared and the interface must be in the DOWN state
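The commands and the note above can be strung together in order. The sketch below only prints the commands by default (DRY_RUN=1), so the sequence can be reviewed first. The helper names run and attach_ports and the DRY_RUN variable are hypothetical; the ip commands themselves are the ones listed above.

```shell
#!/usr/bin/env bash
# Print (DRY_RUN=1, the default) or execute the steps that create a team
# device and attach ports to it. Real execution needs root and iproute2.
# run, attach_ports and DRY_RUN are hypothetical names for this sketch.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [[ "$DRY_RUN" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# attach_ports TEAM PORT...
attach_ports() {
    local team="$1"; shift
    local port
    run ip link add dev "$team" type team
    for port in "$@"; do
        run ip addr flush dev "$port"           # clear IP addresses
        run ip route flush dev "$port"          # clear routes
        run ip link set "$port" down            # port must be DOWN first
        run ip link set "$port" master "$team"  # attach to the team
    done
    run ip link set "$team" up
}

attach_ports team0 eth0 eth1
```

Setting DRY_RUN=0 executes the same steps for real. When teamd is started with a configuration file that lists the ports, teamd creates the device itself, so this manual path is an alternative rather than a prerequisite.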

3. Modes and parameter configuration

  Upstream currently provides 5 runners (operating policies/modes)

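  When each port connects to its own independent switch, the switches need no configuration. When more than one slave interface of the team device connects to the same switch, every policy except activebackup requires EtherChannel support on the switch side, and the lacp policy additionally requires 802.3ad support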
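The switch-side rule above can be written down as a small lookup. This is only a sketch of the rule as stated in this article; switch_requirement is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# switch_requirement RUNNER PORTS_ON_SAME_SWITCH
# Print what the switch must support, following the rule in this article.
# switch_requirement is a hypothetical helper name.
switch_requirement() {
    local runner="$1" same_switch_ports="$2"
    if (( same_switch_ports <= 1 )) || [[ "$runner" == activebackup ]]; then
        echo "none"
    elif [[ "$runner" == lacp ]]; then
        echo "EtherChannel + 802.3ad"
    else
        echo "EtherChannel"
    fi
}

for r in activebackup broadcast roundrobin loadbalance lacp; do
    printf '%-13s %s\n' "$r" "$(switch_requirement "$r" 2)"
done
```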


The following runners are available:

  • broadcast — Simple runner which directs the team device to transmit packets via all ports
  • roundrobin — Simple runner which directs the team device to transmit packets in a round-robin fashion
  • activebackup — Watches for link changes and selects active port to be used for data transfers
  • loadbalance — To do passive load balancing, the runner only sets up a BPF hash function which determines the port for packet transmit. To do active load balancing, the runner moves hashes among available ports, trying to reach perfect balance
  • lacp — Implements 802.3ad LACP protocol. Can use same Tx port selection possibilities as loadbalance runner

  Monitor mode

  • ethtool - Uses Libteam lib to get port ethtool state changes
  • arp_ping - ARP requests are sent through a port. If an ARP reply is received, the link is considered to be up. Target IP address, interval and other options can be set up in the config file
  • nsna_ping - Similar to the arp_ping, only it uses the IPv6 Neighbour Solicitation and Neighbour Advertisement mechanism. This is an alternative to arp_ping and becomes handy in pure-IPv6 environments

   TeamX.conf is a JSON-style configuration file

JSON syntax in brief:

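  Paired '{}' mark nesting levels; each member is a 'key: value' pair joined by ':'; members are separated by ','; everything except bare numbers and the literals true/false must be written inside straight double quotes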

『COMMON OPTIONS』 


  • "runner": {"name": "activebackup/lacp/broadcast/roundrobin/loadbalance"}
  • "device": "Desired name of new team device"
  • "hwaddr": "Desired hardware address of new team device. Usual MAC address format is accepted"
  • "link_watch": {"name": "ethtool/nsna_ping/arp_ping"}
  • - OR -  # sets the monitor for a specific port; the same applies to the monitors in the SPECIFIC OPTIONS sections below
  • "ports": {
         "eth0": {
              "link_watch": {"name": "ethtool/arp_ping/nsna_ping"}
         }
     }
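The common options above combine into one complete file. The sketch below prints such a configuration with the ethtool link watch; gen_team_conf is a hypothetical helper name, and team0, eth0 and eth1 are the example names used in this article.

```shell
#!/usr/bin/env bash
# gen_team_conf DEVICE RUNNER PORT...
# Print a teamd configuration that uses the ethtool link watch.
# gen_team_conf is a hypothetical helper name for this sketch.
gen_team_conf() {
    local device="$1" runner="$2"; shift 2
    local port sep=""
    printf '{\n'
    printf '  "device": "%s",\n' "$device"
    printf '  "runner": {"name": "%s"},\n' "$runner"
    printf '  "link_watch": {"name": "ethtool"},\n'
    printf '  "ports": {'
    # Members are separated by commas, with no comma after the last one.
    for port in "$@"; do
        printf '%s\n    "%s": {}' "$sep" "$port"
        sep=","
    done
    printf '\n  }\n}\n'
}

gen_team_conf team0 activebackup eth0 eth1
```

Redirecting the output to a file such as activebackup.conf gives something that can be passed to teamd -f.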

『SPECIFIC OPTIONS』——ACTIVE-BACKUP RUNNER 


  • "ports": {"eth0": {"prio": -10, "sticky": true}}
    • prio: Port priority. The higher number means higher priority. Default: 0
    • sticky: Flag which indicates if the port is sticky. If set, it means the port does not get unselected if another port with higher priority or better parameters becomes available. Default is false
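How prio and sticky interact can be illustrated with a simplified model. This sketch is not teamd's actual selection code: it ignores the "better parameters" part (such as link speed) and only shows that the highest prio wins unless the currently active port is sticky and still up. The helper name pick_active and its argument format are hypothetical.

```shell
#!/usr/bin/env bash
# pick_active CURRENT STICKY PORT:PRIO...
#   CURRENT  currently active port ("" if none)
#   STICKY   "true" if CURRENT has the sticky flag set
#   PORT:PRIO  every port whose link is up, with its prio
# Simplified model for illustration only; pick_active is hypothetical.
pick_active() {
    local current="$1" sticky="$2"; shift 2
    local entry port prio best="" best_prio=0
    for entry in "$@"; do
        port="${entry%%:*}"
        prio="${entry##*:}"
        # A sticky active port that is still up keeps its role.
        if [[ "$sticky" == true && "$port" == "$current" ]]; then
            echo "$current"
            return
        fi
        # Otherwise remember the port with the highest prio seen so far.
        if [[ -z "$best" ]] || (( prio > best_prio )); then
            best="$port"
            best_prio="$prio"
        fi
    done
    echo "$best"
}

pick_active "" false eth0:-10 eth1:100
```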

『SPECIFIC OPTIONS』——LOAD BALANCE RUNNER 


  • "runner": {"tx_hash": ["eth", "ipv4", "ipv6"]}  Note: when one key takes several values, use array format
    • eth — Uses source and destination MAC addresses
    • vlan — Uses VLAN id
    • ipv4 — Uses source and destination IPv4 addresses
    • ipv6 — Uses source and destination IPv6 addresses
    • tcp — Uses source and destination TCP ports
    • udp — Uses source and destination UDP ports
    • sctp — Uses source and destination SCTP ports

  #List of fragment types which should be used for packet Tx hash computation
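The idea behind Tx hashing is that the selected fragments of a packet are reduced to a number, and that number picks the port, so all packets of one flow leave through the same port. The sketch below only illustrates that idea with cksum in user space. It is not the BPF hash that teamd installs, and pick_tx_port and the flow-key string format are hypothetical.

```shell
#!/usr/bin/env bash
# pick_tx_port FLOW_KEY PORT...
# Map a flow key (a string built from the hashed fragments) to one port.
# Illustration only; pick_tx_port is a hypothetical helper name.
pick_tx_port() {
    local key="$1"; shift
    local ports=("$@")
    local sum
    (( ${#ports[@]} > 0 )) || return 1
    # cksum prints "CRC LENGTH"; keep only the CRC.
    sum="$(printf '%s' "$key" | cksum)"
    sum="${sum%% *}"
    # The same key always yields the same index into the port list.
    echo "${ports[sum % ${#ports[@]}]}"
}

pick_tx_port "10.1.7.77:40000>10.1.0.100:80/tcp" eth0 eth1 eth2
```

Adding more fragment types to tx_hash corresponds to putting more fields into the key, which spreads flows between the same pair of hosts across more ports.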

『SPECIFIC OPTIONS』——LACP RUNNER 


  • "runner": {"fast_rate": true}

  #Specifies the rate at which our link partner is asked to transmit LACPDU packets. If this is true then packets will be sent once per second. Otherwise they will be sent every 30 seconds

  • "runner": {"tx_hash": [array]}

  #Same as for load balance runner

  • "runner": {"agg_select_policy": "lacp_prio/lacp_prio_stable/bandwidth/count/port_options"}
    • lacp_prio — Aggregator with highest priority according to LACP standard will be selected. Aggregator priority is affected by per-port option lacp_prio
    • lacp_prio_stable — Same as previous one, except do not replace selected aggregator if it is still usable
    • bandwidth — Select aggregator with highest total bandwidth
    • count — Select aggregator with highest number of ports
    • port_options — Aggregator with highest priority according to per-port options prio and sticky will be selected. This means that the aggregator containing the port with the highest priority will be selected unless at least one of the ports in the currently selected aggregator is sticky
    • Default is lacp_prio
  • "ports": {"eth0": {"lacp_prio": -10}}

  #Port priority according to LACP standard. The lower number means higher priority

  • "ports": {"eth0": {"lacp_key": 4}}

  #Port key according to LACP standard. It is only possible to aggregate ports with the same key. Default is 0
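Put together, the LACP options above might look like the following complete file. This is a sketch under assumptions: the lacp_prio values 10 and 20 are arbitrary examples, both ports share lacp_key 4 so that they can be aggregated, and lacp_conf is a hypothetical helper that just prints the text.

```shell
#!/usr/bin/env bash
# Print an example LACP configuration built from the options above.
# lacp_conf is a hypothetical helper; the priority values are arbitrary.
lacp_conf() {
    cat <<'EOF'
{
  "device": "team0",
  "runner": {
    "name": "lacp",
    "fast_rate": true,
    "tx_hash": ["eth", "ipv4", "ipv6"],
    "agg_select_policy": "lacp_prio"
  },
  "link_watch": {"name": "ethtool"},
  "ports": {
    "eth0": {"lacp_prio": 10, "lacp_key": 4},
    "eth1": {"lacp_prio": 20, "lacp_key": 4}
  }
}
EOF
}

lacp_conf
```

With these values eth0 has the higher LACP priority, because the lower number wins.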

『SPECIFIC OPTIONS』——ETHTOOL LINK WATCH 


"link_watch": {
     "name": "ethtool",
     "delay_up": 100,
     "delay_down": 50
   }

- OR -

"ports": {
     "eth0": {
          "link_watch": {
               "name": "ethtool",
               "delay_up": 200,
               "delay_down": 100
            }
       }
 }        
  • link_watch.delay_up | ports.PORTIFNAME.link_watch.delay_up (int). Value is a positive number in milliseconds. It is the delay between the link coming up and the runner being notified about it. Default is 0
  • link_watch.delay_down | ports.PORTIFNAME.link_watch.delay_down (int). Value is a positive number in milliseconds. It is the delay between the link going down and the runner being notified about it. Default is 0

『SPECIFIC OPTIONS』——ARP PING LINK WATCH

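
"link_watch": {
         "name": "arp_ping",
         "interval": 100,              # interval between two arp_pings, in milliseconds
         "init_wait": 1000,            # delay from the port first joining the team to the first arp_ping
         "missed_max": 3,              # maximum number of missed ARP replies; beyond this the port is judged to have failed
         "target_host": "10.1.0.100"   # target host of arp_ping
         }
  (The # notes are annotations only. JSON has no comment syntax, so leave them out of a real file.)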

- OR -

"ports": {
  "eth0": {
       "link_watch": {
             "name": "arp_ping",
             "interval": 100,
             "init_wait": 1000,
             "missed_max": 3,
             "target_host": "10.1.0.100"
             }
       }
     }
  • link_watch.interval | ports.PORTIFNAME.link_watch.interval (int). Value is a positive number in milliseconds. It is the interval between ARP requests being sent
  • link_watch.init_wait | ports.PORTIFNAME.link_watch.init_wait (int). Value is a positive number in milliseconds. It is the delay between link watch initialization and the first ARP request being sent. Default is 0
  • link_watch.missed_max | ports.PORTIFNAME.link_watch.missed_max (int). Maximum number of missed ARP replies. If this number is exceeded, link is reported as down. Default is 3
  • link_watch.source_host | ports.PORTIFNAME.link_watch.source_host (hostname). Hostname to be converted to IP address which will be filled into ARP request as source address. Default is 0.0.0.0
  • link_watch.target_host | ports.PORTIFNAME.link_watch.target_host (hostname). Hostname to be converted to IP address which will be filled into ARP request as destination address
  • link_watch.validate_active | ports.PORTIFNAME.link_watch.validate_active (bool). Validate received ARP packets on active ports. If this is not set, all incoming ARP packets will be considered as a good reply. Default is false
  • link_watch.validate_inactive | ports.PORTIFNAME.link_watch.validate_inactive (bool). Validate received ARP packets on inactive ports. If this is not set, all incoming ARP packets will be considered as a good reply. Default is false
  • link_watch.send_always | ports.PORTIFNAME.link_watch.send_always (bool). By default, ARP requests are sent on active ports only. This option allows sending even on inactive ports. Default is false
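The effect of missed_max can be modelled in a few lines. The sketch below assumes that misses are counted per interval, that a received reply resets the counter, and that the link starts out up. Those are assumptions about the behaviour, not something stated above, and link_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# link_state MISSED_MAX REPLY...
#   REPLY is 1 if a reply arrived in that interval, 0 if it was missed.
# Prints "up" or "down" after the last interval.
# Simplified model for illustration; link_state is a hypothetical name.
link_state() {
    local missed_max="$1"; shift
    local reply missed=0 state=up
    for reply in "$@"; do
        if (( reply )); then
            # A reply resets the counter and marks the link up.
            missed=0
            state=up
        else
            (( missed += 1 ))
            # Down only once the limit is exceeded, not merely reached.
            if (( missed > missed_max )); then
                state=down
            fi
        fi
    done
    echo "$state"
}

link_state 3 1 0 0 0 0
```

With interval 100 and missed_max 3, this model reports a dead link after the fourth silent interval, that is roughly 400 ms after the last reply.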

『SPECIFIC OPTIONS』——NS/NA PING LINK WATCH 


  • link_watch.interval | ports.PORTIFNAME.link_watch.interval (int).Value is a positive number in milliseconds. It is the interval between sending NS packets
  • link_watch.init_wait | ports.PORTIFNAME.link_watch.init_wait (int).Value is a positive number in milliseconds. It is the delay between link watch initialization and the first NS packet being sent
  • link_watch.missed_max | ports.PORTIFNAME.link_watch.missed_max (int). Maximum number of missed NA reply packets. If this number is exceeded, link is reported as down. Default is 3
  • link_watch.target_host | ports.PORTIFNAME.link_watch.target_host (hostname).Hostname to be converted to IPv6 address which will be filled into NS packet as target address
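The NS/NA options follow the same shape as the arp_ping ones. As a sketch, the helper below prints an activebackup configuration that uses nsna_ping. The helper name nsna_conf is hypothetical, the numeric values are arbitrary examples, and 2001:db8::1 is a documentation-range placeholder address that has to be replaced with a real IPv6 neighbour.

```shell
#!/usr/bin/env bash
# nsna_conf TARGET
# Print an activebackup configuration that monitors links with nsna_ping.
# nsna_conf is a hypothetical helper; interval/init_wait are example values.
nsna_conf() {
    local target="$1"
    cat <<EOF
{
  "device": "team0",
  "runner": {"name": "activebackup"},
  "link_watch": {
    "name": "nsna_ping",
    "interval": 200,
    "init_wait": 1000,
    "missed_max": 3,
    "target_host": "$target"
  },
  "ports": {"eth0": {}, "eth1": {}}
}
EOF
}

# 2001:db8::1 is a placeholder from the documentation prefix.
nsna_conf "2001:db8::1"
```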

4. Example: startup script

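#!/usr/bin/env bash
# Interactively pick a runner and bring up team0 with the matching config file.
# Expects <runner>.conf (e.g. activebackup.conf) in the current directory,
# each defining "device": "team0". Must be run as root.
team()
{
    tM="team0"
    iP="10.1.7.77/24"
    # Stop any teamd instance that is already running
    pkill teamd 2>/dev/null
    # Clear addresses and routes on the slave ports and set them DOWN
    for i in {0..2}
    do
        ip addr flush dev "eth$i"
        ip route flush dev "eth$i"
        ip link set "eth$i" down
    done
    PS3="Select runner policy:"
    select x in "activebackup" "broadcast" "loadbalance" "lacp" "roundrobin"
    do
        # An invalid choice leaves $x empty; ask again
        [ -n "$x" ] || continue
        teamd --force-recreate --config-file "${x}.conf" --daemonize
        ip link set "$tM" up
        ip addr add "$iP" dev "$tM" scope link
        break
    done
}
team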
