Host Settings
This page contains settings for Linux servers, these instructions are agnostic of the Linux
distribution used by the servers. The commands use the ip utility from the iproute2 package.
These settings do not persist across reboots. Consult the documentation from
the Linux distribution for guides on how to persist the settings used on the
server. For example, Netplan or Network Manager. For
additional details on options and behavior, consult the kernel bonding driver documentation.
Multi-homing
ESLAG (EVPN Multi-Homing) is the recommended way to provide device redundancy. A server connects to multiple switches (up to 4) without requiring the switches to be directly connected to each other. The host must configure its ports using LACP (IEEE 802.3ad).
Server Settings
The supported server-side setting for bonding is 802.3ad, with hashing policy
layer2 or layer2+3. layer2+3 transmit hashing is the recommended
setting for the overwhelming majority of scenarios.
To configure this in Linux:
sudo ip link add $bond_name type bond miimon 50 mode 802.3ad layer2+3
sudo ip link set $slave1_name master $bond_name
sudo ip link set $slave2_name master $bond_name
The miimon 100 indicates that the bonding driver will use miimon to
determine if the link is up, and it will poll every 50 milliseconds.
layer2 or layer2+3 is the hashing mode, which controls the input into the
hash function. See the kernel bonding driver documentation for
additional details.
After the bond has been created, it can receive a DHCP address or be statically assigned, for example:
To check the status of this bond:
VLANs
A Linux host is capable of receiving tagged and untagged traffic. Tagged traffic is traffic that has the VLAN (802.1q) header intact, untagged traffic has the VLAN header removed as it exits the switch.
The Open Network Fabric is capable of emitting tagged and untagged traffic. The
setting to change whether traffic is tagged or untagged is on the
VPCAttahcment, specifically the nativeVLAN field.
The default value of this field is false, meaning the fabric will emit tagged
traffic. If a port emits tagged traffic, it means that the Linux host must handle
the tag. This is accomplished by creating a link layer interface on the host for the tag:
enp2s1.1001 is by convention; it can be customized. The vlan in
this example is 1001 and was created when the subnet was created inside the
VPC.
When the nativeVLAN field is true, the switch will remove the 802.1q tag
from the packet as it exits the switch. When a Linux server is attached to this
port, there is no need to create an additional link layer device.
Multiple VLANs on a port
At times, it is desirable to configure a single port that emits more than one VLAN tag, which is called VLAN trunking. To create a trunk port, attach the VPC subnet to the desired port, multiple VPC subnets can be connected to a single connection object:
kubectl fabric vpc attach --vpc-subnet vpc-1/default --connection server-1--leaf-1
kubectl fabric vpc attach --vpc-subnet vpc-2/default --connection server-1--leaf-1
HostBGP container
If using HostBGP subnets, BGP should be running on the host server and an appropriate configuration should be applied. To facilitate these steps, Hedgehog provides a docker container which automatically starts FRR with a valid configuration to join the Fabric.
As a first step, users should download the docker image from our registry:
The container should then be run with host networking (so that FRR can communicate with the leaves using the host's interfaces) and in privileged mode. Additionally, a few input parameters are required:
- an optional ASN to use in BGP. If present, it should be the first parameter; if not specified, the container will use ASN 64999
- one or more VPC subnets with their related parameters, in the format
<VPC-SUBNET-NAME>:v=<VLAN>:i=<INTERFACE1>[:i=<INTERFACE2>...]:a=<ADDRESS1>[:a=<ADDRESS2>...], where:<VPC-SUBNET-NAME>is just a mnemonic ID for the VPC subnet we want to attach to. It can be anything as long as it is a legal name for a route-map or prefix-list in FRR.v=<VLAN>is the VLAN ID to be used for the VPC; use 0 for untagged.i=<INTERFACE1>is an interface to be used to establish a BGP unnumbered session with a Fabric leaf; if a VLAN ID was specified, a corresponding VLAN interface will be created using the provided interface as the master device.a=<ADDRESS1>is the Virtual IP (or VIP) to be advertised to the leaves; it should have a prefix length of /32 and be part of the subnet the host is attaching to.
As an example, the command might look something like this:
docker run --network=host --privileged --rm --detach --name hostbgp ghcr.io/githedgehog/host-bgp 64307 vpc-01:v=1001:i=enp2s1:i=enp2s2:a=10.100.34.5/32
Note
With the above command, any output produced by the container will not be visible from the terminal
where it was started. Verify that the container is running correctly with docker ps, or examine
the logs of the container with docker logs hostbgp to investigate a failure.
With the above command:
- VLAN interfaces
enp2s1.1001andenp2s2.1001would be created, if not already existing - BGP unnumbered sessions would be created on those same interfaces, using ASN 64307
- the address
10.100.34.5/32would be configured on the loopback of the host server and it would be advertised to the leaves
To further modify the configuration or to troubleshoot the state of the system, an expert user can invoke the FRR CLI using the following command:
For example, one could use vtysh to verify the configuration generated with the above command:
$ docker exec -t hostbgp vtysh -c "show run"
Building configuration...
Current configuration:
!
frr version 10.5.1_git
frr defaults traditional
hostname server-04
service integrated-vtysh-config
!
ip prefix-list vpc-01 seq 5 permit 10.100.34.5/32
!
route-map vpc-01 permit 10
match ip address prefix-list vpc-01
exit
!
interface lo
ip address 10.100.34.5/32
exit
!
router bgp 64307
no bgp ebgp-requires-policy
bgp bestpath as-path multipath-relax
timers bgp 3 9
neighbor enp2s1.1001 interface remote-as external
neighbor enp2s2.1001 interface remote-as external
!
address-family ipv4 unicast
network 10.100.34.5/32
neighbor enp2s1.1001 route-map vpc-01 out
neighbor enp2s2.1001 route-map vpc-01 out
maximum-paths 4
exit-address-family
exit
!
end
To stop the container, just run the following command:
Note that stopping the docker container does not currently remove the VIPs from the loopback, nor does it delete the VLAN interfaces. If needed, these should be removed manually; for example, using iproute2 and the reference command above, one could run:
sudo ip address delete dev lo 10.100.34.5/32
sudo ip link delete dev enp2s1.1001
sudo ip link delete dev enp2s2.1001
Users should consider automating the startup of the hostbgp container at system boot up, to make sure that connectivity is restored in case of a reboot.
Example: multi-VPC multi-homed server
Let's assume that server-03 is attached to both leaf-01 and leaf-02 with unbundled connections
server-03--unbundled--leaf-01 and server-03--unbundled--leaf-02, and that we want it to be part
of two separate VPCs using host-BGP. We can create the VPCs and attachments e.g. from the control node
using the Fabric kubectl plugin:
core@control-1 ~ $ kubectl fabric vpc create --name=vpc-01 --subnet=10.0.1.0/24 --vlan=1001 --host-bgp=true
10:04:09 INF VPC created name=vpc-01
core@control-1 ~ $ kubectl fabric vpc create --name=vpc-02 --subnet=10.0.2.0/24 --vlan=1002 --host-bgp=true
10:04:24 INF VPC created name=vpc-02
core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v1-l1 --conn=server-03--unbundled--leaf-01 --subnet=vpc-01/default
10:05:59 INF VPCAttachment created name=s3-v1-l1
core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v1-l2 --conn=server-03--unbundled--leaf-02 --subnet=vpc-01/default
10:06:08 INF VPCAttachment created name=s3-v1-l2
core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v2-l1 --conn=server-03--unbundled--leaf-01 --subnet=vpc-02/default
10:06:24 INF VPCAttachment created name=s3-v2-l1
core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v2-l2 --conn=server-03--unbundled--leaf-02 --subnet=vpc-02/default
10:06:33 INF VPCAttachment created name=s3-v2-l2
Then we can configure server-03 using the provided container:
docker run --network=host --privileged --rm --detach --name hostbgp ghcr.io/githedgehog/host-bgp vpc-01:v=1001:i=enp2s1:i=enp2s2:a=10.0.1.3/32 vpc-02:v=1002:i=enp2s1:i=enp2s2:a=10.0.2.3/32
This will generate the following FRR configuration:
!
ip prefix-list vpc-01 seq 5 permit 10.0.1.3/32
ip prefix-list vpc-02 seq 5 permit 10.0.2.3/32
!
route-map vpc-01 permit 10
match ip address prefix-list vpc-01
exit
!
route-map vpc-02 permit 10
match ip address prefix-list vpc-02
exit
!
interface lo
ip address 10.0.1.3/32
ip address 10.0.2.3/32
exit
!
router bgp 64999
no bgp ebgp-requires-policy
bgp bestpath as-path multipath-relax
timers bgp 3 9
neighbor enp2s1.1001 interface remote-as external
neighbor enp2s1.1002 interface remote-as external
neighbor enp2s2.1001 interface remote-as external
neighbor enp2s2.1002 interface remote-as external
!
address-family ipv4 unicast
network 10.0.1.3/32
network 10.0.2.3/32
neighbor enp2s1.1001 route-map vpc-01 out
neighbor enp2s1.1002 route-map vpc-02 out
neighbor enp2s2.1001 route-map vpc-01 out
neighbor enp2s2.1002 route-map vpc-02 out
maximum-paths 4
exit-address-family
exit
!
And we can verify on either of the leaves attached to server-03 that VIPs are only
learned in the VPC they belong to:
leaf-01# show ip route vrf VrfVvpc-01
Codes: K - kernel route, C - connected, S - static, B - BGP, O - OSPF, A - attached-host
> - selected route, * - FIB route, q - queued route, r - rejected route, b - backup
Destination Gateway Dist/Metric Last Update
--------------------------------------------------------------------------------------------------------------------------------
B>* 10.0.1.3/32 via fe80::e20:12ff:fefe:401 Ethernet1.1001 20/0 00:09:43 ago
leaf-01# show ip route vrf VrfVvpc-02
Codes: K - kernel route, C - connected, S - static, B - BGP, O - OSPF, A - attached-host
> - selected route, * - FIB route, q - queued route, r - rejected route, b - backup
Destination Gateway Dist/Metric Last Update
--------------------------------------------------------------------------------------------------------------------------------
B>* 10.0.2.3/32 via fe80::e20:12ff:fefe:401 Ethernet1.1002 20/0 00:09:47 ago
leaf-01#