VyOS 1.5, EVPN-VXLAN First test on VyOS
Today I wanted to play around with VXLAN as I’ve never touched it and I had nothing better to do.
This has been tested with the latest VyOS nightly, 1.5-rolling-202403050022. First, I read the related VyOS blog post and the FRR docs. I really prefer learning while doing things instead of just reading the theory. Even though it means I will likely mess something up, that’s precisely what the Lab is there for.
EVPN - BGP
First I need to enable address-family l2vpn-evpn on my route reflectors. I have four of them, but I will only be touching one for now (they all hold the same copy of the same routes).
I will change the L2TPv3 PW I have internally between OSR1CR3 and OSR1CR5 (I use it for something equivalent to an EVPL, an internal backhaul of my WAN from one server to another) to EVPN-VXLAN. These two routers are adjacent to each other, connected via 2x2.5GbE. My Lab has IS-IS as an IGP, MPLS-enabled via SR.
Example config at OSR1RR1 (Route Reflector) for iBGP RR-client peering to OSR1CR3 (Core):
fabrizzio@OSR1RR1# show protocols bgp neighbor 192.168.254.12
address-family {
    ipv4-unicast {
        addpath-tx-all
        route-map {
            import RTR_OSR1
        }
        route-reflector-client
    }
    ipv6-unicast {
        addpath-tx-all
        route-map {
            import RTR_OSR1
        }
        route-reflector-client
    }
}
bfd {
    profile IBGP_BFD
}
description OSR1CR3
remote-as internal
timers {
    connect 1
}
update-source dum0
I just need a couple of commands on the RR:
fabrizzio@OSR1RR1# set protocols bgp neighbor 192.168.254.12 address-family l2vpn-evpn route-reflector-client
fabrizzio@OSR1RR1# set protocols bgp neighbor 192.168.254.14 address-family l2vpn-evpn route-reflector-client
L2VPN EVPN Summary (VRF default):
BGP router identifier 192.168.254.50, local AS number 4200000001 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 40 KiB of memory
Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.254.12  4 4200000001        39       265        0    0    0 00:00:26        NoNeg    NoNeg OSR1CR3
192.168.254.14  4 4200000001        50       267        0    0    0 00:00:10        NoNeg    NoNeg OSR1CR5
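The NoNeg in the State/PfxRcd column is, as far as I understand FRR's summary output, its way of saying the address family has not been negotiated with that peer yet. That is expected at this stage, because the core routers have not enabled l2vpn-evpn on their side. A minimal sketch for spotting such peers in a summary; the function name is made up for illustration, and it assumes the twelve-column layout shown above:

```shell
#!/bin/bash
# Read an FRR "L2VPN EVPN Summary" on stdin and print every neighbor whose
# State/PfxRcd column (field 10) still reads "NoNeg", followed by its
# description (field 12). Only lines starting with an IPv4 address are checked.
evpn_not_negotiated() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $10 == "NoNeg" { print $1, $12 }'
}
```

It can be fed from anything that prints the summary, for example `vtysh -c 'show bgp l2vpn evpn summary' | evpn_not_negotiated` (assuming vtysh is reachable from your shell). Once both ends have the address family configured, it should print nothing.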
Then, on the core routers:
neighbor 192.168.254.50 {
    address-family {
        ipv4-unicast {
            addpath-tx-all
            nexthop-self {
            }
            route-map {
                import prevent_ibgp_blackholing
            }
            soft-reconfiguration {
                inbound
            }
        }
        ipv6-unicast {
            addpath-tx-all
            nexthop-self {
            }
            route-map {
                export ibgp_ula_nh
                import prevent_ibgp_blackholing
            }
            soft-reconfiguration {
                inbound
            }
        }
    }
    bfd {
        profile IBGP_BFD
    }
    description "To OSR1RR1"
    remote-as internal
    update-source dum0
}
fabrizzio@OSR1CR3# set protocols bgp neighbor 192.168.254.50 address-family l2vpn-evpn nexthop-self
fabrizzio@OSR1CR3# set protocols bgp address-family l2vpn-evpn advertise-all-vni
This should be enough BGP for today :) I did the same on OSR1CR5.
Moving from L2TPv3 tunnel to VXLAN
Switching over is easy. First I deploy a VXLAN interface on one of the core routers:
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 parameters nolearning
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 port 4789
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 source-address 192.168.254.12
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 vni 700
Then I just switch over one bridge member from L2TPv3 to VXLAN:
fabrizzio@OSR1CR3# show interfaces bridge br5
description "WAN OSR1BR2 - OSR1CR3 - OSR1CR5 VLAN 700 BR2"
enable-vlan
ipv6 {
    address {
        no-default-link-local
    }
}
member {
    interface eth17 {
        allowed-vlan 100
        native-vlan 100
    }
    interface l2tpeth5 {
        allowed-vlan 100
        native-vlan 100
    }
}
fabrizzio@OSR1CR3# delete interfaces bridge br5 member interface l2tpeth5
fabrizzio@OSR1CR3# set interfaces bridge br5 member interface vxlan700 allowed-vlan 100
fabrizzio@OSR1CR3# set interfaces bridge br5 member interface vxlan700 native-vlan 100
MACs are already being sent via BGP-EVPN at this point:
fabrizzio@OSR1CR3:~$ show bgp l2vpn evpn
BGP table version is 2, local router ID is 192.168.254.12
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.254.12:2
*> [2]:[0]:[48]:[ce:a7:7a:xx:xx:xx]
192.168.254.12 32768 i
ET:8 RT:59905:700
*> [3]:[0]:[32]:[192.168.254.12]
192.168.254.12 32768 i
ET:8 RT:59905:700
After switching over from L2TPv3 to VXLAN on OSR1CR5 there are MACs learnt and sent over BGP at both ends:
fabrizzio@OSR1CR5:~$ show bgp l2vpn evpn
BGP table version is 2, local router ID is 192.168.254.14
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.254.12:2
*>i[2]:[0]:[48]:[ce:a7:7a:xx:xx:xx]
192.168.254.12 0 100 0 i
RT:59905:700 ET:8
*>i[3]:[0]:[32]:[192.168.254.12]
192.168.254.12 0 100 0 i
RT:59905:700 ET:8
Route Distinguisher: 192.168.254.14:2
*> [2]:[0]:[48]:[6c:eb:b6:xx:xx:xx]
192.168.254.14 32768 i
ET:8 RT:59905:700
*> [3]:[0]:[32]:[192.168.254.14]
192.168.254.14 32768 i
ET:8 RT:59905:700
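With more VNIs the table gets long quickly, so a per-type count is a handy sanity check. A small sketch, assuming route lines start with a `*` status code as in the output above; the function name is hypothetical:

```shell
#!/bin/bash
# Read "show bgp l2vpn evpn" output on stdin and count routes per EVPN type.
# Route lines begin with "*" (optionally after spaces); the first bracketed
# number on such a line is the route type (2 = MAC/IP, 3 = IMET).
evpn_route_types() {
    grep -E '^ *\*' \
        | sed -E 's/^[^[]*\[([0-9]+)\].*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print "type-" $2 ": " $1 }'
}
```

For the OSR1CR5 table above this gives two type-2 and two type-3 routes, one of each per VTEP.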
I also made sure that it’s not reordering traffic within a flow. OSR1CR3 and OSR1CR5 are adjacent to each other, connected by 2x2.5GbE links. I ran a 16-thread iperf3 test over the VXLAN L2 connection and there was no reordering within the same flow. Furthermore, both ECMP paths OSR1CR3 <> OSR1CR5 were utilized!!
Because I am running IS-IS + MPLS (SR) internally, I wanted to make sure that there wouldn’t be any problems with VXLAN when the IPv4 next hop has an MPLS label. The OSR1CR3 <> OSR1CR5 test doesn’t really involve any labels (the routers are adjacent to each other, so with PHP the label is implicit null):
fabrizzio@OSR1CR3:~$ sh ip route 192.168.254.14
Routing entry for 192.168.254.14/32
Known via "isis", distance 115, metric 1010, best
Last update 04:03:32 ago
* 172.27.16.18, via eth2, label implicit-null, weight 1
* 172.27.16.22, via eth3, label implicit-null, weight 1
I also haven’t bothered trying this out in an actual multipoint fashion so far.
The best way to test a labeled path is to do this again, this time creating a bridge interface on OSR2CR2 (at OSR2) connected to a VLAN there, then extending it to a bridge on OSR1CR6. This is not totally useless, as it lets me bring one of my LANs from OSR2 to OSR1 in case I need to troubleshoot anything. OSR1CR6 and OSR2CR2 are not adjacent to each other, so there will be MPLS labels involved here:
fabrizzio@OSR1CR6:~$ sh ip route 192.168.254.17
Routing entry for 192.168.254.17/32
Known via "isis", distance 115, metric 46110, best
Last update 04:45:11 ago
* 172.27.16.45, via eth0, label 16170, weight 1
Performance issues
After doing the configs I ran into exactly the same performance issue I saw in my previous post: TX drops on the core-facing interface of the edge routers, and large TCP segments being dropped :(
fabrizzio@OSR1CR6:~$ sh interfaces ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1800 qdisc mq state UP group default qlen 1000
link/ether 52:d6:dc:f3:9c:28 brd ff:ff:ff:ff:ff:ff
altname enp0s18
altname ens18
inet 172.27.16.46/30 brd 172.27.16.47 scope global eth0
valid_lft forever preferred_lft forever
inet6 2a0e:8f02:21d1:feed:0:1:12:12/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::50d6:dcff:fef3:9c28/64 scope link
valid_lft forever preferred_lft forever
Description: To OSR1CR5
RX: bytes packets errors dropped overrun mcast
126316674 369467 0 8 0 0
TX: bytes packets errors dropped carrier collisions
140990639 833966 0 4944 0 0 <<<<
fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3
Connecting to host 192.168.35.3, port 5201
[ 5] local 192.168.35.109 port 48946 connected to 192.168.35.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 160 KBytes 1.31 Mbits/sec 32 4.24 KBytes
[ 5] 1.00-2.00 sec 80.6 KBytes 660 Kbits/sec 18 4.24 KBytes
[ 5] 2.00-3.00 sec 119 KBytes 973 Kbits/sec 18 7.07 KBytes
[ 5] 3.00-4.00 sec 79.2 KBytes 649 Kbits/sec 24 2.83 KBytes
[ 5] 4.00-5.00 sec 79.2 KBytes 649 Kbits/sec 18 5.66 KBytes
[ 5] 5.00-6.00 sec 79.2 KBytes 649 Kbits/sec 18 5.66 KBytes
[ 5] 6.00-7.00 sec 119 KBytes 973 Kbits/sec 30 2.83 KBytes
[ 5] 7.00-8.00 sec 79.2 KBytes 649 Kbits/sec 20 1.41 KBytes
[ 5] 8.00-9.00 sec 39.6 KBytes 324 Kbits/sec 14 4.24 KBytes
[ 5] 9.00-10.00 sec 119 KBytes 974 Kbits/sec 24 2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 953 KBytes 781 Kbits/sec 216 sender
[ 5] 0.00-10.02 sec 872 KBytes 714 Kbits/sec receiver
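As a rough sanity check on whether plain MTU could explain the drops, the encapsulation overhead can be added up. This is a back-of-the-envelope sketch: the header sizes are the standard ones for VXLAN over IPv4, the 1500-byte inner MTU is an assumption (it is not shown in this post), and it assumes a single SR label and no inner VLAN tag on the wire.

```shell
#!/bin/bash
inner_mtu=1500   # assumed MTU of the bridged LAN (not shown in the post)
inner_eth=14     # inner Ethernet header carried inside the VXLAN payload
vxlan_hdr=8      # VXLAN header
udp_hdr=8        # outer UDP header
ipv4_hdr=20      # outer IPv4 header without options
mpls_label=4     # one MPLS label, such as the SR label 16170
core_mtu=1800    # eth0 MTU from the interface output above

# Size of a full-size inner frame after encapsulation, as seen on the core link
encap=$((inner_mtu + inner_eth + vxlan_hdr + udp_hdr + ipv4_hdr + mpls_label))
echo "encapsulated size: $encap bytes, headroom: $((core_mtu - encap)) bytes"
```

Under those assumptions a full-size frame comes to 1554 bytes, which fits into the 1800-byte core MTU with room to spare. That would point away from ordinary MTU exhaustion and towards segments that are much larger than a normal frame.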
Because I really really want EVPN+VXLAN to be working, I will just assign each router another loopback IP in addition to their current one. I will not assign these any prefix SID under segment routing, so I don’t get any MPLS goodness on them:
fabrizzio@OSR1CR6# set interfaces dummy dum4 address 192.168.254.115/32
fabrizzio@OSR1CR6# set interfaces dummy dum4 description "For VXLAN - no MPLS/SR"
fabrizzio@OSR1CR6# set protocols isis interface dum4 passive
fabrizzio@OSR1CR6# set interfaces vxlan vxlan35 source-address 192.168.254.115
That didn’t fix the issue:
fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3 -P4 -R
Connecting to host 192.168.35.3, port 5201
Reverse mode, remote host 192.168.35.3 is sending
[ 5] local 192.168.35.109 port 42122 connected to 192.168.35.3 port 5201
[ 7] local 192.168.35.109 port 42130 connected to 192.168.35.3 port 5201
[ 13] local 192.168.35.109 port 42140 connected to 192.168.35.3 port 5201
[ 15] local 192.168.35.109 port 42152 connected to 192.168.35.3 port 5201
^C[ ID] Interval Transfer Bitrate
[ 5] 0.00-0.83 sec 62.2 KBytes 616 Kbits/sec
[ 7] 0.00-0.83 sec 65.0 KBytes 644 Kbits/sec
[ 13] 0.00-0.83 sec 42.4 KBytes 420 Kbits/sec
[ 15] 0.00-0.83 sec 45.2 KBytes 448 Kbits/sec
[SUM] 0.00-0.83 sec 215 KBytes 2.13 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-0.83 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-0.83 sec 62.2 KBytes 616 Kbits/sec receiver
[ 7] 0.00-0.83 sec 0.00 Bytes 0.00 bits/sec sender
[ 7] 0.00-0.83 sec 65.0 KBytes 644 Kbits/sec receiver
[ 13] 0.00-0.83 sec 0.00 Bytes 0.00 bits/sec sender
[ 13] 0.00-0.83 sec 42.4 KBytes 420 Kbits/sec receiver
[ 15] 0.00-0.83 sec 0.00 Bytes 0.00 bits/sec sender
[ 15] 0.00-0.83 sec 45.2 KBytes 448 Kbits/sec receiver
[SUM] 0.00-0.83 sec 0.00 Bytes 0.00 bits/sec sender
[SUM] 0.00-0.83 sec 215 KBytes 2.13 Mbits/sec receiver
iperf3: interrupt - the client has terminated
fabrizzio@OSR1CR6:~$ sh int ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1800 qdisc mq state UP group default qlen 1000
link/ether 52:d6:dc:f3:9c:28 brd ff:ff:ff:ff:ff:ff
altname enp0s18
altname ens18
inet 172.27.16.46/30 brd 172.27.16.47 scope global eth0
valid_lft forever preferred_lft forever
inet6 2a0e:8f02:21d1:feed:0:1:12:12/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::50d6:dcff:fef3:9c28/64 scope link
valid_lft forever preferred_lft forever
Description: To OSR1CR5
RX: bytes packets errors dropped overrun mcast
3152902 5949 0 25 0 0
TX: bytes packets errors dropped carrier collisions
2976484 6099 0 284 0 0 <<<<
Just like last time, disabling TSO on the tunnel interface (in this case vxlan35, on both OSR1CR6 and OSR2CR2) fixes the problem:
fabrizzio@OSR1CR6:~$ ethtool -K vxlan35 tso off
fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3
Connecting to host 192.168.35.3, port 5201
[ 5] local 192.168.35.109 port 47178 connected to 192.168.35.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 33.3 MBytes 280 Mbits/sec 204 573 KBytes
[ 5] 1.00-2.00 sec 27.5 MBytes 231 Mbits/sec 0 615 KBytes
[ 5] 2.00-3.00 sec 28.8 MBytes 241 Mbits/sec 0 642 KBytes
[ 5] 3.00-4.00 sec 25.0 MBytes 210 Mbits/sec 21 662 KBytes
[ 5] 4.00-5.00 sec 28.8 MBytes 241 Mbits/sec 8 509 KBytes
[ 5] 5.00-6.00 sec 27.5 MBytes 231 Mbits/sec 0 576 KBytes
[ 5] 6.00-7.00 sec 31.2 MBytes 262 Mbits/sec 0 624 KBytes
[ 5] 7.00-8.00 sec 30.0 MBytes 252 Mbits/sec 0 655 KBytes
[ 5] 8.00-9.00 sec 32.5 MBytes 273 Mbits/sec 0 672 KBytes
[ 5] 9.00-10.00 sec 26.2 MBytes 220 Mbits/sec 130 495 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 291 MBytes 244 Mbits/sec 363 sender
[ 5] 0.00-10.02 sec 288 MBytes 242 Mbits/sec receiver
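To confirm that the setting actually stuck on both routers, the offload state can be read back with `ethtool -k` (lowercase k shows features, uppercase K changes them). A tiny sketch that pulls out just the TSO line; the helper name is made up:

```shell
#!/bin/bash
# Read "ethtool -k <interface>" output on stdin and print the state of
# tcp-segmentation-offload ("on" or "off"). The indented sub-features such
# as tx-tcp-segmentation have a different first field and are ignored.
tso_state() {
    awk '$1 == "tcp-segmentation-offload:" { print $2 }'
}
```

Usage would be along the lines of `ethtool -k vxlan35 | tso_state`, which should print `off` after the change.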
Permanent fix
I asked ChatGPT to give me a bash script to disable TSO on all VXLAN interfaces:
Hi dear bot.
I want to run a bash script that will do
ethtool -K [interface_name] tso off
for all interfaces starting with "vxlan".
could you please let me know how to do it?. I promise I will tip you $200 :)
ChatGPT gave me:
#!/bin/bash
# Get a list of all network interfaces
interfaces=$(ls /sys/class/net)
# Loop through each interface
for interface in $interfaces; do
    # Check if the interface name starts with "vxlan"
    if [[ $interface == vxlan* ]]; then
        # Disable TSO for the interface
        sudo ethtool -K $interface tso off
        echo "Disabled TSO for interface $interface"
    fi
done
Works wonders :)
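For anyone who wants something a bit more defensive, here is a variant of the same idea. It is a sketch, not what is deployed in this lab: it globs the sysfs directory instead of parsing `ls`, quotes the interface name, and reports failures per interface. The function names are made up, and the optional directory argument exists only so the listing part can be tried against a scratch directory.

```shell
#!/bin/bash
# Print the names of all interfaces under a sysfs-style directory whose
# name starts with "vxlan". Defaults to /sys/class/net.
list_vxlan_ifaces() {
    local dir
    local path
    dir=${1:-/sys/class/net}
    for path in "$dir"/vxlan*; do
        # If nothing matches, the glob stays literal, so skip missing paths
        [ -e "$path" ] || continue
        printf '%s\n' "${path##*/}"
    done
}

# Turn TSO off on every VXLAN interface and keep going if one of them fails.
# Needs root, since ethtool -K changes device settings.
disable_tso_on_vxlan() {
    local ifname
    list_vxlan_ifaces "$@" | while IFS= read -r ifname; do
        if ethtool -K "$ifname" tso off; then
            echo "Disabled TSO for interface $ifname"
        else
            echo "Could not disable TSO for interface $ifname" >&2
        fi
    done
}
```

Calling `disable_tso_on_vxlan` with no arguments at the end of the script gives the same result as the loop above.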
I rolled back the new loopbacks. EVPN-VXLAN works just fine even though it is running over MPLS, so there is no need to complicate my config with extra loopbacks. The VXLAN packets are label-switched (label 16170):
20:03:55.585321 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35796045:35797493, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585322 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35797493:35798941, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585322 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35798941:35800389, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585343 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
I then deployed this script to run on commit and on boot after config:
echo '# Get a list of all network interfaces
interfaces=$(ls /sys/class/net)
# Loop through each interface
for interface in $interfaces; do
    # Check if the interface name starts with "vxlan"
    if [[ $interface == vxlan* ]]; then
        # Disable TSO for the interface
        sudo ethtool -K $interface tso off
        echo "Disabled TSO for interface $interface"
    fi
done' >> /config/scripts/vyos-postconfig-bootup.script
# -p creates both directory levels and does not fail if they already exist
mkdir -p /config/scripts/commit/post-hooks.d
cp /config/scripts/vyos-postconfig-bootup.script /config/scripts/commit/post-hooks.d/98-disable-tso-on-vxlan.script
All is good now :)