
VyOS 1.5, EVPN-VXLAN: first test on VyOS

Today I wanted to play around with VXLAN, as I’ve never touched it and had nothing better to do.

This has been tested with the latest VyOS nightly, 1.5-rolling-202403050022. First, I read the related VyOS blog post and the FRR docs. I much prefer learning by doing over just reading the theory. It means I will likely mess something up, but that’s precisely what the Lab is there for.

EVPN - BGP

First, I need to enable address-family l2vpn-evpn on my route reflectors. I have four of them, but I will only touch one for now (they all hold the same copy of the same routes).

I will convert the L2TPv3 pseudowire I have internally between OSR1CR3 and OSR1CR5 (I use it for something equivalent to an EVPL: internal backhaul of my WAN from one server to another) to EVPN-VXLAN. These two routers are adjacent to each other, connected via 2x2.5GbE. My Lab uses IS-IS as the IGP, with MPLS enabled via SR.

Diagram

Example config at OSR1RR1 (Route Reflector) for iBGP RR-client peering to OSR1CR3 (Core):

fabrizzio@OSR1RR1# show protocols bgp neighbor 192.168.254.12
 address-family {
     ipv4-unicast {
         addpath-tx-all
         route-map {
             import RTR_OSR1
         }
         route-reflector-client
     }
     ipv6-unicast {
         addpath-tx-all
         route-map {
             import RTR_OSR1
         }
         route-reflector-client
     }
 }
 bfd {
     profile IBGP_BFD
 }
 description OSR1CR3
 remote-as internal
 timers {
     connect 1
 }
 update-source dum0

I just need a couple of commands on the RR:

fabrizzio@OSR1RR1# set protocols bgp neighbor 192.168.254.12 address-family l2vpn-evpn route-reflector-client 
fabrizzio@OSR1RR1# set protocols bgp neighbor 192.168.254.14 address-family l2vpn-evpn route-reflector-client 


L2VPN EVPN Summary (VRF default):
BGP router identifier 192.168.254.50, local AS number 4200000001 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 40 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.254.12  4 4200000001        39       265        0    0    0 00:00:26        NoNeg    NoNeg OSR1CR3
192.168.254.14  4 4200000001        50       267        0    0    0 00:00:10        NoNeg    NoNeg OSR1CR5
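In FRR’s summary, "NoNeg" in the State/PfxRcd column means the BGP session is up but the L2VPN EVPN address family was not negotiated with that peer, which is expected here until the core routers are configured too. A small sketch for spotting such peers, assuming the column layout shown above and IPv4 neighbor addresses (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list peers whose address family shows "NoNeg".
# Reads "show bgp l2vpn evpn summary" text on stdin and assumes FRR's column order:
# Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
evpn_noneg_peers() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $10 == "NoNeg" { print $1, $12 }'
}

# Usage on the RR, e.g.:
#   vtysh -c "show bgp l2vpn evpn summary" | evpn_noneg_peers
```

Once both sides have the address family enabled, the function should print nothing.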

Then, on the core routers. Here is OSR1CR3’s existing peering to the RR, followed by the two commands I added:

 neighbor 192.168.254.50 {
     address-family {
         ipv4-unicast {
             addpath-tx-all
             nexthop-self {
             }
             route-map {
                 import prevent_ibgp_blackholing
             }
             soft-reconfiguration {
                 inbound
             }
         }
         ipv6-unicast {
             addpath-tx-all
             nexthop-self {
             }
             route-map {
                 export ibgp_ula_nh
                 import prevent_ibgp_blackholing
             }
             soft-reconfiguration {
                 inbound
             }
         }
     }
     bfd {
         profile IBGP_BFD
     }
     description "To OSR1RR1"
     remote-as internal
     update-source dum0


fabrizzio@OSR1CR3# set protocols bgp neighbor 192.168.254.50 address-family l2vpn-evpn nexthop-self

fabrizzio@OSR1CR3# set protocols bgp address-family l2vpn-evpn advertise-all-vni 

This should be enough BGP for today :) I did the same on OSR1CR5.

Moving from L2TPv3 tunnel to VXLAN

Switching over is easy. First, I deploy a VXLAN interface on one of the core routers:

fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 parameters nolearning 
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 port 4789
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 source-address 192.168.254.12
fabrizzio@OSR1CR3# set interfaces vxlan vxlan700 vni 700
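For readers more used to plain Linux: "nolearning" turns off data-plane MAC learning on the VXLAN device (the EVPN control plane populates remote MACs instead), 4789 is the IANA-assigned VXLAN UDP port, and VNI 700 is the segment ID. A rough iproute2 equivalent is sketched below as a dry run that only prints the command; it illustrates the same settings and is not necessarily what VyOS executes internally (the helper name is hypothetical):

```shell
#!/bin/sh
# Hypothetical dry-run helper: print an iproute2 command mirroring the VyOS
# settings above (vni, source-address, port, parameters nolearning).
vxlan_ip_cmd() {
    name=$1
    vni=$2
    local_ip=$3
    port=${4:-4789}   # IANA-assigned VXLAN port, same as "port 4789" above
    echo "ip link add $name type vxlan id $vni local $local_ip dstport $port nolearning"
}

# Usage (only prints, changes nothing):
#   vxlan_ip_cmd vxlan700 700 192.168.254.12
```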

Then I just switch one bridge member over from L2TPv3 to VXLAN:

fabrizzio@OSR1CR3# show interfaces bridge br5
 description "WAN OSR1BR2 - OSR1CR3 - OSR1CR5 VLAN 700 BR2"
 enable-vlan
 ipv6 {
     address {
         no-default-link-local
     }
 }
 member {
     interface eth17 {
         allowed-vlan 100
         native-vlan 100
     }
     interface l2tpeth5 {
         allowed-vlan 100
         native-vlan 100
     }
 }

fabrizzio@OSR1CR3# delete interfaces bridge br5 member interface l2tpeth5
fabrizzio@OSR1CR3# set interfaces bridge br5 member interface vxlan700 allowed-vlan 100
fabrizzio@OSR1CR3# set interfaces bridge br5 member interface vxlan700 native-vlan 100

At this point, MACs are already being advertised via BGP EVPN:

fabrizzio@OSR1CR3:~$ show bgp l2vpn evpn 
BGP table version is 2, local router ID is 192.168.254.12
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 192.168.254.12:2
 *> [2]:[0]:[48]:[ce:a7:7a:xx:xx:xx]
                    192.168.254.12                     32768 i
                    ET:8 RT:59905:700
 *> [3]:[0]:[32]:[192.168.254.12]
                    192.168.254.12                     32768 i
                    ET:8 RT:59905:700
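Type-2 routes carry the MAC addresses, while type-3 (inclusive multicast) routes announce each VTEP for BUM flooding. To list just the MAC and next hop (the advertising VTEP) of each type-2 route, an awk sketch like this works on the output above, assuming FRR prints the next hop on the line right after the prefix, as it does here (function name hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print "MAC VTEP" for every EVPN type-2 route.
# Reads "show bgp l2vpn evpn" text on stdin. Fields are split on '[' and ']',
# so in " *> [2]:[0]:[48]:[MAC]" the MAC ends up in $8.
evpn_type2_macs() {
    awk -F'[][]' '
        /^ *\*.*\[2\]:\[/ {
            mac = $8
            # The following line starts with the next hop (the remote VTEP).
            if ((getline nh) > 0) {
                split(nh, f, " ")
                print mac, f[1]
            }
        }'
}

# Usage, e.g.:
#   vtysh -c "show bgp l2vpn evpn" | evpn_type2_macs
```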

After doing the same switchover from L2TPv3 to VXLAN on OSR1CR5, MACs are learned and advertised over BGP at both ends:

fabrizzio@OSR1CR5:~$ show bgp l2vpn evpn 
BGP table version is 2, local router ID is 192.168.254.14
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 192.168.254.12:2
 *>i[2]:[0]:[48]:[ce:a7:7a:xx:xx:xx]
                    192.168.254.12           0    100      0 i
                    RT:59905:700 ET:8
 *>i[3]:[0]:[32]:[192.168.254.12]
                    192.168.254.12           0    100      0 i
                    RT:59905:700 ET:8
Route Distinguisher: 192.168.254.14:2
 *> [2]:[0]:[48]:[6c:eb:b6:xx:xx:xx]
                    192.168.254.14                     32768 i
                    ET:8 RT:59905:700
 *> [3]:[0]:[32]:[192.168.254.14]
                    192.168.254.14                     32768 i
                    ET:8 RT:59905:700
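On the Linux side, FRR (zebra) installs the MACs learned via EVPN into the bridge FDB of the VXLAN device, pointing at the remote VTEP. A sketch that pulls MAC/VTEP pairs out of "bridge fdb show dev vxlan700" output, skipping the all-zeros entry typically used for BUM flooding; it only assumes that the remote VTEP follows a "dst" keyword (function name hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print "MAC VTEP" for FDB entries pointing at a remote VTEP.
# Reads "bridge fdb show dev <vxlan-if>" text on stdin; the all-zeros MAC is the
# head-end replication (flooding) entry, so it is skipped.
remote_fdb_entries() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dst" && $1 != "00:00:00:00:00:00") {
                print $1, $(i + 1)
                break
            }
        }
    }'
}

# Usage, e.g. on OSR1CR3:
#   bridge fdb show dev vxlan700 | remote_fdb_entries
```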

I also made sure that it’s not reordering traffic within a flow. OSR1CR3 and OSR1CR5 are adjacent to each other, connected by 2x2.5GbE links. I ran a 16-thread iperf3 test over the VXLAN L2 connection and there was no reordering within any single flow. Furthermore, both ECMP paths between OSR1CR3 and OSR1CR5 were utilized!!
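A plausible reason both links get used without reordering: the Linux VXLAN driver derives the outer UDP source port from a hash of the inner flow, so different inner flows get different outer 5-tuples (and can hash onto different ECMP links), while every packet of one flow keeps the same port and therefore the same link. The toy model below illustrates that property only; it uses cksum as a stand-in hash, which is not the kernel’s algorithm, and the port range and link count are assumptions:

```shell
#!/bin/sh
# Toy model (NOT the kernel's real hash): inner flow -> outer UDP source port,
# then outer 5-tuple -> one of N ECMP links.
# Assumed source port range: 32768-60999 (a common Linux default).

toy_hash() {
    # cksum prints "<crc> <bytes>"; keep the CRC as a stand-in hash.
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# $1 = inner flow, e.g. "192.168.35.109 38590 192.168.35.3 5201 tcp"
outer_src_port() {
    h=$(toy_hash "$1")
    echo $(( 32768 + h % (60999 - 32768 + 1) ))
}

# $1 = outer src VTEP, $2 = outer dst VTEP, $3 = outer src port, $4 = number of links
ecmp_link() {
    h=$(toy_hash "$1 $2 $3 4789 udp")
    echo $(( h % $4 ))
}
```

Calling outer_src_port and ecmp_link repeatedly for the same flow always yields the same port and link, which is the property that keeps a flow in order, while flows with different inner ports can land on different links.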

Because I am running IS-IS + MPLS (SR) internally, I wanted to make sure that VXLAN wouldn’t have any problems if the path to the IPv4 next hop carried an MPLS label. The OSR1CR3 <> OSR1CR5 test doesn’t really involve any labels: the routers are adjacent and PHP applies, so the label is implicit null:

fabrizzio@OSR1CR3:~$ sh ip route 192.168.254.14
Routing entry for 192.168.254.14/32
  Known via "isis", distance 115, metric 1010, best
  Last update 04:03:32 ago
  * 172.27.16.18, via eth2, label implicit-null, weight 1
  * 172.27.16.22, via eth3, label implicit-null, weight 1

I also haven’t bothered trying this out in an actual multi-point fashion so far.

So the best way to try this out is to test again, this time creating a bridge interface on OSR2CR2 (at OSR2) connected to a VLAN there, and extending it to a bridge on OSR1CR6. This is not totally useless, as it lets me bring one of my LANs from OSR2 to OSR1 in case I need to troubleshoot anything. OSR1CR6 and OSR2CR2 are not adjacent to each other, so MPLS labels will be involved here:

fabrizzio@OSR1CR6:~$ sh ip route 192.168.254.17
Routing entry for 192.168.254.17/32
  Known via "isis", distance 115, metric 46110, best
  Last update 04:45:11 ago
  * 172.27.16.45, via eth0, label 16170, weight 1
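With a label in the path, it’s worth double-checking the underlay MTU. Over an IPv4 underlay, VXLAN adds the inner Ethernet header (14 bytes, plus 4 if the inner frame keeps an 802.1Q tag), the VXLAN header (8), UDP (8) and the outer IPv4 header (20), and each MPLS label adds another 4 bytes. A small calculator, as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: underlay MTU needed to carry a given inner IP MTU over
# VXLAN/IPv4, optionally with an MPLS label stack and an inner 802.1Q tag.
# $1 = inner IP MTU, $2 = number of MPLS labels, $3 = 1 if the inner frame is tagged
vxlan_underlay_mtu() {
    inner_mtu=$1
    labels=${2:-0}
    inner_tag=${3:-0}
    eth=14      # inner Ethernet header carried inside VXLAN
    vxlan=8     # VXLAN header
    udp=8       # outer UDP header
    ipv4=20     # outer IPv4 header (no options)
    echo $(( inner_mtu + eth + 4 * inner_tag + vxlan + udp + ipv4 + 4 * labels ))
}
```

Assuming a 1500-byte inner MTU and the single label seen above, "vxlan_underlay_mtu 1500 1" gives 1554, well below the 1800-byte MTU on OSR1CR6’s eth0.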

Performance issues

After applying the configs, I ran into exactly the same performance issue I saw in my previous post: TX drops on the core-facing interface of the edge routers, with large TCP segments being dropped :(

fabrizzio@OSR1CR6:~$ sh interfaces  ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1800 qdisc mq state UP group default qlen 1000
    link/ether 52:d6:dc:f3:9c:28 brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    altname ens18
    inet 172.27.16.46/30 brd 172.27.16.47 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a0e:8f02:21d1:feed:0:1:12:12/126 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::50d6:dcff:fef3:9c28/64 scope link 
       valid_lft forever preferred_lft forever
    Description: To OSR1CR5

    RX:      bytes  packets  errors  dropped  overrun       mcast
         126316674   369467       0        8        0           0
    TX:      bytes  packets  errors  dropped  carrier  collisions
         140990639   833966       0     4944        0           0 <<<<

fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3 
Connecting to host 192.168.35.3, port 5201
[  5] local 192.168.35.109 port 48946 connected to 192.168.35.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   160 KBytes  1.31 Mbits/sec   32   4.24 KBytes       
[  5]   1.00-2.00   sec  80.6 KBytes   660 Kbits/sec   18   4.24 KBytes       
[  5]   2.00-3.00   sec   119 KBytes   973 Kbits/sec   18   7.07 KBytes       
[  5]   3.00-4.00   sec  79.2 KBytes   649 Kbits/sec   24   2.83 KBytes       
[  5]   4.00-5.00   sec  79.2 KBytes   649 Kbits/sec   18   5.66 KBytes       
[  5]   5.00-6.00   sec  79.2 KBytes   649 Kbits/sec   18   5.66 KBytes       
[  5]   6.00-7.00   sec   119 KBytes   973 Kbits/sec   30   2.83 KBytes       
[  5]   7.00-8.00   sec  79.2 KBytes   649 Kbits/sec   20   1.41 KBytes       
[  5]   8.00-9.00   sec  39.6 KBytes   324 Kbits/sec   14   4.24 KBytes       
[  5]   9.00-10.00  sec   119 KBytes   974 Kbits/sec   24   2.83 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   953 KBytes   781 Kbits/sec  216             sender
[  5]   0.00-10.02  sec   872 KBytes   714 Kbits/sec                  receiver
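To watch the TX drop counter move during an iperf3 run without scrolling through the full interface output, the kernel exposes per-interface counters under /sys/class/net/<interface>/statistics/. A minimal sketch (the function name is hypothetical; the SYSFS_NET override exists only so it can be pointed at a test directory):

```shell
#!/bin/sh
# Hypothetical helper: print how many packets an interface dropped on TX
# during an interval. $1 = interface (e.g. eth0), $2 = seconds to wait.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

tx_drop_delta() {
    f="$SYSFS_NET/$1/statistics/tx_dropped"
    [ -r "$f" ] || { echo "no counter for $1" >&2; return 1; }
    before=$(cat "$f")
    sleep "$2"
    after=$(cat "$f")
    echo $(( after - before ))
}

# Usage: start iperf3 on the test host, then on the edge router run
#   tx_drop_delta eth0 10
```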

Because I really, really want EVPN-VXLAN to work, I will assign each router another loopback IP in addition to its current one. I just won’t assign these any prefix SID under segment routing, so they don’t get any MPLS goodness.

fabrizzio@OSR1CR6# set interfaces dummy dum4 address 192.168.254.115/32
fabrizzio@OSR1CR6# set interfaces dummy dum4 description "For VXLAN - no MPLS/SR"
fabrizzio@OSR1CR6# set protocols isis interface dum4 passive
fabrizzio@OSR1CR6# set interfaces vxlan vxlan35 source-address 192.168.254.115
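Since the VXLAN source-address has to be an address the router actually owns, it’s worth confirming the new loopback IP is configured before pointing the VXLAN interface at it. A sketch that reads "ip -o -4 addr show" output, where the fourth field looks like 192.168.254.115/32 (function name hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 is configured on some interface.
# Reads one-line-per-address output of "ip -o -4 addr show" on stdin.
has_local_ipv4() {
    awk -v want="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }'
}

# Usage, e.g.:
#   ip -o -4 addr show | has_local_ipv4 192.168.254.115 && echo "source address OK"
```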

That didn’t fix the issue:

fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3 -P4 -R
Connecting to host 192.168.35.3, port 5201
Reverse mode, remote host 192.168.35.3 is sending
[  5] local 192.168.35.109 port 42122 connected to 192.168.35.3 port 5201
[  7] local 192.168.35.109 port 42130 connected to 192.168.35.3 port 5201
[ 13] local 192.168.35.109 port 42140 connected to 192.168.35.3 port 5201
[ 15] local 192.168.35.109 port 42152 connected to 192.168.35.3 port 5201
^C[ ID] Interval           Transfer     Bitrate
[  5]   0.00-0.83   sec  62.2 KBytes   616 Kbits/sec                  
[  7]   0.00-0.83   sec  65.0 KBytes   644 Kbits/sec                  
[ 13]   0.00-0.83   sec  42.4 KBytes   420 Kbits/sec                  
[ 15]   0.00-0.83   sec  45.2 KBytes   448 Kbits/sec                  
[SUM]   0.00-0.83   sec   215 KBytes  2.13 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-0.83   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-0.83   sec  62.2 KBytes   616 Kbits/sec                  receiver
[  7]   0.00-0.83   sec  0.00 Bytes  0.00 bits/sec                  sender
[  7]   0.00-0.83   sec  65.0 KBytes   644 Kbits/sec                  receiver
[ 13]   0.00-0.83   sec  0.00 Bytes  0.00 bits/sec                  sender
[ 13]   0.00-0.83   sec  42.4 KBytes   420 Kbits/sec                  receiver
[ 15]   0.00-0.83   sec  0.00 Bytes  0.00 bits/sec                  sender
[ 15]   0.00-0.83   sec  45.2 KBytes   448 Kbits/sec                  receiver
[SUM]   0.00-0.83   sec  0.00 Bytes  0.00 bits/sec                  sender
[SUM]   0.00-0.83   sec   215 KBytes  2.13 Mbits/sec                  receiver
iperf3: interrupt - the client has terminated



fabrizzio@OSR1CR6:~$ sh int ethernet eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1800 qdisc mq state UP group default qlen 1000
    link/ether 52:d6:dc:f3:9c:28 brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    altname ens18
    inet 172.27.16.46/30 brd 172.27.16.47 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a0e:8f02:21d1:feed:0:1:12:12/126 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::50d6:dcff:fef3:9c28/64 scope link 
       valid_lft forever preferred_lft forever
    Description: To OSR1CR5

    RX:    bytes  packets  errors  dropped  overrun       mcast
         3152902     5949       0       25        0           0
    TX:    bytes  packets  errors  dropped  carrier  collisions
         2976484     6099       0      284        0           0 <<<<

Just like last time, disabling TSO on the tunnel interface (in this case vxlan35, on both OSR1CR6 and OSR2CR2) fixes the problem:

fabrizzio@OSR1CR6:~$ sudo ethtool -K vxlan35 tso off


fabrizzio@osr1test3:~$ iperf3 -c 192.168.35.3
Connecting to host 192.168.35.3, port 5201
[  5] local 192.168.35.109 port 47178 connected to 192.168.35.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  33.3 MBytes   280 Mbits/sec  204    573 KBytes       
[  5]   1.00-2.00   sec  27.5 MBytes   231 Mbits/sec    0    615 KBytes       
[  5]   2.00-3.00   sec  28.8 MBytes   241 Mbits/sec    0    642 KBytes       
[  5]   3.00-4.00   sec  25.0 MBytes   210 Mbits/sec   21    662 KBytes       
[  5]   4.00-5.00   sec  28.8 MBytes   241 Mbits/sec    8    509 KBytes       
[  5]   5.00-6.00   sec  27.5 MBytes   231 Mbits/sec    0    576 KBytes       
[  5]   6.00-7.00   sec  31.2 MBytes   262 Mbits/sec    0    624 KBytes       
[  5]   7.00-8.00   sec  30.0 MBytes   252 Mbits/sec    0    655 KBytes       
[  5]   8.00-9.00   sec  32.5 MBytes   273 Mbits/sec    0    672 KBytes       
[  5]   9.00-10.00  sec  26.2 MBytes   220 Mbits/sec  130    495 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   291 MBytes   244 Mbits/sec  363             sender
[  5]   0.00-10.02  sec   288 MBytes   242 Mbits/sec                  receiver
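To confirm the setting actually took effect (for example after a reboot, or after a commit touches the interface), "ethtool -k" (lowercase k) lists the current offload state. A small sketch that extracts the TSO line from that output (function name hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print the TSO state ("on"/"off") from "ethtool -k <if>"
# output on stdin. A trailing "[fixed]" marker, if present, is ignored.
tso_state() {
    awk '$1 == "tcp-segmentation-offload:" { print $2; exit }'
}

# Usage, e.g.:
#   ethtool -k vxlan35 | tso_state
```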

Permanent fix

I asked ChatGPT to give me a bash script to disable TSO on all VXLAN interfaces:

Hi dear bot.
I want to run a bash script that will do 
ethtool -K [interface_name] tso off
for all interfaces starting with "vxlan".
could you please let me know how to do it?. I promise I will tip you $200 :) 

ChatGPT gave me:

#!/bin/bash

# Get a list of all network interfaces
interfaces=$(ls /sys/class/net)

# Loop through each interface
for interface in $interfaces; do
    # Check if the interface name starts with "vxlan"
    if [[ $interface == vxlan* ]]; then
        # Disable TSO for the interface
        sudo ethtool -K $interface tso off
        echo "Disabled TSO for interface $interface"
    fi
done

ChatGPT interaction

Works wonders :)

Librespeed test done over stretched L2 via VXLAN

I rolled back the new loopbacks: EVPN-VXLAN works just fine even though it runs over MPLS, so there’s no need to complicate my config with extra loopbacks. The VXLAN packets do carry the SR label (16170) on the wire:

20:03:55.585321 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35796045:35797493, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585322 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35797493:35798941, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585322 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
IP 192.168.35.109.38590 > 192.168.35.3.5201: Flags [.], seq 35798941:35800389, ack 1, win 502, options [nop,nop,TS val 203409691 ecr 118461559], length 1448
20:03:55.585343 MPLS (label 16170, tc 0, [S], ttl 16) IP 192.168.254.15.34338 > 192.168.254.17.4789: VXLAN, flags [I] (0x08), vni 35
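The capture command isn’t shown above; on the core-facing interface, something like tcpdump -ni eth0 'mpls and udp port 4789' should produce similar output (an assumption about how it was captured; after the "mpls" keyword, pcap-filter applies the rest of the expression to the labeled payload). To summarize a longer capture, here is a sketch that counts packets per label/VNI pair (function name hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: count captured packets per (MPLS label, VXLAN VNI) pair.
# Reads tcpdump text on stdin; lines without both a label and a VNI are ignored.
label_vni_counts() {
    awk '
        match($0, /label [0-9]+/) {
            label = substr($0, RSTART + 6, RLENGTH - 6)   # skip "label "
            if (match($0, /vni [0-9]+/)) {
                vni = substr($0, RSTART + 4, RLENGTH - 4) # skip "vni "
                count[label " " vni]++
            }
        }
        END { for (k in count) print k, count[k] }'
}

# Usage, e.g.:
#   sudo tcpdump -ni eth0 -c 1000 'mpls and udp port 4789' | label_vni_counts
```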

I then deployed the script to run on every commit and at boot, after the config is applied:


# Append the loop to the boot script ('EOF' is quoted, so $variables are written literally)
cat >> /config/scripts/vyos-postconfig-bootup.script << 'EOF'

# Disable TSO on all VXLAN interfaces. Kept POSIX-sh compatible (no [[ ]]),
# since the default vyos-postconfig-bootup.script starts with #!/bin/sh.
for path in /sys/class/net/vxlan*; do
    [ -e "$path" ] || continue    # glob matched nothing: no VXLAN interfaces
    interface=$(basename "$path")
    sudo ethtool -K "$interface" tso off
    echo "Disabled TSO for interface $interface"
done
EOF
mkdir -p /config/scripts/commit/post-hooks.d
cp /config/scripts/vyos-postconfig-bootup.script /config/scripts/commit/post-hooks.d/98-disable-tso-on-vxlan.script

All is good now :)

This post is licensed under CC BY 4.0 by the author.
