IPv6 next-hop problem when recursively resolving BGP route
Why can’t I SSH to my routers anymore???????
Well, it all started when I upgraded some of my AS203528 routers to one of the latest VyOS nightly builds, “vyos-1.4-rolling-202306080317”.
fabrizzio@osr1j1:~$ ping osr2br2
PING osr2br2(dum0.OSR2BR2.compumundohipermegared.one (2a0e:8f02:21d0:ffff::15)) 56 data bytes
^C
--- osr2br2 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6130ms
fabrizzio@osr1j1:~$ traceroute osr2br2
traceroute to osr2br2 (2a0e:8f02:21d0:ffff::15), 30 hops max, 80 byte packets
1 _gateway (2a0e:8f02:21d1:120::1) 0.247 ms 0.222 ms 0.212 ms
2 eth9.osr1cr5.compumundohipermegared.one (2a0e:8f02:21d1:feed:0:1:19:11) 0.423 ms 0.408 ms 0.383 ms
3 eth2.osr1cr3.compumundohipermegared.one (2a0e:8f02:21d1:feed:0:1:5:11) 0.774 ms 0.751 ms 0.678 ms
4 osr1fw2.compumundohipermegared.one (2a0e:8f02:21d1:ffff::42) 0.860 ms 0.836 ms 0.810 ms
5 dum0.OSR1BR2.compumundohipermegared.one (2a0e:8f02:21d0:ffff::13) 1.107 ms 1.079 ms 1.054 ms
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 *^C
Judging by the trace alone, it looked very much like a problem on the reverse path. Luckily I was still able to connect via IPv4. I tried to ping my OSR1 jumphost from OSR2BR2:
fabrizzio@OSR2BR2:~$ ping 2a0e:8f02:21d1:120::11
PING 2a0e:8f02:21d1:120::11(2a0e:8f02:21d1:120::11) 56 data bytes
From 2a0e:8f02:21d0:feed:deed:0:21c:1002 icmp_seq=1 Destination unreachable: Address unreachable
From 2a0e:8f02:21d0:feed:deed:0:21c:1002 icmp_seq=2 Destination unreachable: Address unreachable
From 2a0e:8f02:21d0:feed:deed:0:21c:1002 icmp_seq=3 Destination unreachable: Address unreachable
From 2a0e:8f02:21d0:feed:deed:0:21c:1002 icmp_seq=4 Destination unreachable: Address unreachable
^C
--- 2a0e:8f02:21d1:120::11 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4068ms
fabrizzio@OSR2BR2:~$ sh ipv6 route 2a0e:8f02:21d1:120::11
Routing entry for 2a0e:8f02:21d1::/48
Known via "bgp", distance 200, metric 1000, best
Last update 01:11:38 ago
fc0e:8f02:21d0:ffff::12 (recursive), weight 1
* fe80::7c2a:81ff:fe87:f5a2, via br536, weight 1
fc0e:8f02:21d0:ffff::13 (recursive), weight 1
* fe80::401d:82ff:fe26:9549, via br540, weight 1
Destination unreachable?
This was very odd to me. I could ping the other end of each tunnel (OSR2BR2 <> OSR1BR1 and OSR2BR2 <> OSR1BR2), and the full mesh of IS-IS adjacencies was up on OSR2BR2:
fabrizzio@OSR2BR2:~$ sh int bridge br536
br536: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc noqueue state UP group default qlen 1000
link/ether c6:21:99:5a:13:d2 brd ff:ff:ff:ff:ff:ff
inet6 2a0e:8f02:21d0:feed:deed:0:218:1002/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::ac30:5fff:fe1f:b9eb/64 scope link
valid_lft forever preferred_lft forever
Description: IPv6 Tunnel to OSR1BR1
RX: bytes packets errors dropped overrun mcast
4596611 3846 0 0 0 3825
TX: bytes packets errors dropped carrier collisions
4774008 4002 0 0 0 0
fabrizzio@OSR2BR2:~$ sh int bridge br540
br540: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc noqueue state UP group default qlen 1000
link/ether 8e:50:3e:54:f7:01 brd ff:ff:ff:ff:ff:ff
inet6 2a0e:8f02:21d0:feed:deed:0:21c:1002/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::f01e:3bff:fe29:3d8/64 scope link
valid_lft forever preferred_lft forever
Description: IPv6 Tunnel to OSR1BR2
RX: bytes packets errors dropped overrun mcast
4641220 3993 0 0 0 3884
TX: bytes packets errors dropped carrier collisions
4777203 3967 0 0 0 0
fabrizzio@OSR2BR2:~$ ping 2a0e:8f02:21d0:feed:deed:0:21c:1001
PING 2a0e:8f02:21d0:feed:deed:0:21c:1001(2a0e:8f02:21d0:feed:deed:0:21c:1001) 56 data bytes
64 bytes from 2a0e:8f02:21d0:feed:deed:0:21c:1001: icmp_seq=1 ttl=64 time=34.0 ms
64 bytes from 2a0e:8f02:21d0:feed:deed:0:21c:1001: icmp_seq=2 ttl=64 time=16.9 ms
^C
--- 2a0e:8f02:21d0:feed:deed:0:21c:1001 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 16.870/25.443/34.017/8.573 ms
fabrizzio@OSR2BR2:~$ sh isis neighbor
Area VyOS:
System Id Interface L State Holdtime SNPA
AMS1BR1 br533 2 Up 28 2020.2020.2020
NYC1BR1 br534 2 Up 29 2020.2020.2020
OSR1BR1 br536 2 Up 29 2020.2020.2020
OSR1BR2 br540 2 Up 28 2020.2020.2020
OSR2BR1 br548 2 Up 29 2020.2020.2020
OSR1BR3 br596 2 Up 29 2020.2020.2020
OSR2GLASS1 eth8 2 Up 27 2020.2020.2020
Oddly enough, I could also ping the loopbacks of OSR1BR1 and OSR1BR2 (both GUA and ULA):
fabrizzio@OSR2BR2:~$ ping 2a0e:8f02:21d0:ffff::12
PING 2a0e:8f02:21d0:ffff::12(2a0e:8f02:21d0:ffff::12) 56 data bytes
64 bytes from 2a0e:8f02:21d0:ffff::12: icmp_seq=1 ttl=64 time=32.2 ms
^C
--- 2a0e:8f02:21d0:ffff::12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 32.191/32.191/32.191/0.000 ms
fabrizzio@OSR2BR2:~$ ping 2a0e:8f02:21d0:ffff::13
PING 2a0e:8f02:21d0:ffff::13(2a0e:8f02:21d0:ffff::13) 56 data bytes
64 bytes from 2a0e:8f02:21d0:ffff::13: icmp_seq=1 ttl=64 time=16.9 ms
^C
--- 2a0e:8f02:21d0:ffff::13 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.927/16.927/16.927/0.000 ms
fabrizzio@OSR2BR2:~$ ping fc0e:8f02:21d0:ffff::12
PING fc0e:8f02:21d0:ffff::12(fc0e:8f02:21d0:ffff::12) 56 data bytes
64 bytes from fc0e:8f02:21d0:ffff::12: icmp_seq=1 ttl=64 time=16.3 ms
^C
--- fc0e:8f02:21d0:ffff::12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.340/16.340/16.340/0.000 ms
fabrizzio@OSR2BR2:~$ ping fc0e:8f02:21d0:ffff::13
PING fc0e:8f02:21d0:ffff::13(fc0e:8f02:21d0:ffff::13) 56 data bytes
64 bytes from fc0e:8f02:21d0:ffff::13: icmp_seq=1 ttl=64 time=16.2 ms
^C
--- fc0e:8f02:21d0:ffff::13 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 16.228/16.228/16.228/0.000 ms
Then I compared the routes:
fabrizzio@OSR2BR2:~$ sh ipv6 route 2a0e:8f02:21d0:ffff::13
Routing entry for 2a0e:8f02:21d0:ffff::13/128
Known via "isis", distance 115, metric 500, best
Last update 01:59:55 ago
* fe80::909f:cdff:fe62:592e, via br540, weight 1 <<<<< CORRECT LL address from OSR1BR2
Routing entry for 2a0e:8f02:21d0:ffff::13/128
Known via "bgp", distance 200, metric 0
Last update 02:00:04 ago
fc0e:8f02:21d0:ffff::13 (recursive), weight 1
fe80::401d:82ff:fe26:9549, via br540, weight 1 <<<<< where did this come from?
fabrizzio@OSR2BR2:~$ sh ipv6 route 2a0e:8f02:21d0:ffff::12
Routing entry for 2a0e:8f02:21d0:ffff::12/128
Known via "bgp", distance 200, metric 0
Last update 01:16:11 ago
fc0e:8f02:21d0:ffff::12 (recursive), weight 1
fe80::7c2a:81ff:fe87:f5a2, via br536, weight 1 <<<< where did this come from?
Routing entry for 2a0e:8f02:21d0:ffff::12/128
Known via "isis", distance 115, metric 500, best
Last update 01:16:11 ago
* fe80::60fa:89ff:fe52:4194, via br536, weight 1 <<<<< CORRECT LL address from OSR1BR1
fabrizzio@OSR2BR2:~$ sh ipv6 route fc0e:8f02:21d0:ffff::12
Routing entry for fc0e:8f02:21d0:ffff::12/128
Known via "isis", distance 115, metric 510, best
Last update 01:16:22 ago
* fe80::60fa:89ff:fe52:4194, via br536, weight 1 <<<<< CORRECT LL address from OSR1BR1
fabrizzio@OSR2BR2:~$ sh ipv6 route 2a0e:8f02:21d1:120::11
Routing entry for 2a0e:8f02:21d1::/48
Known via "bgp", distance 200, metric 1000, best
Last update 01:19:09 ago
fc0e:8f02:21d0:ffff::12 (recursive), weight 1
* fe80::7c2a:81ff:fe87:f5a2, via br536, weight 1 <<<< Both next hop LL IPs don't match what's on the other end of the tunnel
fc0e:8f02:21d0:ffff::13 (recursive), weight 1
* fe80::401d:82ff:fe26:9549, via br540, weight 1 <<<< Both next hop LL IPs don't match what's on the other end of the tunnel.
Now here’s the issue. The IPv6 route to the remote router’s ULA loopback (“fc0e:8f02:21d0:ffff::12”, which I am forcing BGP to use as the next hop) has the correct link-local next hop for the other end of the tunnel. But when a route such as “2a0e:8f02:21d0:ffff::12/128” or “2a0e:8f02:21d1::/48” is recursively resolved via “fc0e:8f02:21d0:ffff::12”, the next hop found for it is incorrect. I tried manually changing the IPv6 link-local address on OSR1BR1: the IS-IS route next hop as seen on OSR2BR2 did change, but the recursive lookup stayed stuck on the old link-local address.
fabrizzio@OSR1BR1:~$ sh interfaces bridge br536
br536: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc noqueue state UP group default qlen 1000
link/ether fe:45:5c:03:c6:ae brd ff:ff:ff:ff:ff:ff
inet6 2a0e:8f02:21d0:feed:deed:0:218:1001/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::60fa:89ff:fe52:4194/64 scope link <<<<< CORRECT LL address from OSR1BR1
valid_lft forever preferred_lft forever
Description: IPv6 Tunnel to OSR2BR2
RX: bytes packets errors dropped overrun mcast
2739350 2303 0 0 0 2289
TX: bytes packets errors dropped carrier collisions
2781627 2287 0 0 0 0
fabrizzio@OSR1BR2:~$ sh interfaces bridge br540
br540: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc noqueue state UP group default qlen 1000
link/ether 4a:11:ee:ae:f3:87 brd ff:ff:ff:ff:ff:ff
inet6 2a0e:8f02:21d0:feed:deed:0:21c:1001/126 scope global
valid_lft forever preferred_lft forever
inet6 fe80::909f:cdff:fe62:592e/64 scope link <<<<< CORRECT LL address from OSR1BR2
valid_lft forever preferred_lft forever
Description: IPv6 Tunnel to OSR2BR2
RX: bytes packets errors dropped overrun mcast
4288601 3568 0 0 0 3552
TX: bytes packets errors dropped carrier collisions
4356271 3683 0 0 0 0
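The mismatch between the IS-IS next hop and the recursively resolved BGP next hop can be spotted mechanically. Here is a minimal Python sketch, assuming route output shaped like the "sh ipv6 route" transcripts above. The helper names parse_routes and find_stale are my own inventions for illustration, not part of VyOS or FRR, and the parser only understands the handful of line shapes seen in this post.

```python
import re

# Sample trimmed from the transcripts in this post (one inline note kept on purpose).
SAMPLE = """\
Routing entry for 2a0e:8f02:21d0:ffff::12/128
Known via "bgp", distance 200, metric 0
Last update 01:16:11 ago
fc0e:8f02:21d0:ffff::12 (recursive), weight 1
fe80::7c2a:81ff:fe87:f5a2, via br536, weight 1 <<<< where did this come from?
Routing entry for fc0e:8f02:21d0:ffff::12/128
Known via "isis", distance 115, metric 510, best
Last update 01:16:22 ago
* fe80::60fa:89ff:fe52:4194, via br536, weight 1
"""


def parse_routes(text):
    """Parse route transcript text into a list of entries (hypothetical helper)."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.split("<<<<")[0].strip()  # drop hand-written "<<<<" notes
        m = re.match(r"Routing entry for (\S+)", line)
        if m:
            current = {"prefix": m.group(1), "proto": None, "nexthops": []}
            entries.append(current)
            continue
        if current is None:
            continue
        m = re.match(r'Known via "(\w+)"', line)
        if m:
            current["proto"] = m.group(1)
            continue
        m = re.match(r"\*?\s*(\S+) \(recursive\)", line)
        if m:
            current["nexthops"].append({"recursive": m.group(1), "resolved": None})
            continue
        m = re.match(r"\*?\s*(fe80:\S+), via (\S+),", line)
        if m:
            hop = (m.group(1), m.group(2))
            last = current["nexthops"][-1] if current["nexthops"] else None
            if last and last["recursive"] and last["resolved"] is None:
                last["resolved"] = hop  # link-local that the recursive hop resolved to
            else:
                current["nexthops"].append({"recursive": None, "resolved": hop})
    return entries


def find_stale(entries):
    """Return recursive next hops whose resolution disagrees with the IS-IS route."""
    igp = {}
    for e in entries:
        if e["proto"] == "isis":
            addr = e["prefix"].split("/")[0]
            direct = (h["resolved"] for h in e["nexthops"] if h["recursive"] is None)
            igp.setdefault(addr, set()).update(direct)
    stale = []
    for e in entries:
        for h in e["nexthops"]:
            rec = h["recursive"]
            if rec in igp and h["resolved"] not in igp[rec]:
                stale.append((e["prefix"], rec, h["resolved"], sorted(igp[rec])))
    return stale


if __name__ == "__main__":
    for prefix, rec, got, expected in find_stale(parse_routes(SAMPLE)):
        print(f"{prefix}: {rec} resolved to {got}, IS-IS says {expected}")
```

Feeding it the concatenated output for a BGP prefix and for the loopback it recurses through flags every entry where the two disagree, which is exactly the symptom seen on OSR2BR2.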
I truly have no idea what might be going on here. This is just one example: the issue occurred on other routers too, and rebooting them turned into a nasty game of whack-a-mole, with the issue popping up elsewhere. Clearing BGP neighbors didn’t fix it either. The thing is that the MAC addresses assigned to the tunnels change upon a router reboot, and the link-local addresses change with them. Therefore, if you reboot router A, all the tunnels on the other routers (the ones with the suspected software bug) that point towards A will still use the old IPv6 link-local next hop of A's tunnel endpoints.
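To illustrate why a new MAC address means a new link-local address, here is a small Python sketch of the classic modified EUI-64 derivation (RFC 4291, Appendix A). It assumes the interface uses EUI-64 rather than stable-privacy address generation, which I have not verified on these bridges. The MAC in the usage note is simply worked backwards from one of the link-local addresses above and is not taken from any router, and the function name eui64_link_local is made up for this example.

```python
import ipaddress


def eui64_link_local(mac: str) -> str:
    """Derive the fe80::/64 modified EUI-64 address for a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address such as 62:fa:89:52:41:94")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    # fe80::/64 prefix (fe80 followed by 48 zero bits) plus the 64-bit interface ID.
    return str(ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid))


if __name__ == "__main__":
    print(eui64_link_local("62:fa:89:52:41:94"))
```

With the illustrative MAC 62:fa:89:52:41:94 this yields fe80::60fa:89ff:fe52:4194. Any change to the MAC gives a different link-local address, so a neighbor that keeps resolving to the old one ends up sending traffic to an address nobody answers for.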
I’ve just rolled back to the known-good version “1.4-rolling-202210280218” for now. If I get some spare time I will lab this up and file a bug with VyOS. To be fair, I don’t know whether it’s a VyOS bug or an FRR bug: “1.4-rolling-202210280218” uses FRR 8.3.1, while the nightly I tried, “vyos-1.4-rolling-202306080317”, uses FRR 8.5.1.
Hope this helps someone.