
WireGuard over GRETAP over WireGuard?

The purpose

So… recently I have been facing some instability on my WireGuard tunnels from NYC1BR1 to the rest of AS203528.

I also have a couple of VMs, “NYC2EV1” and “FFT1EV1”, on Linode, part of what was my legacy network where I learned the IPv6 basics (Linode hands out an IPv6 /56 upon request).

This network is connected to my internal network using GRETAP tunnels over AS203528. However, both Linodes still have tunnels directly between each other.

fabrizzio@NYC2EV1:~$ sh interfaces | match FFT1
wg4              172.30.238.2/30                   u/u  To FFT1EV1 v4/v6 ***IPv6 Inter-Region***
wg5              172.30.238.6/30                   u/u  To FFT1EV1 v4/v6 ***IPv6 Inter-Region***
wg6              172.30.238.10/30                  u/u  From FFT1EV1 v4/v6 ***IPv6 Inter-Region***

Current setup

I have noticed that whatever issue BuyVM might be having does not affect the connection towards Linode (between NYC1BR1 and NYC2EV1)… so that gave me the idea to build tunnels through that path…

Dealing with fragmentation and TTL inheritance

AS203528 is built on the assumption that the tunnels between routers have a minimum MTU of 1420 (WireGuard). If I want to build a tunnel with an MTU of 1420 on the inside, over a path whose MTU is also 1420, then something must fragment, because the encapsulation headers push full-size packets past the path MTU.
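To put numbers on it (assuming an IPv4 underlay and a keyed GRE header, i.e. the standard header sizes):

GRETAP overhead = 20 (outer IPv4) + 8 (GRE + key) + 14 (Ethernet) = 42 bytes
1420 (inner packet) + 42 (encapsulation) = 1462 bytes, over a path that only carries 1420

So every full-size packet has to be fragmented by the outer tunnel.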

If you want to allow fragmentation of your GRE tunnel, for some reason you must enable TTL inheritance (ttl 0) whenever you set no-pmtu-discovery / ignore-df. I am not entirely sure why, but it matches the Linux restriction that a fixed TTL and disabled PMTU discovery are incompatible on IP tunnels.
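Outside of VyOS, a minimal iproute2 sketch of what the config below boils down to (the interface name is illustrative; note the TTL has to stay on inherit for fragmentation to be allowed):

# gretap tunnel that allows fragmentation; a fixed TTL is not accepted here
ip link add lte-mgmt type gretap \
    local 192.168.254.13 remote 192.168.255.20 \
    key 401 ttl inherit nopmtudisc ignore-df
ip link set lte-mgmt mtu 1500 up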

This causes the weird effect seen in the trace below: the tunnel is not completely transparent, so packets sent into the tunnel with TTL=1 will not make it all the way to the other end.

(snippet from a tunnel I use to remotely manage an LTE modem)

fabrizzio@OSR1CR4# show interfaces tunnel tun4
 description "OSR1E2 vtnet6 - VLN1 LTE WAN MGMT"
 encapsulation gretap
 mtu 1500
 parameters {
     ip {
         ignore-df
         key 401
         no-pmtu-discovery
         ttl 0
     }
 }
 remote 192.168.255.20
 source-address 192.168.254.13

traceroute to 192.168.51.254 (192.168.51.254), 30 hops max, 60 byte packets
 1  _gateway (192.168.20.1)  0.485 ms  0.432 ms  0.418 ms
 2  172.27.16.73 (172.27.16.73)  0.655 ms  0.633 ms  0.612 ms
 3  172.27.16.17 (172.27.16.17)  0.998 ms  0.978 ms  0.959 ms
 4  172.27.18.66 (172.27.18.66)  1.143 ms  1.178 ms  1.166 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  192.168.51.254 (192.168.51.254)  251.629 ms  201.757 ms  201.726 ms

This will be somewhat of a pain if you want to bring up any routing protocols over this type of tunnel: OSPF hellos, for example, are sent with TTL=1, so with inheritance they die at the first underlay hop.

So what can I do? Well, where there is a will, there is a way.

First attempt

I will build two GRETAP tunnels that allow fragmentation between NYC1BR1 and AMS1BR1 (through NYC2EV1 and FFT1EV1, with static routes created on both of them to make sure recursive routing over AS203528 is impossible).

Then I will build GRE tunnels on top of those GRETAP tunnels. The outer GRETAP tunnel takes care of fragmentation, and the inner GRE tunnel gives me the untouched TTL with the 1420 MTU I want.
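The pinning is just a host route for the far tunnel endpoint; a sketch of the idea on NYC1BR1 (VyOS syntax, with 192.0.2.1 standing in for the real next-hop towards NYC2EV1):

# Pin AMS1BR1's GRETAP endpoint via the Linode path, never via AS203528
set protocols static route 172.27.255.5/32 next-hop 192.0.2.1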

fabrizzio@NYC1BR1# sh interfaces tunnel tun10
 address 172.27.0.77/30
 description "Wrapper GRE NYC1BR1 (NYC2EV2-FFT1EV1) AMS1BR1 v4"
 encapsulation gretap
 mtu 1500
 parameters {
     ip {
         ignore-df
         key 400
         no-pmtu-discovery
         ttl 0
     }
 }
 remote 172.27.255.5
 source-address 172.27.255.1
fabrizzio@NYC1BR1# run ping 172.27.0.78 size 1472 do-not-fragment
PING 172.27.0.78 (172.27.0.78) 1472(1500) bytes of data.
1480 bytes from 172.27.0.78: icmp_seq=1 ttl=64 time=89.4 ms
1480 bytes from 172.27.0.78: icmp_seq=2 ttl=64 time=91.8 ms
1480 bytes from 172.27.0.78: icmp_seq=3 ttl=64 time=89.2 ms
^C
--- 172.27.0.78 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 89.184/90.133/91.802/1.183 ms
[edit]
fabrizzio@NYC1BR1# run traceroute 172.27.0.78
traceroute to 172.27.0.78 (172.27.0.78), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  172.27.0.78 (172.27.0.78)  88.645 ms  88.651 ms  88.634 ms

Good enough for now! 1472 bytes of payload plus 28 bytes of ICMP and IP headers is a full 1500-byte DF packet crossing a 1420-byte path, so fragmentation works, and the inner tunnel will take care of the TTL issue :D

fabrizzio@NYC1BR1# show interfaces tunnel tun11
 address 172.27.0.81/30
 description "To AMS1BR1 via tun10 wrapper"
 encapsulation gre
 ip {
     adjust-mss 1372
 }
 mtu 1420
 parameters {
     ip {
         key 401
     }
 }
 remote 172.27.0.78
 source-address 172.27.0.77

The results

It doesn’t work. Somehow the inner TTL value gets inherited by the outer tunnel anyway. The packet capture below, taken at NYC2EV1 during a traceroute, shows it: the first trace probes have TTL=1, the inner tunnel sends with TTL=255, yet the outer tunnel is somehow using TTL=1.
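(A capture like this can be taken with something along these lines; the interface name is illustrative:)

# -v prints TTLs; IP protocol 47 is GRE
sudo tcpdump -v -n -i eth0 'ip proto 47'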

15:19:01.853902 ip: (tos 0x0, ttl 1, id 46455, offset 0, flags [none], proto GRE (47), length 144)
    172.27.255.1 > 172.27.255.5: GREv0, Flags [key present], key=0x190, proto TEB (0x6558), length 124
        2e:62:e6:af:07:36 > 6e:7a:1a:84:7a:5d, ethertype IPv4 (0x0800), length 116: (tos 0x0, ttl 255, id 64589, offset 0, flags [DF], proto GRE (47), length 102)
    172.27.0.77 > 172.27.0.78: GREv0, Flags [key present], key=0x191, proto TEB (0x6558), length 82
        9e:71:16:29:bd:df > 96:10:1d:7a:3c:52, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 1, id 7341, offset 0, flags [none], proto UDP (17), length 60)

fabrizzio@NYC1BR1:~$ traceroute 172.27.0.82
traceroute to 172.27.0.82 (172.27.0.82), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  172.27.0.82 (172.27.0.82)  88.887 ms *  88.966 ms

I tried switching the inner tunnel to GRETAP, making the inner GRETAP tunnel a member of a bridge, and setting “ttl 255” explicitly on the inner tunnel (shown below); no real luck in avoiding this TTL inheritance I don’t want.
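For reference, that last attempt was a one-liner in VyOS syntax:

set interfaces tunnel tun11 parameters ip ttl 255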

Second attempt

Then an idea came to mind :D I should just run WireGuard on top of the outer GRETAP tunnel. WireGuard originates fresh UDP datagrams instead of copying the encapsulated headers, so maybe that way it will work and the Linux kernel will be unaware of the nasty stuff I am doing.

tun10            172.27.0.77/30                    u/u  Wrapper GRE NYC1BR1 (NYC2EV2-FFT1EV1) AMS1BR1 v4
wg10             172.27.0.81/30                    u/u

fabrizzio@NYC1BR1# show interfaces wireguard wg10
 address 172.27.0.81/30
 peer AMS1BR1 {
     allowed-ips 0.0.0.0/0
     allowed-ips 172.27.0.80/30
     persistent-keepalive 15
     preshared-key xxxxxxxxxxxx
     public-key yyyyyyyyyyyyyyy
 }
 port 4001
 private-key zzzzzzzzzz

fabrizzio@NYC1BR1:~$ ping 172.27.0.82 size 1392 do-not-fragment count 5 ada
PING 172.27.0.82 (172.27.0.82) 1392(1420) bytes of data.
1400 bytes from 172.27.0.82: icmp_seq=1 ttl=64 time=89.7 ms
1400 bytes from 172.27.0.82: icmp_seq=2 ttl=64 time=89.8 ms
1400 bytes from 172.27.0.82: icmp_seq=3 ttl=64 time=89.7 ms
1400 bytes from 172.27.0.82: icmp_seq=4 ttl=64 time=88.2 ms
1400 bytes from 172.27.0.82: icmp_seq=5 ttl=64 time=88.9 ms

--- 172.27.0.82 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 360ms
rtt min/avg/max/mdev = 88.234/89.266/89.819/0.614 ms, ipg/ewma 90.057/89.442 ms
fabrizzio@NYC1BR1:~$ traceroute 172.27.0.82
traceroute to 172.27.0.82 (172.27.0.82), 30 hops max, 60 byte packets
 1  172.27.0.82 (172.27.0.82)  88.648 ms  88.607 ms  88.590 ms

It works!
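Outside of VyOS, the same overlay could be expressed in wg-quick form; a sketch, with the keys as placeholders and the peer's listen port assumed to match mine:

[Interface]
# NYC1BR1 side, riding on the tun10 wrapper addresses
Address = 172.27.0.81/30
ListenPort = 4001
PrivateKey = <private key>
MTU = 1420
# keep wg-quick from installing a default route for AllowedIPs 0.0.0.0/0
Table = off

[Peer]
# AMS1BR1, reached at its tun10 wrapper address (the port is an assumption)
PublicKey = <public key>
PresharedKey = <preshared key>
AllowedIPs = 0.0.0.0/0
Endpoint = 172.27.0.78:4001
PersistentKeepalive = 15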

OSPF also establishes an adjacency just fine.

fabrizzio@NYC1BR1# show protocols ospf interface wg10
 area 0
 bfd {
 }
 cost 8510
 network point-to-point

fabrizzio@NYC1BR1:~$ sh ip ospf nei | match wg10
192.168.248.10    1 Full/-          58.151s           36.226s 172.27.0.82     wg10:172.27.0.81                     0     0     0

Performance

iperf3: the client has terminated
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 172.27.0.81, port 41141
[  5] local 172.27.0.82 port 5201 connected to 172.27.0.81 port 40899
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  5.19 MBytes  43.6 Mbits/sec  0.070 ms  0/3980 (0%)
[  5]   1.00-2.00   sec  5.95 MBytes  49.9 Mbits/sec  0.083 ms  2/4564 (0.044%)
[  5]   2.00-3.00   sec  5.96 MBytes  50.0 Mbits/sec  0.088 ms  0/4570 (0%)
[  5]   3.00-4.00   sec  5.20 MBytes  43.5 Mbits/sec  4.730 ms  9/3994 (0.23%)
[  5]   4.00-5.00   sec  6.41 MBytes  53.9 Mbits/sec  0.112 ms  237/5147 (4.6%)
[  5]   5.00-6.00   sec  5.96 MBytes  50.0 Mbits/sec  0.105 ms  4/4569 (0.088%)
[  5]   6.00-7.00   sec  5.95 MBytes  49.9 Mbits/sec  0.061 ms  1/4559 (0.022%)
[  5]   7.00-8.00   sec  5.96 MBytes  50.0 Mbits/sec  0.105 ms  5/4575 (0.11%)
[  5]   8.00-9.00   sec  5.88 MBytes  49.3 Mbits/sec  0.132 ms  0/4509 (0%)
[  5]   9.00-10.00  sec  6.03 MBytes  50.6 Mbits/sec  0.117 ms  7/4627 (0.15%)
[  5]  10.00-10.13  sec   788 KBytes  50.2 Mbits/sec  0.070 ms  0/590 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[SUM]  0.0-10.1 sec  51 datagrams received out-of-order
[  5]   0.00-10.13  sec  59.3 MBytes  49.1 Mbits/sec  0.070 ms  265/45684 (0.58%)  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 172.27.0.81, port 38943
[  5] local 172.27.0.82 port 5201 connected to 172.27.0.81 port 51093
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  5.96 MBytes  50.0 Mbits/sec  4567
[  5]   1.00-2.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   2.00-3.00   sec  5.96 MBytes  50.0 Mbits/sec  4567
[  5]   3.00-4.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   4.00-5.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   5.00-6.00   sec  5.96 MBytes  50.0 Mbits/sec  4570
[  5]   6.00-7.00   sec  5.96 MBytes  50.0 Mbits/sec  4567
[  5]   7.00-8.00   sec  5.96 MBytes  50.0 Mbits/sec  4568
[  5]   8.00-9.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   9.00-10.00  sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]  10.00-10.21  sec  1.23 MBytes  50.1 Mbits/sec  941
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.21  sec  60.8 MBytes  50.0 Mbits/sec  0.000 ms  0/46625 (0%)  sender
-----------------------------------------------------------
Server listening on 5201

fabrizzio@NYC1BR1:~$ iperf3 -c 172.27.0.82 -B 172.27.0.81 -u -b50M
Connecting to host 172.27.0.82, port 5201
[  5] local 172.27.0.81 port 40899 connected to 172.27.0.82 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  5.96 MBytes  50.0 Mbits/sec  4565
[  5]   1.00-2.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   2.00-3.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   3.00-4.00   sec  5.96 MBytes  50.0 Mbits/sec  4568
[  5]   4.00-5.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   5.00-6.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   6.00-7.00   sec  5.96 MBytes  50.0 Mbits/sec  4569
[  5]   7.00-8.00   sec  5.96 MBytes  50.0 Mbits/sec  4572
[  5]   8.00-9.00   sec  5.96 MBytes  50.0 Mbits/sec  4565
[  5]   9.00-10.00  sec  5.96 MBytes  50.0 Mbits/sec  4569
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  59.6 MBytes  50.0 Mbits/sec  0.000 ms  0/45684 (0%)  sender
[  5]   0.00-10.13  sec  59.3 MBytes  49.1 Mbits/sec  0.070 ms  265/45684 (0.58%)  receiver

iperf Done.
fabrizzio@NYC1BR1:~$ iperf3 -c 172.27.0.82 -B 172.27.0.81 -u -b50M -R
Connecting to host 172.27.0.82, port 5201
Reverse mode, remote host 172.27.0.82 is sending
[  5] local 172.27.0.81 port 51093 connected to 172.27.0.82 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  6.09 MBytes  51.1 Mbits/sec  0.210 ms  433/5100 (8.5%)
[  5]   1.00-2.00   sec  5.93 MBytes  49.7 Mbits/sec  0.100 ms  24/4569 (0.53%)
[  5]   2.00-3.00   sec  5.87 MBytes  49.2 Mbits/sec  0.147 ms  54/4551 (1.2%)
[  5]   3.00-4.00   sec  5.94 MBytes  49.9 Mbits/sec  0.230 ms  28/4583 (0.61%)
[  5]   4.00-5.00   sec  5.94 MBytes  49.8 Mbits/sec  0.243 ms  22/4573 (0.48%)
[  5]   5.00-6.00   sec  5.96 MBytes  50.0 Mbits/sec  0.184 ms  2/4572 (0.044%)
[  5]   6.00-7.00   sec  5.93 MBytes  49.7 Mbits/sec  0.117 ms  26/4569 (0.57%)
[  5]   7.00-8.00   sec  5.90 MBytes  49.5 Mbits/sec  0.136 ms  43/4569 (0.94%)
[  5]   8.00-9.00   sec  5.86 MBytes  49.2 Mbits/sec  0.086 ms  73/4568 (1.6%)
[  5]   9.00-10.00  sec  5.86 MBytes  49.1 Mbits/sec  0.167 ms  76/4565 (1.7%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.21  sec  60.8 MBytes  50.0 Mbits/sec  0.000 ms  0/46625 (0%)  sender
[  5]   0.00-10.00  sec  59.3 MBytes  49.7 Mbits/sec  0.167 ms  781/46219 (1.7%)  receiver

I think the performance is more than enough for this abomination. The most I could get out of TCP testing was ~110 Mbit/s.
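A rough, back-of-the-envelope accounting of the per-packet tax (assuming an IPv4 underlay and standard header sizes; the legs towards the Linodes are WireGuard themselves):

inner WireGuard (wg10):  20 (IPv4) + 8 (UDP) + 32 (WireGuard) = 60 bytes
GRETAP wrapper (tun10):  20 (IPv4) + 8 (GRE + key) + 14 (Ethernet) = 42 bytes
outer WireGuard leg:     another 60 bytes, plus the outer fragmentation

Every full-size packet carries roughly 160 bytes of headers and gets encrypted twice, which likely explains a good part of the gap to line rate.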

New setup
