Opened 3 years ago

Last modified 2 years ago

#1389 new Bug / Defect

OpenVPN v2.5.1 --fragment does not work

Reported by: alexvelkov
Owned by:
Priority: major
Milestone: release 2.5.3
Component: Generic / unclassified
Version: OpenVPN 2.5.0 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords: --fragment
Cc: tct, plaisthos

Description

Hi everybody,

I have a running OpenVPN TUN tunnel between two boxes configured with the --fragment option on both ends. One side (Box1) is running an OpenVPN v2.3.2 and the other (Box2) is running a newer version. I have tried running v2.4.9 and v2.5.1 on Box2.

I see different --fragment behaviour on Box2 depending on whether it runs v2.4.9 or v2.5.1. It looks as though v2.5.1 does not handle --fragment correctly.

Box1 config:

# openvpn --version
OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Dec  1 2014
..

## OpenVPN config part:
# openvpn --proto udp --dev tun --fragment 1210 ...

Box2 config:

# openvpn --version
OpenVPN 2.5.1 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [AEAD] built on Mar  4 2021
..

## OpenVPN config part:
# openvpn --proto udp --dev tun --fragment 1210 ...

I ping through the tunnel from Box1 (OVPN 10.5.0.1) to Box2 (OVPN 10.5.0.2)

# ping -M do -s 1400 -c 3 10.5.0.2
PING 10.5.0.2 (10.5.0.2) 1400(1428) bytes of data.
1408 bytes from 10.5.0.2: icmp_seq=1 ttl=64 time=2.30 ms
1408 bytes from 10.5.0.2: icmp_seq=2 ttl=64 time=2.33 ms
1408 bytes from 10.5.0.2: icmp_seq=3 ttl=64 time=2.30 ms

--- 10.5.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.302/2.313/2.335/0.042 ms

... and I check the size of the packets on the interface where the OpenVPN traffic flows with tcpdump on Box1

# tcpdump -i eth1 -nnnvvv port 1194
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes


14:30:56.304008 IP (tos 0x0, ttl 64, id 40423, offset 0, flags [DF], proto UDP (17), length 109)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5897 -> 0x019d!] UDP, length 81
14:30:57.069682 IP (tos 0x0, ttl 64, id 40595, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x1d73!] UDP, length 785
14:30:57.069790 IP (tos 0x0, ttl 64, id 40596, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x5982!] UDP, length 785
14:30:57.071626 IP (tos 0x0, ttl 64, id 6423, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
14:30:58.071669 IP (tos 0x0, ttl 64, id 40692, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x25af!] UDP, length 785
14:30:58.071780 IP (tos 0x0, ttl 64, id 40693, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x2f35!] UDP, length 785
14:30:58.073553 IP (tos 0x0, ttl 64, id 6519, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
14:30:59.073357 IP (tos 0x0, ttl 64, id 40928, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf769!] UDP, length 785
14:30:59.073465 IP (tos 0x0, ttl 64, id 40929, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x6442!] UDP, length 785
14:30:59.075222 IP (tos 0x0, ttl 64, id 6603, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

So, I see packets originating from Box1 (172.16.0.9) to Box2 (172.16.0.3) which are fragmented as expected in two packets with length 785 bytes. However, the packets from Box2 are NOT fragmented at all, with a length of 1489 bytes!
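The by-eye comparison above (UDP length per direction against the --fragment limit) can be automated. The following is a minimal Python sketch and not part of the original report: the function name `oversized_datagrams` and the regular expression are illustrative assumptions, and it compares the UDP length printed by tcpdump against the configured limit, which is the same quantity inspected above.

```python
import re

# Matches the second line of each "tcpdump -v" record shown above, e.g.
#   "172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489"
RECORD = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > (?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+:"
    r".*?UDP, length (?P<length>\d+)"
)


def oversized_datagrams(capture_text, fragment_max):
    """Return (source IP, UDP length) for every datagram larger than fragment_max."""
    hits = []
    for line in capture_text.splitlines():
        match = RECORD.search(line)
        if match and int(match.group("length")) > fragment_max:
            hits.append((match.group("src"), int(match.group("length"))))
    return hits


# Three record lines copied from the capture above.
sample = """\
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x1d73!] UDP, length 785
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x5982!] UDP, length 785
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
"""

print(oversized_datagrams(sample, fragment_max=1210))
```

For these three lines the only datagram reported is the 1489-byte one sent by 172.16.0.3 (Box2); the 785-byte datagrams from Box1 stay under the limit.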

If I use OpenVPN v2.4.9 on Box2 and repeat the same steps, I get to see fragmented packets coming from both ends:

06:15:54.765351 IP (tos 0x0, ttl 64, id 57831, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xd9cd!] UDP, length 785
06:15:54.765466 IP (tos 0x0, ttl 64, id 57832, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x952a!] UDP, length 785
06:15:54.767943 IP (tos 0x0, ttl 64, id 2973, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:54.767998 IP (tos 0x0, ttl 64, id 2974, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:55.766768 IP (tos 0x0, ttl 64, id 57952, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x9a31!] UDP, length 785
06:15:55.766877 IP (tos 0x0, ttl 64, id 57953, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf09b!] UDP, length 785
06:15:55.768896 IP (tos 0x0, ttl 64, id 2980, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:55.768974 IP (tos 0x0, ttl 64, id 2981, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:56.768946 IP (tos 0x0, ttl 64, id 58047, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xb52c!] UDP, length 785
06:15:56.769054 IP (tos 0x0, ttl 64, id 58048, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf280!] UDP, length 785
06:15:56.770776 IP (tos 0x0, ttl 64, id 3078, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:56.770820 IP (tos 0x0, ttl 64, id 3079, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785

Thanks for any help!

Alex

Change History (16)

comment:1 Changed 3 years ago by alexvelkov

From the documentation for "--fragment max":

Enable internal datagram fragmentation so that no UDP datagrams are sent which are larger than max bytes.
..

comment:2 Changed 3 years ago by tct

Cc: tct added

comment:3 Changed 3 years ago by Gert Döring

Cc: plaisthos added

Mmmmh. Interesting find.

Our regression tests verify that client + server can ping each other if --fragment is in use, but do not automatically test that packet size is not exceeded (to set that up without interfering with other tests on said machines is tricky, and I didn't think I would need it).

Need to test this. Maybe we broke it for 2.5.x

(Interestingly enough, it *does* use proper --fragment encapsulation - otherwise pings wouldn't work at all)

comment:4 Changed 3 years ago by plaisthos

Can you test whether ncp-disable on both sides makes a difference in behaviour? This might be a side effect of our suboptimal frame recalculation on NCP.

comment:5 Changed 3 years ago by tct

FTR, it may be suitable to use --tun-mtu (a.k.a --udp-mtu) because --fragment is truly broken.

comment:6 Changed 3 years ago by alexvelkov

Hi,

thanks for all the responses!

  • @plaisthos: adding "ncp-disable" in the OpenVPN v2.5.1 peer config makes no difference, the packets are still not fragmented. "ncp-disable" is not available in OpenVPN v2.3.2 so I left it out on that peer for the test.
  • @tincantech: the thing is that problems may arise with setups using the "fragment" param. If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.
  • @Gert Döring: right, this means that even if a solution with "tun-mtu" exists, "fragment" eventually needs to be additionally added to the config for the workaround to work, somehow pretty messy :(.

Best regards
Alex

comment:7 in reply to:  6 ; Changed 3 years ago by Gert Döring

Replying to alexvelkov:

  • If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.

It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Lots of code has been cleaned up and old code rewritten in the 3 years between 2.4.0 and 2.5.0, and it seems we overlooked something, and broke --fragment. There is a pretty nasty area of the code that calculates cipher overhead (so "if the inner packet is <x> bytes, how big will the resulting UDP packet with all the overhead be?"), and this is what plaisthos was hinting at. --ncp-disable is only relevant on the 2.5.x side - but if it doesn't change anything, the problem is elsewhere.

gert

comment:8 in reply to:  7 Changed 3 years ago by alexvelkov

Replying to Gert Döring:

Replying to alexvelkov:

  • If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.

It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Yes of course, no offense at all :).

Alex

comment:9 in reply to:  7 ; Changed 3 years ago by tct

Replying to Gert Döring:

Replying to alexvelkov:
It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Lots of code has been cleaned up and old code rewritten in the 3 years between 2.4.0 and 2.5.0, and it seems we overlooked something, and broke --fragment. There is a pretty nasty area of the code that calculates cipher overhead (so "if the inner packet is <x> bytes, how big will the resulting UDP packet with all the overhead be?"), and this is what plaisthos was hinting at. --ncp-disable is only relevant on the 2.5.x side - but if it doesn't change anything, the problem is elsewhere.

I have done a little more testing and I don't believe --fragment is as broken as I first thought.

It is difficult to use, but once set up correctly it does seem to work.

I have a test rig which I tried:

fragment 1300
mssfix

Using a fully configured TLS Server and client git/master (more-or-less), downloading (http) over the VPN, tcpdump showed:

HTTP (Inside tunnel):

01:30:05.171537 IP 10.127.121.1.80 > 10.127.121.6.59996: Flags [.], seq 22576568:22577759, ack 164, win 508, options [nop,nop,TS val 2777937165 ecr 2503695730], length 1191: HTTP

UDP (Outside tunnel):

15:52:12.302476 IP 10.10.101.101.34571 > 10.10.201.226.37448: UDP, length 1272
  • This test is invalid because both ends were using git/master

It may simply be the difficulty of using these options correctly which causes issues.

The manual could clarify that: --fragment is required on both Server and Client, if it is required at all.

Additionally, --fragment could explicitly set --mssfix correctly.

Last edited 3 years ago by tct

comment:10 in reply to:  9 Changed 3 years ago by Gert Döring

Replying to tincantech:

Additionally, --fragment could explicitly set --mssfix correctly.

--fragment and --mssfix are independent settings and serve different purpose.

If you have --mssfix set low-enough and do not have to care for large UDP packets, there is no need for --fragment anymore, as payload packets are already small enough so no fragmentation is needed.

comment:11 Changed 3 years ago by Gert Döring

Milestone: changed from release 2.5 to release 2.5.3

comment:12 Changed 3 years ago by neoj

I am experiencing this issue too. I had to go back as far as 2.4.8 for it to behave as expected. Could 3bd91cd / d22ba6b be the likely cause?

comment:13 Changed 3 years ago by tct

After reviewing this ticket it must be noted:

One side (Box1) is running an OpenVPN v2.3.2

V2.3 is End of Life as of June 2021:
https://community.openvpn.net/openvpn/wiki/SupportedVersions

comment:14 in reply to:  13 Changed 3 years ago by neoj

Replying to tincantech:

After reviewing this ticket it must be noted:

One side (Box1) is running an OpenVPN v2.3.2

V2.3 is End of Life as of June 2021:
https://community.openvpn.net/openvpn/wiki/SupportedVersions

In my setup I am in control of both ends of the tunnel and I have reproduced this behavior without 2.3. Within 2.4 I have to go back to 2.4.8 for fragmentation to work correctly. And within 2.5 fragmentation does not work in any version.

comment:15 Changed 2 years ago by Gert Döring

So, intermediate update, and some more test results:

  • for controlled testing, ICMP is good, HTTP + --mssfix is bad, as it will hide non-working fragmentation behind TCP MSS reduction
  • In master (which will become 2.6), --fragment does work - but the whole code path related to struct frame has been completely rewritten, as it is a horrid mess nobody understood anymore.
  • In 2.5.5 (should apply equally to earlier 2.5 versions), if I test with --fragment 500, and ping with varying packet sizes, I can see that the overhead calculation is a bit off and it will actually produce smaller(!) packets

16:03:23.464864 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 472
16:03:23.464897 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 472

("ping -s 859" - going to "ping -s 860" will produce 3 fragments, so that's the largest we'll see)

Trying with --fragment 1210 and then increasing ping sizes +1, I can see the jump from "one packet" to "two packets" at ping -s 1128:

16:07:23.493823 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1183
16:07:25.959658 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1184
16:07:29.959678 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1185
16:07:33.119763 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 608
16:07:33.119788 193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 606

So, what I can conclude from here is "--fragment in 2.5.x is not generally broken".
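The --fragment 1210 sweep above can be turned into a small worked calculation of how many bytes OpenVPN adds per ping. This is only a sketch under stated assumptions: the pings are taken to be IPv4 ICMP echo requests (20-byte IP header without options plus 8-byte ICMP header), and the pairing of `ping -s` values with the captured lengths is inferred from "increasing ping sizes +1" and the split at -s 1128; neither the pairing nor the helper names `inner_packet_size` and `total_overhead` come from the comment itself.

```python
IPV4_HEADER = 20  # bytes, assuming IPv4 without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def inner_packet_size(ping_payload):
    """Size of the IPv4 packet that `ping -s ping_payload` hands to the tun device."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def total_overhead(ping_payload, udp_lengths):
    """Bytes added by the tunnel across all datagrams that carry one ping."""
    return sum(udp_lengths) - inner_packet_size(ping_payload)


# Assumed pairing of ping sizes with the capture lines above (see lead-in).
observations = {
    1125: [1183],
    1126: [1184],
    1127: [1185],
    1128: [608, 606],
}

FRAGMENT_MAX = 1210

for size, lengths in observations.items():
    print(
        f"ping -s {size}: {len(lengths)} datagram(s), "
        f"largest {max(lengths)}, overhead {total_overhead(size, lengths)}, "
        f"within limit: {max(lengths) <= FRAGMENT_MAX}"
    )
```

Under that assumed pairing, each unfragmented ping carries 30 bytes of overhead, the two fragments at -s 1128 carry 58 bytes between them, and every datagram stays below 1210 bytes, which is consistent with the observation that the split happens somewhat earlier than strictly necessary.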

It might be due to cipher negotiation (= cipher changing from what is in the config, or cipher actually *not* changing)...

Ah, indeed.

If I connect with a 2.5.5 client that has no --cipher in its config (default: BF-CBC) to a sufficiently recent server that it can do cipher negotiation, the session will use AES-256-GCM, and fragment works.

If I configure --cipher AES-256-GCM on the client, it will "negotiate" the same cipher, and end up using AES-256-GCM. If that happens, I see

193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1197
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1207
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1217
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1227
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1237
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1247
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1257
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1267
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1277
193.149.48.178.51198 > 199.102.77.82.51198: UDP, length 1377

... very much unfragmented packets.

So, @alexvelkov, currently your best bet is to upgrade the "Box1" side to 2.4 or higher, enabling cipher negotiation from the 2.5 client. This is desirable anyway, as BF-CBC is no longer considered really secure - and AES-256-GCM is faster as well. Note: cipher negotiation will only happen if you use TLS, and one side is using --client and the other side uses --server. 2.6 will do cipher negotiations using --secret as well.

As for "can this be fixed in 2.5.x", not sure. It's a nasty code path, which got fully rewritten for 2.6, because it's very hard to understand.

comment:16 Changed 2 years ago by alexvelkov

Hi Gert,

thank you for the information.

If you say that the code path got fully rewritten in v2.6, will the newest version behave like v2.4.x regarding the fragment functionality? Is there a timeline for the 2.6 release?

I am still concerned about version compatibility. Updating Box1 to a newer version is definitely a good idea, since OpenVPN v2.3.x is no longer supported. I tested a similar setup with "Box1" updated to 2.4.11 and a client running the latest 2.5.5, and fragmentation works in both directions.
