Opened 9 months ago

Last modified 5 months ago

#1389 new Bug / Defect

OpenVPN v2.5.1 --fragment does not work

Reported by: alexvelkov
Owned by:
Priority: major
Milestone: release 2.5.3
Component: Generic / unclassified
Version: OpenVPN 2.5.0 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords: --fragment
Cc: tct, plaisthos

Description

Hi everybody,

I have a running OpenVPN TUN tunnel between two boxes configured with the --fragment option on both ends. One side (Box1) is running an OpenVPN v2.3.2 and the other (Box2) is running a newer version. I have tried running v2.4.9 and v2.5.1 on Box2.

I see a different behaviour of --fragment on Box2 when running v2.4.9 and v2.5.1. It seems as though v2.5.1 is not handling fragment right.

Box1 config:

# openvpn --version
OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Dec  1 2014
..

## OpenVPN config part:
# openvpn --proto udp --dev tun --fragment 1210 ...

Box2 config:

# openvpn --version
OpenVPN 2.5.1 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [AEAD] built on Mar  4 2021
..

## OpenVPN config part:
# openvpn --proto udp --dev tun --fragment 1210 ...

I ping through the tunnel from Box1 (OVPN 10.5.0.1) to Box2 (OVPN 10.5.0.2)

# ping -M do -s 1400 -c 3 10.5.0.2
PING 10.5.0.2 (10.5.0.2) 1400(1428) bytes of data.
1408 bytes from 10.5.0.2: icmp_seq=1 ttl=64 time=2.30 ms
1408 bytes from 10.5.0.2: icmp_seq=2 ttl=64 time=2.33 ms
1408 bytes from 10.5.0.2: icmp_seq=3 ttl=64 time=2.30 ms

--- 10.5.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.302/2.313/2.335/0.042 ms

... and I check the size of the packets on the interface where the OpenVPN traffic flows with tcpdump on Box1

# tcpdump -i eth1 -nnnvvv port 1194
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes


14:30:56.304008 IP (tos 0x0, ttl 64, id 40423, offset 0, flags [DF], proto UDP (17), length 109)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5897 -> 0x019d!] UDP, length 81
14:30:57.069682 IP (tos 0x0, ttl 64, id 40595, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x1d73!] UDP, length 785
14:30:57.069790 IP (tos 0x0, ttl 64, id 40596, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x5982!] UDP, length 785
14:30:57.071626 IP (tos 0x0, ttl 64, id 6423, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
14:30:58.071669 IP (tos 0x0, ttl 64, id 40692, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x25af!] UDP, length 785
14:30:58.071780 IP (tos 0x0, ttl 64, id 40693, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x2f35!] UDP, length 785
14:30:58.073553 IP (tos 0x0, ttl 64, id 6519, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
14:30:59.073357 IP (tos 0x0, ttl 64, id 40928, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf769!] UDP, length 785
14:30:59.073465 IP (tos 0x0, ttl 64, id 40929, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x6442!] UDP, length 785
14:30:59.075222 IP (tos 0x0, ttl 64, id 6603, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

So, I see packets originating from Box1 (172.16.0.9) to Box2 (172.16.0.3) which are fragmented as expected in two packets with length 785 bytes. However, the packets from Box2 are NOT fragmented at all, with a length of 1489 bytes!
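The manual size check described above can be automated. The following is a minimal sketch, not part of the original report: it scans `tcpdump -v` text output and lists every UDP payload larger than the configured `--fragment` value. The function name `oversized_datagrams` and the regular expression are illustrative assumptions, and the sketch assumes that the "UDP, length" value printed by tcpdump is the number to compare against the limit.

```python
import re

# Matches the second line of each tcpdump -v record, for example:
#   "172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489"
RECORD = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r".*?UDP, length (?P<length>\d+)"
)


def oversized_datagrams(capture_text, fragment_max):
    """Return (src, dst, length) for each UDP payload above fragment_max."""
    hits = []
    for line in capture_text.splitlines():
        match = RECORD.search(line)
        if match and int(match.group("length")) > fragment_max:
            hits.append(
                (match.group("src"), match.group("dst"), int(match.group("length")))
            )
    return hits


# Two records copied from the capture in this ticket.
sample = """\
14:30:57.069682 IP (tos 0x0, ttl 64, id 40595, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x1d73!] UDP, length 785
14:30:57.071626 IP (tos 0x0, ttl 64, id 6423, offset 0, flags [+], proto UDP (17), length 1500)
    172.16.0.3.1194 > 172.16.0.9.1194: UDP, length 1489
"""

for src, dst, length in oversized_datagrams(sample, fragment_max=1210):
    print(f"{src} -> {dst}: UDP length {length} exceeds 1210")
```

Run against the two sample records, only the datagram from 172.16.0.3 (Box2) is reported, which matches the observation above.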

If I use OpenVPN v2.4.9 on Box2 and repeat the same steps, I get to see fragmented packets coming from both ends:

06:15:54.765351 IP (tos 0x0, ttl 64, id 57831, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xd9cd!] UDP, length 785
06:15:54.765466 IP (tos 0x0, ttl 64, id 57832, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x952a!] UDP, length 785
06:15:54.767943 IP (tos 0x0, ttl 64, id 2973, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:54.767998 IP (tos 0x0, ttl 64, id 2974, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:55.766768 IP (tos 0x0, ttl 64, id 57952, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0x9a31!] UDP, length 785
06:15:55.766877 IP (tos 0x0, ttl 64, id 57953, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf09b!] UDP, length 785
06:15:55.768896 IP (tos 0x0, ttl 64, id 2980, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:55.768974 IP (tos 0x0, ttl 64, id 2981, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:56.768946 IP (tos 0x0, ttl 64, id 58047, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xb52c!] UDP, length 785
06:15:56.769054 IP (tos 0x0, ttl 64, id 58048, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.9.1194 > 172.16.0.3.1194: [bad udp cksum 0x5b57 -> 0xf280!] UDP, length 785
06:15:56.770776 IP (tos 0x0, ttl 64, id 3078, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785
06:15:56.770820 IP (tos 0x0, ttl 64, id 3079, offset 0, flags [DF], proto UDP (17), length 813)
    172.16.0.3.1194 > 172.16.0.9.1194: [udp sum ok] UDP, length 785

Thanks for any help!

Alex

Change History (14)

comment:1 Changed 9 months ago by alexvelkov

From the documentation for "--fragment max":

Enable internal datagram fragmentation so that no UDP datagrams are sent which are larger than max bytes.
..

comment:2 Changed 9 months ago by tct

Cc: tct added

comment:3 Changed 9 months ago by Gert Döring

Cc: plaisthos added

Mmmmh. Interesting find.

Our regression tests verify that client + server can ping each other if --fragment is in use, but do not automatically test that packet size is not exceeded (to set that up without interfering with other tests on said machines is tricky, and I didn't think I would need it).

Need to test this. Maybe we broke it for 2.5.x

(Interestingly enough, it *does* use proper --fragment encapsulation; otherwise pings wouldn't work at all)

comment:4 Changed 9 months ago by plaisthos

Can you test whether ncp-disable on both sides makes a difference in behaviour? This might be a side effect of our suboptimal frame recalculation on NCP.

comment:5 Changed 9 months ago by tct

FTR, it may be suitable to use --tun-mtu (a.k.a --udp-mtu) because --fragment is truly broken.

comment:6 Changed 9 months ago by alexvelkov

Hi,

thanks for all the responses!

  • @plaisthos: adding "ncp-disable" in the OpenVPN v2.5.1 peer config makes no difference, the packets are still not fragmented. "ncp-disable" is not available in OpenVPN v2.3.2 so I left it out on that peer for the test.
  • @tincantech: the thing is that problems may arise with setups using the "fragment" param. If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.
  • @Gert Döring: right, this means that even if a solution with "tun-mtu" exists, "fragment" eventually needs to be additionally added to the config for the workaround to work, somehow pretty messy :(.

Best regards
Alex

comment:7 in reply to:  6 ; Changed 9 months ago by Gert Döring

Replying to alexvelkov:

  • If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.

It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Lots of code has been cleaned up and old code rewritten in the 3 years between 2.4.0 and 2.5.0, and it seems we overlooked something, and broke --fragment. There is a pretty nasty area of the code that calculates cipher overhead (so "if the inner packet is <x> bytes, how big will the resulting UDP packet with all the overhead be?"), and this is what plaisthos was hinting at. --ncp-disable is only relevant on the 2.5.x side - but if it doesn't change anything, the problem is elsewhere.

gert
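The "inner packet of <x> bytes, how big is the UDP packet" question from the comment above can be sketched as simple arithmetic. This is only an illustration and not OpenVPN's actual frame calculation: the function name `udp_payload_size`, the packet layout, and every default value (1-byte opcode, 32-byte HMAC, 16-byte IV and cipher block, 4-byte packet ID, 4-byte fragment header) are assumptions chosen for the example. The ticket does not state which cipher or digest the peers used.

```python
def udp_payload_size(inner_len, *, opcode=1, hmac_len=32, iv_len=16,
                     packet_id=4, frag_header=4, block=16):
    """Estimate the UDP payload for one inner packet (or one fragment of it).

    Assumed layout, for illustration only: opcode | HMAC | IV | ciphertext.
    The ciphertext covers packet ID + fragment header + inner data and is
    padded up to the next full cipher block (at least one padding byte).
    """
    plaintext = packet_id + frag_header + inner_len
    ciphertext = (plaintext // block + 1) * block
    return opcode + hmac_len + iv_len + ciphertext


# The ping in this ticket produces a 1428-byte inner IP packet.
whole = udp_payload_size(1428)       # sent as a single datagram
half = udp_payload_size(1428 // 2)   # one of two equal fragments (assumed split)
print(f"unfragmented: {whole} bytes, per fragment: {half} bytes")
```

With these assumed values the arithmetic gives 1489 bytes for the whole packet and 785 bytes per fragment, the same sizes that appear in the captures in the description. That agreement does not prove the peers used this configuration; it only shows how per-packet overhead and block padding determine whether a datagram stays below the --fragment limit.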

comment:8 in reply to:  7 Changed 9 months ago by alexvelkov

Replying to Gert Döring:

Replying to alexvelkov:

  • If "fragment" is broken, then this could be communicated as a known-bug. Yes, using "tun-mtu" changes the size of the OpenVPN packets, although in my test I cannot configure it below the "don't fragment" value else the pings don't get replied.

It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Yes of course, no offense at all :).

Alex

comment:9 in reply to:  7 ; Changed 8 months ago by tct

Replying to Gert Döring:

Replying to alexvelkov:
It was not yet "known broken". It seems it is, in 2.5.0 and 2.5.1, but we didn't have time to investigate the "why" part yet.

Lots of code has been cleaned up and old code rewritten in the 3 years between 2.4.0 and 2.5.0, and it seems we overlooked something, and broke --fragment. There is a pretty nasty area of the code that calculates cipher overhead (so "if the inner packet is <x> bytes, how big will the resulting UDP packet with all the overhead be?"), and this is what plaisthos was hinting at. --ncp-disable is only relevant on the 2.5.x side - but if it doesn't change anything, the problem is elsewhere.

I have done a little more testing and I don't believe --fragment is as broken as I first thought.

It is difficult to use, but once set up correctly it does seem to work.

I have a test rig which I tried:

fragment 1300
mssfix

Using a fully configured TLS Server and client git/master (more-or-less), downloading (http) over the VPN, tcpdump showed:

HTTP (Inside tunnel):

01:30:05.171537 IP 10.127.121.1.80 > 10.127.121.6.59996: Flags [.], seq 22576568:22577759, ack 164, win 508, options [nop,nop,TS val 2777937165 ecr 2503695730], length 1191: HTTP

UDP (Outside tunnel):

15:52:12.302476 IP 10.10.101.101.34571 > 10.10.201.226.37448: UDP, length 1272

It may simply be the difficulty of using these options correctly which causes issues.

The manual could clarify that: --fragment is required on both Server and Client, if it is required at all.

Additionally, --fragment could explicitly set --mssfix correctly.


comment:10 in reply to:  9 Changed 8 months ago by Gert Döring

Replying to tincantech:

Additionally, --fragment could explicitly set --mssfix correctly.

--fragment and --mssfix are independent settings and serve different purposes.

If you have --mssfix set low enough and do not have to care about large UDP packets, there is no need for --fragment anymore: the payload packets are already small enough that no fragmentation is needed.

comment:11 Changed 8 months ago by Gert Döring

Milestone: release 2.5 → release 2.5.3

comment:12 Changed 6 months ago by neoj

I am experiencing this issue too. I had to go back as far as 2.4.8 for it to behave as expected. Are 3bd91cd / d22ba6b the likely cause?

comment:13 Changed 5 months ago by tct

After reviewing this ticket it must be noted:

One side (Box1) is running an OpenVPN v2.3.2

V2.3 is End of Life as of June 2021:
https://community.openvpn.net/openvpn/wiki/SupportedVersions

comment:14 in reply to:  13 Changed 5 months ago by neoj

Replying to tincantech:

After reviewing this ticket it must be noted:

One side (Box1) is running an OpenVPN v2.3.2

V2.3 is End of Life as of June 2021:
https://community.openvpn.net/openvpn/wiki/SupportedVersions

In my setup I am in control of both ends of the tunnel and I have reproduced this behavior without 2.3. Within 2.4 I have to go back to 2.4.8 for fragmentation to work correctly. And within 2.5 fragmentation does not work in any version.
