Opened 16 months ago

Last modified 16 months ago

#835 accepted Bug / Defect

OpenVPN exits on fatal error with large tun-mtu

Reported by: Eric Crist
Owned by: Steffan Karger
Priority: minor
Milestone:
Component: Networking
Version: OpenVPN 2.4.0 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords:
Cc:

Description

There was a discussion in #openvpn on Freenode in which user T1w reported seeing speeds increase from ~503Mbps up to 958Mbps by modifying --tun-mtu and setting it to some crazy numbers (49k, 48k, even 60k). Curious, I asked for his config files and what he used for testing.

T1w Testing Environment

  • OpenVPN 2.3.14
  • OpenSSL 1.0.1e-fips (RHEL package)

ecrist Testing Environment

  • OpenVPN 2.4.0
  • OpenSSL 1.0.1s

IRC Chat Logs

07:03:47 < BtbN> But it also creates quite a bit of lag on the network
07:05:13 <@ecrist> what/who does 60k MTUs as a "common" practice?
07:05:49 < T1w> ecrist: well.. at the moment I've gone up from ~503Mbit/s to 958Mbit/s just by going up with 
                tun-mtu and disabling fragmentation and mss fix
07:06:57 < T1w> that's in a local test, so it's what I'd expect to see for a 1gbit link, but I've got a real life 
                1gbit line where I've never gone above ~120Mbit
07:07:15 < T1w> and I'd like to change that before investing in a black fiber to get the speed I need
07:07:50 <@ecrist> Is your current 1g link through a normal ISP?
07:07:52 < T1w> BtbN: lag is the least of my problems - as long as I get more throughput
07:07:53 < BtbN> saturating a gigabit link with openvpn is close to impossible though, even with all the tuning you 
                 can get
07:08:24 < T1w> ecrist: that probably pedends on what you mean by "normal"
07:08:30 < T1w> it's a datacenter
07:09:00 < T1w> BtbN: 958 is as close to saturated as I'd expect to see
07:09:06 <@ecrist> ah, most of those are grossly over-subscribed
07:09:30 <@ecrist> T1w: can you send me your config for client/server?  I'd like to try that out
07:10:09 < T1w> BtbN: actually.. just trying the physical link between my testmachines (and not going through 
                openvpn) I get a lower speed..
07:10:30 < T1w> 942Mbit/s compared to the 958Mbit/s through openvpn
07:10:43 < T1w> that's.. interesting
07:10:46 < T1w> ecrist: sec..
07:15:15 < T1w> ecrist: https://paste.sh/UgJDEDVG#u9Dnx5lXCZsBfSuHRY6QJV7a
07:15:39 < T1w> ecrist: it's where I've ended up after reading 
                https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux
07:15:41 <@vpnHelper> Title: Gigabit_Networks_Linux – OpenVPN Community (at community.openvpn.net)
07:16:29 < T1w> ecrist: and no.. our link is not oversubscribed - I can easily get really close to wirespeed when 
                transferring other stuff from around EU (I'm located in Denmark)
07:17:45 < T1w> funny thing is that once I went above 50000 bytes MTU my iperf tests dies out
07:17:52 < T1w> 49000 is fine, 50000 is not
07:19:15 < T1w> ping is fine, but iperf dies out after the first few MB
07:21:46 < T1w> hm.. seemlingly because the serverside dies
07:22:37 < T1w> ah
07:22:38 < T1w> Mon Jan 30 14:09:29 2017 us=427893 gauss/10.5.99.205:42307 Assertion failed at lzo.c:202 (buf_safe 
                (&work, zlen))
07:22:38 < T1w> Mon Jan 30 14:09:29 2017 us=427963 gauss/10.5.99.205:42307 Exiting due to fatal error
07:22:44 < T1w> lets try without lzo
07:22:48 <@ecrist> T1w: what is your iperf syntax for your performance measurements?
07:23:11 < T1w> ecrist: server:  -s -p 10000 -B 10.8.8.6
07:23:26 < T1w> ecrist: client:  -p 10000 -c 10.8.8.6 -t 30
07:24:15 < T1w> I squzed a bit more out by renicing the client iperf to -19 (since the system is used for other 
                stuff)
07:24:49 < T1w> only a few mbit, but it removed a few fluctuations where performance dropped down
07:28:05 <@ecrist> which version of openvpn?
07:28:29 < T1w> hah.. no without lzo it just gives me
07:28:30 < T1w> 49781 IP packet with unknown IP version=15 seen
07:28:50 < T1w> ecrist: 2.3.14
07:29:23 < T1w> the version avabilable in the EPEL 7 x86_64 repo
07:29:43 < T1w> OpenVPN 2.3.14 x86_64-redhat-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [MH] [IPv6] built on 
                Dec  7 2016
07:29:45 < T1w> using
07:29:57 < T1w> library versions: OpenSSL 1.0.1e-fips 11 Feb 2013, LZO 2.06
07:30:15 < T1w> (RHEL version of OpenSSL with backports)
07:30:44 < T1w> afk - back in 10-15 mins
07:31:03 <@ecrist> I'm trying this out - so far, just two servers connected to the same switch each over 1g 
                   uplinks, I see 936Mbps
07:31:10 <@ecrist> now to set up OpenVPN
07:51:47 <@ecrist> huh, this is a new one
07:51:49 <@ecrist> Mon Jan 30 07:39:16 2017 us=113384 Assertion failed at crypto.c:81 
                   (packet_id_initialized(&opt->packet_id))
07:51:49 <@ecrist> Mon Jan 30 07:39:16 2017 us=113393 Exiting due to fatal error

Results (read: problem)

The startup of the server side process is uneventful (logs attached). Startup of the client, however, results in a quick fatal error after full initialization. The same error is evident in the server logs, and that process also dies.

Server Log

Mon Jan 30 07:39:16 2017 us=32861 client/10.3.14.230 SENT CONTROL [client]: 'PUSH_REPLY,route 10.8.8.0 255.255.255.0,route 10.8.8.1,topology net30,ping 5,ping-restart 30,ifconfig 10.8.8.6 10.8.8.5,peer-id 0,cipher AES-256-GCM' (status=1)
Mon Jan 30 07:39:16 2017 us=32907 client/10.3.14.230 Data Channel MTU parms [ L:48046 D:48046 EF:46 EB:8156 ET:0 EL:3 ]
Mon Jan 30 07:39:16 2017 us=33059 client/10.3.14.230 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Jan 30 07:39:16 2017 us=33096 client/10.3.14.230 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Jan 30 07:39:21 2017 us=412962 client/10.3.14.230 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
Mon Jan 30 07:39:21 2017 us=413099 client/10.3.14.230 Exiting due to fatal error
Mon Jan 30 07:39:21 2017 us=413159 client/10.3.14.230 /sbin/route delete -net 10.8.8.0 10.8.8.2 255.255.255.0
delete net 10.8.8.0: gateway 10.8.8.2
Mon Jan 30 07:39:21 2017 us=415335 client/10.3.14.230 Closing TUN/TAP interface
Mon Jan 30 07:39:21 2017 us=415525 client/10.3.14.230 /sbin/ifconfig tun1 destroy
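
The `Data Channel MTU parms` line in the server log can be picked apart to see which tunnel MTU was actually in effect. The following is a minimal Python sketch, not part of the original report. It assumes the bracketed fields are whitespace-separated `KEY:integer` pairs, and it assumes (from my reading of OpenVPN 2.4's frame accounting, not confirmed anywhere in this ticket) that the tun MTU is `L - EF - ET`. The helper names `parse_mtu_parms` and `implied_tun_mtu` are hypothetical.

```python
import re

# Hypothetical helper: matches the bracketed part of a "MTU parms [ ... ]" log line.
MTU_PARMS = re.compile(r"MTU parms \[(?P<body>[^\]]*)\]")

def parse_mtu_parms(line):
    """Return the KEY:int pairs of an 'MTU parms' log line as a dict, or None."""
    m = MTU_PARMS.search(line)
    if m is None:
        return None
    pairs = (pair.split(":") for pair in m.group("body").split())
    return {key: int(value) for key, value in pairs}

def implied_tun_mtu(parms):
    # Assumption (not stated in the ticket): tun MTU = link MTU (L)
    # minus the extra_frame (EF) and extra_tun (ET) overheads.
    return parms["L"] - parms["EF"] - parms["ET"]

line = ("Mon Jan 30 07:39:16 2017 us=32907 client/10.3.14.230 "
        "Data Channel MTU parms [ L:48046 D:48046 EF:46 EB:8156 ET:0 EL:3 ]")
parms = parse_mtu_parms(line)
print(parms, implied_tun_mtu(parms))
```

Under that assumption, the line logged above corresponds to a tun MTU of 48000, which is in the range discussed in the chat.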

Client Log

Mon Jan 30 07:39:16 2017 us=36473 /sbin/route add -net 10.8.8.0 10.8.8.5 255.255.255.0
add net 10.8.8.0: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=37358 /sbin/route add -net 10.8.8.1 10.8.8.5 255.255.255.255
add net 10.8.8.1: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=38263 Initialization Sequence Completed
WrMon Jan 30 07:39:16 2017 us=113384 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
Mon Jan 30 07:39:16 2017 us=113393 Exiting due to fatal error
Mon Jan 30 07:39:16 2017 us=113422 /sbin/route delete -net 10.8.8.0 10.8.8.5 255.255.255.0
delete net 10.8.8.0: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=114344 /sbin/route delete -net 10.8.8.1 10.8.8.5 255.255.255.255
delete net 10.8.8.1: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=115199 Closing TUN/TAP interface
Mon Jan 30 07:39:16 2017 us=115268 /sbin/ifconfig tun0 destroy
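
Since the attached logs are long, it can help to pull out just the assertion failures. The following is a minimal Python sketch for doing that. It assumes every failure is logged on a single line in the form `Assertion failed at FILE:LINE (EXPR)`, as in the excerpts above; the function name `find_assertions` is hypothetical.

```python
import re

# Matches "Assertion failed at crypto.c:81 (expr)" anywhere in a log line.
ASSERT_RE = re.compile(
    r"Assertion failed at (?P<file>[\w./-]+):(?P<line>\d+) \((?P<expr>.*)\)\s*$"
)

def find_assertions(log_text):
    """Return a list of (source_file, line_number, expression) tuples."""
    hits = []
    for raw in log_text.splitlines():
        m = ASSERT_RE.search(raw)
        if m:
            hits.append((m.group("file"), int(m.group("line")), m.group("expr")))
    return hits

# Sample lines copied from the client log in this ticket.
sample = """\
Mon Jan 30 07:39:16 2017 us=38263 Initialization Sequence Completed
WrMon Jan 30 07:39:16 2017 us=113384 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
Mon Jan 30 07:39:16 2017 us=113393 Exiting due to fatal error
"""
hits = find_assertions(sample)
print(hits)
```

The regular expression uses `search` rather than `match` so that stray characters before the timestamp, such as the `Wr` prefix in the client log, do not prevent a hit.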

I've attached two archives (server.tgz and client.tgz), one each for the server and client configuration, including DH parameters, certificates, keys, and the HMAC static key.

Attachments (4)

client.tgz (15.0 KB) - added by Eric Crist 16 months ago.
client configuration bundle
server.tgz (5.0 KB) - added by Eric Crist 16 months ago.
server configuration bundle
client.log (35.1 KB) - added by Eric Crist 16 months ago.
client log file
server.log (73.7 KB) - added by Eric Crist 16 months ago.
server log file


Change History (6)

Changed 16 months ago by Eric Crist

Attachment: client.tgz added

client configuration bundle

Changed 16 months ago by Eric Crist

Attachment: server.tgz added

server configuration bundle

Changed 16 months ago by Eric Crist

Attachment: client.log added

client log file

Changed 16 months ago by Eric Crist

Attachment: server.log added

server log file

comment:1 Changed 16 months ago by Steffan Karger

Owner: set to Steffan Karger
Status: new → accepted

Looks like --no-replay (I really hoped nobody would be using this feature; it cripples the crypto) is interfering with AEAD cipher modes. That should still not cause openvpn to ASSERT() out though, so I'm accepting this ticket and will look into a fix - probably disabling NCP and printing big warnings when using --no-replay.

comment:2 Changed 16 months ago by Steffan Karger

Oh, and until the fix is in, just add --ncp-disable as a workaround.

Sorry, although that does make it work again, in this case the correct fix is fixing replay, instead of disabling it: increase the replay window using --replay-window n, where n is some value larger than the default of 64.
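
As an illustration of the suggestion above, here is a minimal Python sketch that rewrites a config so it uses a larger replay window instead of `no-replay`. It assumes the usual config-file convention of writing options without the leading `--`. The function name `prefer_replay_window`, the window value of 512, and the sample config are all hypothetical; the ticket only says the value should be larger than the default of 64, and the sample is not the config attached to this ticket.

```python
def prefer_replay_window(config_text, window=512):
    """Drop `no-replay` lines and ensure a `replay-window` directive is present.

    `window=512` is an illustrative value only.
    """
    out = []
    has_window = False
    for raw in config_text.splitlines():
        words = raw.split()
        if words and words[0] == "no-replay":
            continue  # leave replay protection enabled
        if words and words[0] == "replay-window":
            has_window = True  # keep an existing setting untouched
        out.append(raw)
    if not has_window:
        out.append(f"replay-window {window}")
    return "\n".join(out) + "\n"

# Hypothetical config fragment, for illustration only.
sample = "dev tun\ntun-mtu 48000\nno-replay\n"
fixed = prefer_replay_window(sample)
print(fixed)
```

An existing `replay-window` line is left alone, so the sketch never overrides a value someone has already chosen.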

Last edited 16 months ago by Steffan Karger