Opened 7 years ago
Last modified 7 years ago
#835 accepted Bug / Defect
OpenVPN exits on fatal error with large tun-mtu
| Reported by: | Eric Crist | Owned by: | Steffan Karger |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | Networking | Version: | OpenVPN 2.4.0 (Community Ed) |
| Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | |
| Cc: | | | |
Description
There was a discussion in #openvpn on Freenode with a user, T1w, claiming to see speeds increase from ~503Mbps up to 958Mbps by setting --tun-mtu to some crazy numbers (49k, 48k, even 60k). Curious, I asked for his config files and what he used for testing.
T1w Testing Environment
- OpenVPN 2.3.14
- OpenSSL 1.0.1e-fips (RHEL package)
ecrist Testing Environment
- OpenVPN 2.4.0
- OpenSSL 1.0.1s
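For context, the kind of large-MTU tuning discussed below looks roughly like the following client-side fragment. These exact values and the remote name are assumptions based on the chat and the Gigabit_Networks_Linux wiki page, not T1w's actual paste:

```
dev tun
proto udp
remote vpn.example.net 1194   # hypothetical remote
tun-mtu 48000                 # oversized tunnel MTU; per the chat, 49000 worked but 50000 crashed
mssfix 0                      # disable MSS clamping
# --fragment deliberately omitted: OpenVPN-internal fragmentation disabled
```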
IRC Chat Logs
07:03:47 < BtbN> But it also creates quite a bit of lag on the network
07:05:13 <@ecrist> what/who does 60k MTUs as a "common" practice?
07:05:49 < T1w> ecrist: well.. at the moment I've gone up from ~503Mbit/s to 958Mbit/s just by going up with tun-mtu and disabling fragmentation and mss fix
07:06:57 < T1w> that's in a local test, so it's what I'd expect to see for a 1gbit link, but I've got a real life 1gbit line where I've never gone above ~120Mbit
07:07:15 < T1w> and I'd like to change that before investing in a black fiber to get the speed I need
07:07:50 <@ecrist> Is your current 1g link through a normal ISP?
07:07:52 < T1w> BtbN: lag is the least of my problems - as long as I get more throughput
07:07:53 < BtbN> saturating a gigabit link with openvpn is close to impossible though, even with all the tuning you can get
07:08:24 < T1w> ecrist: that probably pedends on what you mean by "normal"
07:08:30 < T1w> it's a datacenter
07:09:00 < T1w> BtbN: 958 is as close to saturated as I'd expect to see
07:09:06 <@ecrist> ah, most of those are grossly over-subscribed
07:09:30 <@ecrist> T1w: can you send me your config for client/server? I'd like to try that out
07:10:09 < T1w> BtbN: actually.. just trying the physical link between my testmachines (and not going through openvpn) I get a lower speed..
07:10:30 < T1w> 942Mbit/s compared to the 958Mbit/s through openvpn
07:10:43 < T1w> that's.. interesting
07:10:46 < T1w> ecrist: sec..
07:15:15 < T1w> ecrist: https://paste.sh/UgJDEDVG#u9Dnx5lXCZsBfSuHRY6QJV7a
07:15:39 < T1w> ecrist: it's where I've ended up after reading https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux
07:15:41 <@vpnHelper> Title: Gigabit_Networks_Linux - OpenVPN Community (at community.openvpn.net)
07:16:29 < T1w> ecrist: and no.. our link is not oversubscribed - I can easily get really close to wirespeed when transferring other stuff from around EU (I'm located in Denmark)
07:17:45 < T1w> funny thing is that once I went above 50000 bytes MTU my iperf tests dies out
07:17:52 < T1w> 49000 is fine, 50000 is not
07:19:15 < T1w> ping is fine, but iperf dies out after the first few MB
07:21:46 < T1w> hm.. seemlingly because the serverside dies
07:22:37 < T1w> ah
07:22:38 < T1w> Mon Jan 30 14:09:29 2017 us=427893 gauss/10.5.99.205:42307 Assertion failed at lzo.c:202 (buf_safe (&work, zlen))
07:22:38 < T1w> Mon Jan 30 14:09:29 2017 us=427963 gauss/10.5.99.205:42307 Exiting due to fatal error
07:22:44 < T1w> lets try without lzo
07:22:48 <@ecrist> T1w: what is your iperf syntax for your performance measurements?
07:23:11 < T1w> ecrist: server: -s -p 10000 -B 10.8.8.6
07:23:26 < T1w> ecrist: client: -p 10000 -c 10.8.8.6 -t 30
07:24:15 < T1w> I squzed a bit more out by renicing the client iperf to -19 (since the system is used for other stuff)
07:24:49 < T1w> only a few mbit, but it removed a few fluctuations where performance dropped down
07:28:05 <@ecrist> which version of openvpn?
07:28:29 < T1w> hah.. no without lzo it just gives me
07:28:30 < T1w> 49781 IP packet with unknown IP version=15 seen
07:28:50 < T1w> ecrist: 2.3.14
07:29:23 < T1w> the version avabilable in the EPEL 7 x86_64 repo
07:29:43 < T1w> OpenVPN 2.3.14 x86_64-redhat-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [MH] [IPv6] built on Dec 7 2016
07:29:45 < T1w> using
07:29:57 < T1w> library versions: OpenSSL 1.0.1e-fips 11 Feb 2013, LZO 2.06
07:30:15 < T1w> (RHEL version of OpenSSL with backports)
07:30:44 < T1w> afk - back in 10-15 mins
07:31:03 <@ecrist> I'm trying this out - so far, just two servers connected to the same switch each over 1g uplinks, I see 936Mbps
07:31:10 <@ecrist> now to set up OpenVPN
07:51:47 <@ecrist> huh, this is a new one
07:51:49 <@ecrist> Mon Jan 30 07:39:16 2017 us=113384 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
07:51:49 <@ecrist> Mon Jan 30 07:39:16 2017 us=113393 Exiting due to fatal error
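For reference, T1w's iperf invocations from the chat, written out as full commands (assuming the classic iperf 2 binary; 10.8.8.6 is the server's tunnel-side address):

```
# server side: listen on TCP port 10000, bound to the tunnel address
iperf -s -p 10000 -B 10.8.8.6

# client side: run a 30-second throughput test against the server
iperf -p 10000 -c 10.8.8.6 -t 30
```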
Results (read: problem)
The startup of the server side process is uneventful (logs attached). Startup of the client, however, results in a quick fatal error after full initialization. The same error is evident in the server logs, and that process also dies.
Server Log
Mon Jan 30 07:39:16 2017 us=32861 client/10.3.14.230 SENT CONTROL [client]: 'PUSH_REPLY,route 10.8.8.0 255.255.255.0,route 10.8.8.1,topology net30,ping 5,ping-restart 30,ifconfig 10.8.8.6 10.8.8.5,peer-id 0,cipher AES-256-GCM' (status=1)
Mon Jan 30 07:39:16 2017 us=32907 client/10.3.14.230 Data Channel MTU parms [ L:48046 D:48046 EF:46 EB:8156 ET:0 EL:3 ]
Mon Jan 30 07:39:16 2017 us=33059 client/10.3.14.230 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Jan 30 07:39:16 2017 us=33096 client/10.3.14.230 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Jan 30 07:39:21 2017 us=412962 client/10.3.14.230 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
Mon Jan 30 07:39:21 2017 us=413099 client/10.3.14.230 Exiting due to fatal error
Mon Jan 30 07:39:21 2017 us=413159 client/10.3.14.230 /sbin/route delete -net 10.8.8.0 10.8.8.2 255.255.255.0
delete net 10.8.8.0: gateway 10.8.8.2
Mon Jan 30 07:39:21 2017 us=415335 client/10.3.14.230 Closing TUN/TAP interface
Mon Jan 30 07:39:21 2017 us=415525 client/10.3.14.230 /sbin/ifconfig tun1 destroy
Client Log
Mon Jan 30 07:39:16 2017 us=36473 /sbin/route add -net 10.8.8.0 10.8.8.5 255.255.255.0
add net 10.8.8.0: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=37358 /sbin/route add -net 10.8.8.1 10.8.8.5 255.255.255.255
add net 10.8.8.1: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=38263 Initialization Sequence Completed
Mon Jan 30 07:39:16 2017 us=113384 Assertion failed at crypto.c:81 (packet_id_initialized(&opt->packet_id))
Mon Jan 30 07:39:16 2017 us=113393 Exiting due to fatal error
Mon Jan 30 07:39:16 2017 us=113422 /sbin/route delete -net 10.8.8.0 10.8.8.5 255.255.255.0
delete net 10.8.8.0: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=114344 /sbin/route delete -net 10.8.8.1 10.8.8.5 255.255.255.255
delete net 10.8.8.1: gateway 10.8.8.5
Mon Jan 30 07:39:16 2017 us=115199 Closing TUN/TAP interface
Mon Jan 30 07:39:16 2017 us=115268 /sbin/ifconfig tun0 destroy
I've attached two archives, one each for the server and client configurations, including DH parameters, certificates, keys, and the HMAC static key.
Attachments (4)
Change History (6)
Changed 7 years ago by
| Attachment: | client.tgz added |
|---|---|
comment:1 Changed 7 years ago by
| Owner: | set to Steffan Karger |
|---|---|
| Status: | new → accepted |
Looks like --no-replay (I really hoped nobody would be using this feature; it cripples the crypto) is interfering with AEAD cipher modes. That should still not cause openvpn to ASSERT() out though, so I'm accepting this ticket and will look into a fix - probably disabling NCP and printing big warnings when using --no-replay.
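For readers unfamiliar with what --no-replay turns off: OpenVPN's data channel normally tags each packet with an increasing packet ID and drops duplicates and stale IDs. The toy sketch below illustrates that idea only; OpenVPN's real implementation (packet_id.c) additionally keeps a sliding window to tolerate reordering:

```python
class ReplayGuard:
    """Toy replay-protection check: accept a packet ID only if it is
    strictly newer than anything seen so far. This is an illustration,
    not OpenVPN's actual sliding-window algorithm."""

    def __init__(self):
        self.highest_seen = 0

    def accept(self, packet_id: int) -> bool:
        if packet_id <= self.highest_seen:
            return False  # duplicate or replayed packet: drop it
        self.highest_seen = packet_id
        return True


guard = ReplayGuard()
assert guard.accept(1)       # first packet: accepted
assert guard.accept(2)       # newer packet: accepted
assert not guard.accept(2)   # exact replay: rejected
assert not guard.accept(1)   # stale packet replayed: rejected
```

With --no-replay this check is skipped entirely, and for AEAD ciphers such as the NCP-negotiated AES-256-GCM the packet ID also feeds into the nonce, which is why the combination trips the crypto.c assertion here.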
comment:2 Changed 7 years ago by
Oh, and until the fix is in, just add --ncp-disable as a workaround.
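Concretely, that means adding one line to the config (or passing --ncp-disable on the command line) so cipher negotiation is skipped and the configured --cipher is used as-is:

```
ncp-disable   # skip NCP; don't negotiate up to AES-GCM with --no-replay set
```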
client configuration bundle