Opened 4 years ago
Last modified 3 years ago
#1384 accepted Bug / Defect
Connections can fail if ping-restart < connect-retry (UDP, static key)
Reported by: | nils.toedtmann | Owned by: | Gert Döring |
---|---|---|---|
Priority: | major | Milestone: | release 2.5.3 |
Component: | Configuration | Version: | OpenVPN 2.4.7 (Community Ed) |
Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | |
Cc: | tct |
Description
I have a case here with server and client both using keepalive 10 120 and the default connect-retry 5 300, where both sides fail to connect because they are caught in a vicious circle of 7min loops:
- Reconnecting ("Re-using pre-shared static key")
- Not receiving any pings for 120 sec
- Declaring an "Inactivity timeout" and pausing for 300sec
and because one side's reconnect happens right when the other side pauses, they always miss each other. See the merged log snippet below.
What had happened was that because of DNS issues, both sides could not connect for several days, until the connect-retry pause had grown from initially 5sec to 300sec. By the time the DNS issue got resolved, both were in a 7min cycle (2min running, 5min pause), and by chance their 2min "running" phases didn't overlap.
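The growth of the pause from 5sec to 300sec can be sketched roughly as follows. This is a simplified model, not OpenVPN's code: it assumes the man page's description that the wait time is doubled after each unsuccessful attempt once 5 retries have passed, and that it is capped at the connect-retry maximum. The function name and the parameter `free_retries` are hypothetical.

```python
def restart_pause(failed_attempts, base=5, cap=300, free_retries=5):
    """Seconds to wait before the next attempt in a simplified backoff model.

    Assumption: the first `free_retries` failures wait `base` seconds,
    every further failure doubles the wait, capped at `cap`.
    """
    if failed_attempts <= free_retries:
        return base
    doublings = failed_attempts - free_retries
    return min(base * 2 ** doublings, cap)


# Pause after each of the first 12 failed attempts.
for attempt in range(1, 13):
    print(attempt, restart_pause(attempt))
```

In this model the cap of 300sec is reached after about a dozen failures, so an outage lasting days leaves both sides at the maximum pause.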
This can happen at least in my case with UDP and a shared secret, see configs below. There it can occur if one side's connect-retry is larger than the other side's ping-restart, and vice versa.
I guess this problem is unique to the Static Key encryption mode, because in TLS mode the server side would not pause?
Note that the default maximum for connect-retry is 300, and that an often recommended setting for ping-restart is 120 (e.g. via keepalive 10 60).
If my analysis is correct and still the case, I'd consider it a bug in the documentation and the default values. I'd recommend to:
- Document under which circumstances this can happen, e.g. in the man page
- Reduce the default maximum for connect-retry from 300 to something smaller than the frequently found ping-restart 120 (at least in susceptible modes)
- In susceptible modes throw a warning when ping-restart is set and not larger than the connect-retry maximum (not to be confused with connect-retry-max!)
(This is not entirely exact, as one would have to compare ping-restart on one side to the connect-retry maximum of the other side. But given that most users mirror those settings on both sides, this may be the best way forward.)
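The proposed warning could look roughly like the check below. This is only an illustration of the comparison, not OpenVPN code; the function name and message text are made up, and it compares the two values of a single config, with the limitation just mentioned.

```python
def check_timers(ping_restart, connect_retry_max=300):
    """Return a warning string if the timers look risky, else None.

    Hypothetical helper: `ping_restart` of 0 is treated as "not set".
    """
    if ping_restart and ping_restart <= connect_retry_max:
        return (
            f"ping-restart {ping_restart} is not larger than the "
            f"connect-retry maximum {connect_retry_max}; "
            "peers may keep missing each other after a long outage"
        )
    return None


print(check_timers(120))  # the settings from this ticket: a warning
print(check_timers(600))  # None
```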
We are using OpenVPN 2.4.7-1ubuntu2 on Ubuntu 20.04.
Snippets from the server's and client's log:
# SERVER:
Feb 09 22:11:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:11:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:11:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec
# CLIENT:
Feb 09 22:12:34 client[614041]: Re-using pre-shared static key
Feb 09 22:12:34 client[614041]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:12:34 client[614041]: TCP/UDP: Preserving recently used remote address: [AF_INET]xx.yyy.xxx.yy:1194
Feb 09 22:12:34 client[614041]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:12:34 client[614041]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:12:34 client[614041]: UDPv4 link remote: [AF_INET]xx.yyy.xxx.yy:1194
# Server is dead, so no pings for 120sec
Feb 09 22:14:34 client[614041]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:14:34 client[614041]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:14:34 client[614041]: Restart pause, 300 second(s)
# CLIENT PAUSES FOR 300sec
# SERVER WAKES UP:
Feb 09 22:16:35 server[605]: Re-using pre-shared static key
Feb 09 22:16:35 server[605]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:16:35 server[605]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:16:35 server[605]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:16:35 server[605]: UDPv4 link remote: [AF_UNSPEC]
# Client is pausing, so no pings for 120sec
Feb 09 22:18:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:18:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:18:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec
... and so on and so forth ad infinitum
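To see why the two sides never meet, here is a small back-of-the-envelope model using the wake-up times from the log (client at 22:12:34, server at 22:16:35). It assumes both sides repeat a fixed cycle of 120sec running and 300sec pause, and that they can only connect while their running windows overlap; the helper names are made up for this sketch.

```python
from datetime import datetime


def seconds_between(earlier, later):
    """Difference between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())


def windows_overlap(offset, run=120, pause=300):
    """True if two sides with the same run/pause cycle are ever up together.

    `offset` is how many seconds one side wakes up after the other.
    """
    period = run + pause
    d = offset % period
    # The later side either wakes inside the earlier side's window,
    # or its own window wraps around into the earlier side's next one.
    return d < run or d > period - run


offset = seconds_between("22:12:34", "22:16:35")
print(offset, windows_overlap(offset))  # 241 False
```

With an offset of 241sec the windows stay disjoint in every cycle. In the same model, any pause shorter than the 120sec running phase makes an overlap unavoidable, whatever the offset.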
Server config:
user openvpn
group openvpn
chroot /var/lib/openvpn
cd /var/lib/openvpn
tmp-dir state
verb 3
status state/status.log 60
port 1194
proto udp4
secret /etc/openvpn/private/shared.key 0
persist-key
dev tun
persist-tun
ifconfig 172.29.0.1 172.29.0.2
keepalive 10 120
compress
ncp-disable
cipher AES-256-CBC
auth SHA256
replay-persist state/rpstate
route x.x.x.x y.y.y.y
Client config:
user openvpn
group openvpn
chroot /var/lib/openvpn
tmp-dir state
verb 3
remote server 1194 udp4
secret /etc/openvpn/private/shared.key 1
persist-key
dev tun
persist-tun
ifconfig 172.29.0.2 172.29.0.1
keepalive 10 120
compress
ncp-disable
cipher AES-256-CBC
auth SHA256
route x.x.x.x y.y.y.y
Change History (9)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
Aherm ... could someone with sufficient access please remove the IP address from the log? I stupidly left it in 2 places, and I can't seem to edit it myself. Apologies
comment:3 Changed 4 years ago by
Description: | modified (diff) |
---|
Sounds somewhat similar to #1010, which we somehow failed to address. The fix proposed there (i.e., make the backoff conditional on having a remote) may take care of this one too?
As for --keepalive x y, it's a shorthand for --ping x --ping-restart y, except for OpenVPN client-server setups where it's supposed to be used on the server side. The server pushes it to the client and doubles y on itself. Doesn't apply to your setup.
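The expansion described here can be written down as a small table-like function. This is only an illustration of the rule as stated in this comment and in the man page, not OpenVPN's implementation; the function name and the returned dictionary layout are made up.

```python
def expand_keepalive(interval, timeout, mode_server=False):
    """Expand `keepalive interval timeout` into ping / ping-restart values.

    Returns the values used locally and the values pushed to clients
    (empty unless running with --mode server).
    """
    if mode_server:
        return {
            "local": {"ping": interval, "ping-restart": 2 * timeout},
            "pushed": {"ping": interval, "ping-restart": timeout},
        }
    return {
        "local": {"ping": interval, "ping-restart": timeout},
        "pushed": {},
    }


# p2p setup as in this ticket: plain ping-restart 120, nothing pushed.
print(expand_keepalive(10, 120))
# --mode server with keepalive 10 60: the server itself uses 120.
print(expand_keepalive(10, 60, mode_server=True))
```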
comment:4 follow-up: 5 Changed 4 years ago by
It seems this is indeed a regression or recurrence of #1010, except this time in static key mode.
And yes, disabling the backoff when there is no remote should do the trick.
BTW May I ask for clarifications? Apologies for being somewhat unfamiliar with the lingo
- Is "No remote statement in config" and "server-side" the same thing? If not, can you provide an example?
- You write [ping-restart doubling & pushing] "Doesn't apply to your setup" - why? Is it because it's in static key mode?
PS: Thanks for cleansing the log
comment:5 Changed 4 years ago by
Replying to nils.toedtmann:
BTW May I ask for clarifications? Apologies for being somewhat unfamiliar with the lingo
- Is "No remote statement in config" and "server-side" the same thing? If not, can you provide an example?
- You write [ping-restart doubling & pushing] "Doesn't apply to your setup" - why? Is it because it's in static key mode?
Yours is a p2p setup; there is no side running with --mode server. The listening side of the connection (with or without using TLS) may have no --remote but is not an "OpenVPN server".
Here the word "server" is used in a very specific sense relevant to OpenVPN's mode of operation where multiple instances connect to one: it requires TLS, one instance running with --mode server, and the others with --client. Only in this case are directives like ping-restart pushed from "server" to "client".
Not to be confused with the TLS server (server side of a TLS handshake) or the listening side of a peer-to-peer connection.
comment:6 Changed 4 years ago by
Cc: | tct added |
---|
comment:7 Changed 4 years ago by
Indeed, this sounds like a variant of the bug in #1010. Time to get it fixed.
comment:8 Changed 4 years ago by
Milestone: | → release 2.4.11 |
---|
comment:9 Changed 3 years ago by
Milestone: | release 2.4.11 → release 2.5.3 |
---|---|
Owner: | set to Gert Döring |
Status: | new → accepted |
commit 063d55afeea723fc6df0af29a19df257a8ab6920 (master)
commit d8dee82f1129ac6d3e4bcdc867726f5d64798dc7 (release/2.5)
commit 7029cece844d9324aff687981b8b6c33b099db2d (release/2.4)
Author: Selva Nair
Date: Wed Jun 2 15:47:39 2021 -0400
Apply the connect-retry backoff to only one side of a connection
we might not release another 2.4.x release - but if we do, it will be in 2.4.12.
Setting the milestone to "release 2.5.3" to document "it is in 2.5.3, to be released today" :-)
It would be good to have feedback on whether it indeed fixes the problems for you (in my tests, I could reproduce the problematic behaviour on a "p2p server" instance, and the patch removed backoff in that case).
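For anyone who wants to reason about the effect before testing, the following toy model compares the situation from the ticket with one where only the client backs off. It is not the patch itself. It assumes a 120sec running phase on both sides, the wake-up offset of 241sec from the reporter's log, and that the side without a remote falls back to a 5sec pause; all names are made up.

```python
def running(t, start, run, pause):
    """True if a side that first woke at `start` is running at second t."""
    if t < start:
        return False
    return (t - start) % (run + pause) < run


def first_meeting(client_pause, server_pause, server_start=241,
                  run=120, horizon=3600):
    """First second at which both sides are running, or None within horizon."""
    for t in range(horizon):
        if running(t, 0, run, client_pause) and \
                running(t, server_start, run, server_pause):
            return t
    return None


print(first_meeting(client_pause=300, server_pause=300))  # None
print(first_meeting(client_pause=300, server_pause=5))    # 420
```

In this model the sides meet as soon as the client's next running phase begins, because the listening side is never down for more than a few seconds.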
Another observation is that even though the documentation for keepalive states "The timeout argument will be twice as long on the server side", in my case it is clear from the server logs that it does not double the timeout in keepalive 10 120. Instead, it uses straight ping-restart 120. Again, I suspect that this is due to static key mode?