Opened 3 years ago

Last modified 3 years ago

#1384 accepted Bug / Defect

Connections can fail if ping-restart < connect-retry (UDP, static key)

Reported by: nils.toedtmann Owned by: Gert Döring
Priority: major Milestone: release 2.5.3
Component: Configuration Version: OpenVPN 2.4.7 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords:
Cc: tct

Description (last modified by Selva Nair)

I have a case here with server and client both using keepalive 10 120 and the default connect-retry 5 300, where both sides fail to connect because they are caught in a vicious circle of 7min loops:

  • Reconnecting Re-using pre-shared static key
  • Not receiving any pings for 120 sec
  • Declaring an Inactivity timeout and pausing for 300sec

and because one side's reconnect happens right when the other side pauses, they always miss each other. See the merged log snippet below.

What had happened was that, because of DNS issues, both sides could not connect for several days, until connect-retry had grown from its initial 5sec to 300sec. By the time the DNS issue got resolved, both were in a 7min cycle (2min running, 5min pause), and by chance their 2min "running" phases didn't overlap.
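The missed-window argument can be checked with a little arithmetic. The following is a minimal sketch, not OpenVPN code: it assumes both peers follow an identical fixed cycle of 120 s up and 300 s paused, and it takes the two wake-up times from the log snippet in this ticket. The names windows_overlap and seconds are made up for illustration.

```python
from datetime import datetime

RUN = 120             # ping-restart: seconds a side stays up waiting for pings
PAUSE = 300           # connect-retry maximum: seconds a side sleeps afterwards
PERIOD = RUN + PAUSE  # 420 s, the "7min cycle"


def windows_overlap(offset: int, run: int = RUN, pause: int = PAUSE) -> bool:
    """True if two peers on the same run/pause cycle are ever up together.

    offset is the delay in seconds between the two peers' wake-up times.
    """
    period = run + pause
    d = offset % period
    # Peer A is up during [0, run), peer B during [d, d + run), modulo period.
    return d < run or d > period - run


def seconds(ts: str) -> int:
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


# Wake-up times ("Re-using pre-shared static key") from the log in this ticket.
client_up = seconds("22:12:34")
server_up = seconds("22:16:35")
offset = server_up - client_up

never_meet = not windows_overlap(offset)
# Fraction of all possible offsets for which the windows never overlap.
stuck_fraction = sum(not windows_overlap(d) for d in range(PERIOD)) / PERIOD
print(offset, never_meet, round(stuck_fraction, 2))
```

Under these assumptions the two sides in the log are 241 s apart, which falls into the range of offsets where the "running" phases never coincide.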

This can happen at least in my case with UDP and a shared secret, see configs below. There it can occur if one side's connect-retry is larger than the other side's ping-restart, and vice versa.

I guess this problem is unique to the Static Key encryption mode, because in TLS mode the server side would not pause?

Note that the default maximum for connect-retry is 300, and that an often recommended setting for ping-restart is 120 (e.g. via keepalive 10 60).
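To illustrate how the pause could grow from 5 to 300 seconds, here is a rough sketch of a capped, doubling backoff. It is an assumption based on my reading of the connect-retry description in the man page (a few retries at the initial wait, then doubling up to the maximum); the exact retry count before doubling starts and the function name backoff_schedule are illustrative, not taken from the OpenVPN source.

```python
def backoff_schedule(initial: int = 5, maximum: int = 300,
                     free_retries: int = 5, attempts: int = 15) -> list[int]:
    """Return the pause (seconds) before each of `attempts` reconnect tries.

    Assumed model: the first `free_retries` attempts wait `initial` seconds,
    after which the wait doubles per attempt and is capped at `maximum`.
    """
    waits = []
    wait = initial
    for attempt in range(1, attempts + 1):
        waits.append(min(wait, maximum))
        if attempt >= free_retries:
            wait *= 2  # doubling starts once the free retries are used up
    return waits


schedule = backoff_schedule()
print(schedule)
```

With the defaults 5 and 300, this model reaches the 300 s cap after roughly a dozen failed attempts, so an outage of several days is far more than enough to get there.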

If my analysis is correct and still the case, I'd consider it a bug in the documentation and the default values. I'd recommend to:

  • Document under which circumstances this can happen, e.g. in the man page
  • Reduce the default maximum for connect-retry from 300 to something smaller than the frequently found ping-restart 120 (at least in susceptible modes)
  • In susceptible modes throw a warning when ping-restart is set and not larger than the connect-retry maximum (not to be confused with connect-retry-max!)

(This is not entirely exact, as one would have to compare ping-restart on one side to the connect-retry maximum of the other side. But given that most users mirror those settings on both sides, a local check is probably the most practical way forward.)

We are using OpenVPN 2.4.7-1ubuntu2 on Ubuntu 20.04.


Snippets from the server's and client's log:

# SERVER:
Feb 09 22:11:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:11:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:11:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec

# CLIENT:
Feb 09 22:12:34 client[614041]: Re-using pre-shared static key
Feb 09 22:12:34 client[614041]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:12:34 client[614041]: TCP/UDP: Preserving recently used remote address: [AF_INET]xx.yyy.xxx.yy:1194
Feb 09 22:12:34 client[614041]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:12:34 client[614041]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:12:34 client[614041]: UDPv4 link remote: [AF_INET]xx.yyy.xxx.yy:1194
# Server is dead, so no pings for 120sec
Feb 09 22:14:34 client[614041]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:14:34 client[614041]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:14:34 client[614041]: Restart pause, 300 second(s)
# CLIENT PAUSES FOR 300sec

# SERVER WAKES UP:
Feb 09 22:16:35 server[605]: Re-using pre-shared static key
Feb 09 22:16:35 server[605]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:16:35 server[605]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:16:35 server[605]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:16:35 server[605]: UDPv4 link remote: [AF_UNSPEC]
# Client is pausing, so no pings for 120sec
Feb 09 22:18:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:18:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:18:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec

... and so on and so forth ad infinitum


Server config:

user                    openvpn
group                   openvpn
chroot                  /var/lib/openvpn
cd                      /var/lib/openvpn
tmp-dir                 state

verb                    3
status                  state/status.log  60

port                    1194
proto                   udp4

secret                  /etc/openvpn/private/shared.key 0
persist-key

dev                     tun
persist-tun
ifconfig                172.29.0.1 172.29.0.2
keepalive               10 120
compress
ncp-disable
cipher                  AES-256-CBC
auth                    SHA256
replay-persist          state/rpstate

route                   x.x.x.x y.y.y.y

Client config:

user                    openvpn
group                   openvpn
chroot                  /var/lib/openvpn
tmp-dir                 state
verb                    3

remote                  server 1194 udp4

secret                  /etc/openvpn/private/shared.key 1
persist-key

dev                     tun
persist-tun
ifconfig                172.29.0.2 172.29.0.1
keepalive               10 120
compress
ncp-disable
cipher                  AES-256-CBC
auth                    SHA256

route                   x.x.x.x y.y.y.y

Change History (9)

comment:1 Changed 3 years ago by nils.toedtmann

Another observation: even though the documentation for keepalive states "The timeout argument will be twice as long on the server side", it is clear from my server logs that the timeout in keepalive 10 120 is not doubled. Instead, the server uses a plain ping-restart 120.

Again, I suspect that this is due to static key mode?

comment:2 Changed 3 years ago by nils.toedtmann

Aherm ... could someone with sufficient access please remove the IP address from the log? I stupidly left it in 2 places, and I can't seem to edit it myself. Apologies

comment:3 Changed 3 years ago by Selva Nair

Description: modified (diff)

Sounds somewhat similar to #1010, which we somehow failed to address. The fix proposed there (i.e., make the backoff conditional on having a remote) may take care of this one too?

As for --keepalive x y, it's a shorthand for --ping x --ping-restart y, except in OpenVPN client-server setups, where it's supposed to be used on the server side. There the server pushes it to the client and doubles y for itself. That doesn't apply to your setup.
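The two expansions of keepalive described here can be written down as a small function. This is a sketch of the rule as stated in this comment and in the man page, not of OpenVPN's option parser; expand_keepalive and the dictionary layout are invented for illustration.

```python
def expand_keepalive(interval: int, timeout: int, server_mode: bool) -> dict:
    """Expand `keepalive interval timeout` into the directives it stands for.

    server_mode=True models an instance running with --mode server, which
    doubles its own timeout and pushes the plain values to its clients.
    Otherwise (p2p, as in this ticket) the values are used as given.
    """
    if server_mode:
        return {
            "local": {"ping": interval, "ping-restart": 2 * timeout},
            "pushed": {"ping": interval, "ping-restart": timeout},
        }
    return {
        "local": {"ping": interval, "ping-restart": timeout},
        "pushed": {},
    }


p2p = expand_keepalive(10, 120, server_mode=False)
multi_client = expand_keepalive(10, 60, server_mode=True)
print(p2p["local"], multi_client["local"], multi_client["pushed"])
```

For the p2p configs in this ticket the result is a plain ping-restart 120 on both sides, which matches the 120 s timeouts seen in the logs.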

comment:4 Changed 3 years ago by nils.toedtmann

It seems this is indeed a regression or recurrence of #1010, except this time in static key mode.

And yes, disabling the backoff when there is no remote should do the trick.

BTW May I ask for clarifications? Apologies for being somewhat unfamiliar with the lingo

  • Is "No remote statement in config" and "server-side" the same thing? If not, can you provide an example?
  • You write [ping-restart doubling & pushing] "Doesn't apply to your setup" - why? Is it because it's in static key mode?

PS: Thanks for cleansing the log

comment:5 in reply to:  4 Changed 3 years ago by Selva Nair

Replying to nils.toedtmann:

BTW May I ask for clarifications? Apologies for being somewhat unfamiliar with the lingo

  • Is "No remote statement in config" and "server-side" the same thing? If not, can you provide an example?
  • You write [ping-restart doubling & pushing] "Doesn't apply to your setup" - why? Is it because it's in static key mode?

Yours is a p2p setup; there is no side running with --mode server. The listening side of the connection (with or without using TLS) may have no --remote but is not an "OpenVPN server".

Here the word "server" is used in a very specific sense relevant to OpenVPN's mode of operation where multiple instances connect to one: it requires TLS, one instance runs with --mode server, and the others run with --client. Only in this case are directives like ping-restart pushed from "server" to "client".

Not to be confused with the TLS server (server side of a TLS handshake) or the listening side of a peer-to-peer connection.

comment:6 Changed 3 years ago by tct

Cc: tct added

comment:7 Changed 3 years ago by Gert Döring

Indeed, this sounds like a variant of the bug in #1010. Time to get it fixed.

comment:8 Changed 3 years ago by Gert Döring

Milestone: release 2.4.11

comment:9 Changed 3 years ago by Gert Döring

Milestone: changed from release 2.4.11 to release 2.5.3
Owner: set to Gert Döring
Status: changed from new to accepted

commit 063d55afeea723fc6df0af29a19df257a8ab6920 (master)
commit d8dee82f1129ac6d3e4bcdc867726f5d64798dc7 (release/2.5)
commit 7029cece844d9324aff687981b8b6c33b099db2d (release/2.4)
Author: Selva Nair
Date: Wed Jun 2 15:47:39 2021 -0400

Apply the connect-retry backoff to only one side of a connection

We might not do another 2.4.x release, but if we do, the fix will be in 2.4.11.

Setting the milestone to "release 2.5.3" to document "it is in 2.5.3, to be released today" :-)

It would be good to have feedback on whether it indeed fixes the problems for you (in my tests, I could reproduce the problematic behaviour on a "p2p server" instance, and the patch removed backoff in that case).
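Why removing the backoff on one side breaks the cycle can be seen in a toy simulation. This is a sketch under simplifying assumptions, not a model of the actual patch: each peer is reduced to a fixed up/pause cycle, the listening side is assumed to pause a constant 5 s once it no longer backs off, and the 241 s offset is the one seen in the log of this ticket. The helper names is_up and first_common_uptime are hypothetical.

```python
def is_up(t: int, start: int, run: int, pause: int) -> bool:
    """True if a peer with a periodic run/pause cycle is up at second t.

    The cycle is treated as periodic in both directions around `start`.
    """
    return (t - start) % (run + pause) < run


def first_common_uptime(a: tuple, b: tuple, horizon: int):
    """Return the first second in [0, horizon) at which both peers are up.

    Each peer is a (start, run, pause) tuple; None means they never meet.
    """
    for t in range(horizon):
        if is_up(t, *a) and is_up(t, *b):
            return t
    return None


HORIZON = 10 * 420  # ten of the 7min cycles

# Before the fix: both sides pause 300 s, wake-ups 241 s apart.
before = first_common_uptime((0, 120, 300), (241, 120, 300), HORIZON)

# After the fix (assumed): the listening side only pauses 5 s.
after = first_common_uptime((0, 120, 5), (241, 120, 300), HORIZON)

print(before, after)
```

In this model the peers never meet before the change, while afterwards the listening side is up almost all the time and the connecting side finds it on its next wake-up.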
