Opened 3 years ago

Last modified 3 years ago

#1384 accepted Bug / Defect

Connections can fail if ping-restart < connect-retry (UDP, static key) — at Version 3

Reported by: nils.toedtmann Owned by:
Priority: major Milestone: release 2.5.3
Component: Configuration Version: OpenVPN 2.4.7 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords:
Cc: tct

Description (last modified by Selva Nair)

I have a case here where server and client both use keepalive 10 120 and the default connect-retry 5 300, and both sides fail to connect because they are caught in a vicious cycle of 7 min loops:

  • Reconnecting ("Re-using pre-shared static key")
  • Not receiving any pings for 120 sec
  • Declaring an inactivity timeout and pausing for 300 sec

and because one side's reconnect happens right when the other side pauses, they always miss each other. See the merged log snippet below.

What happened was that, because of DNS issues, neither side could connect for several days, during which the connect-retry pause grew from its initial 5 sec to 300 sec. By the time the DNS issue was resolved, both were in a 7 min cycle (2 min running, 5 min pause), and by chance their 2 min "running" phases didn't overlap.
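The cycle can be illustrated with a small timing sketch. It assumes a simplified model in which each peer alternates between a "running" phase of ping-restart seconds and a pause of the connect-retry maximum; the helper names are made up for illustration, and the 241 sec offset is taken from the log timestamps in this ticket (client restart at 22:12:34, server restart at 22:16:35).

```python
# Simplified model (assumption): RUN for ping-restart seconds, then PAUSE for
# the connect-retry maximum. Real OpenVPN timing has more detail than this.

def run_windows(start, run, pause, count):
    """Return `count` (begin, end) RUN windows for a peer whose first run begins at `start`."""
    period = run + pause
    return [(start + k * period, start + k * period + run) for k in range(count)]

def windows_overlap(a, b):
    """True if any RUN window in `a` intersects any RUN window in `b`."""
    return any(a0 < b1 and b0 < a1 for a0, a1 in a for b0, b1 in b)

PING_RESTART = 120  # from keepalive 10 120
RETRY_MAX = 300     # default connect-retry maximum

# Client restarts at t=0, server 241 sec later (offset read from the log).
client = run_windows(0, PING_RESTART, RETRY_MAX, 10)
server = run_windows(241, PING_RESTART, RETRY_MAX, 10)
print(windows_overlap(client, server))  # False: the peers never run at the same time

# With a pause shorter than the run phase, the windows cannot avoid each other.
client_short = run_windows(0, PING_RESTART, 60, 10)
server_short = run_windows(241, PING_RESTART, 60, 10)
print(windows_overlap(client_short, server_short))  # True
```

In this model, overlapping windows are necessary for a reconnect but may not be sufficient, since pings are only sent every 10 sec.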

This can happen at least in my case, with UDP and a shared secret (see configs below). There it can occur if one side's connect-retry maximum is larger than the other side's ping-restart, and vice versa.

I guess this problem is unique to the Static Key encryption mode, because in TLS mode the server side would not pause?

Note that the default maximum for connect-retry is 300, and that an often-recommended setting for ping-restart is 120 (e.g. via keepalive 10 120, or via keepalive 10 60 on the server side of a client-server setup, where the timeout is doubled).
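To show how the pause can grow from 5 sec to the 300 sec maximum, here is a rough sketch of the backoff as the man page describes it for connect-retry. It assumes the wait doubles after each failed attempt beyond the fifth; the function name is hypothetical, and the exact attempt at which doubling starts in the OpenVPN source may differ.

```python
# Sketch only (assumption): doubling starts after five failed attempts and is
# capped at the connect-retry maximum. Not taken from the OpenVPN source.

def restart_pause(failed_attempts, base=5, maximum=300):
    """Seconds to wait before the next attempt after `failed_attempts` failures."""
    doublings = max(0, failed_attempts - 5)
    # Cap the shift so the intermediate value stays small during long outages.
    return min(base << min(doublings, 15), maximum)

print([restart_pause(n) for n in range(1, 13)])
# [5, 5, 5, 5, 5, 10, 20, 40, 80, 160, 300, 300]
```

Under this assumption the maximum is reached after roughly a dozen failed attempts, so an outage of several days leaves both peers pausing for the full 300 sec.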

If my analysis is correct and this is still the case, I'd consider it a bug in the documentation and the default values. I'd recommend:

  • Document under which circumstances this can happen, e.g. in the man page
  • Reduce the default maximum for connect-retry from 300 to something smaller than the frequently found ping-restart 120 (at least in susceptible modes)
  • In susceptible modes throw a warning when ping-restart is set and not larger than the connect-retry maximum (not to be confused with connect-retry-max!)

(This is not entirely exact, as one would have to compare ping-restart on one side to the connect-retry maximum of the other side. But given that most users mirror those settings on both sides, this may be the best way forward.)
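The warning proposed in the list above could look roughly like the following. This is a hypothetical sketch, not OpenVPN code: the function name, the message wording, and the treatment of a non-positive ping-restart as "not set" are all assumptions made for illustration.

```python
# Hypothetical check (not OpenVPN code): compares the local ping-restart with
# the local connect-retry maximum, assuming both peers mirror their settings.

def ping_restart_warning(ping_restart, connect_retry_max=300):
    """Return a warning string if the restart pause can outlast the run phase, else None."""
    if ping_restart <= 0:
        return None  # assumption: treat a non-positive value as "not set"
    if ping_restart <= connect_retry_max:
        return (
            f"WARNING: ping-restart ({ping_restart}s) is not larger than the "
            f"connect-retry maximum ({connect_retry_max}s); peers may miss each other"
        )
    return None

print(ping_restart_warning(120))      # warning for the configs in this ticket
print(ping_restart_warning(120, 60))  # None
```

A possible workaround under the same reasoning would be to set the second argument of connect-retry below ping-restart on both sides.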

We are using OpenVPN 2.4.7-1ubuntu2 on Ubuntu 20.04.


Snippets from the server's and client's log:

# SERVER:
Feb 09 22:11:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:11:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:11:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec

# CLIENT:
Feb 09 22:12:34 client[614041]: Re-using pre-shared static key
Feb 09 22:12:34 client[614041]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:12:34 client[614041]: TCP/UDP: Preserving recently used remote address: [AF_INET]xx.yyy.xxx.yy:1194
Feb 09 22:12:34 client[614041]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:12:34 client[614041]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:12:34 client[614041]: UDPv4 link remote: [AF_INET]xx.yyy.xxx.yy:1194
# Server is dead, so no pings for 120sec
Feb 09 22:14:34 client[614041]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:14:34 client[614041]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:14:34 client[614041]: Restart pause, 300 second(s)
# CLIENT PAUSES FOR 300sec

# SERVER WAKES UP:
Feb 09 22:16:35 server[605]: Re-using pre-shared static key
Feb 09 22:16:35 server[605]: Preserving previous TUN/TAP instance: tun0
Feb 09 22:16:35 server[605]: Socket Buffers: R=[212992->212992] S=[212992->212992]
Feb 09 22:16:35 server[605]: UDPv4 link local (bound): [AF_INET][undef]:1194
Feb 09 22:16:35 server[605]: UDPv4 link remote: [AF_UNSPEC]
# Client is pausing, so no pings for 120sec
Feb 09 22:18:35 server[605]: Inactivity timeout (--ping-restart), restarting
Feb 09 22:18:35 server[605]: SIGUSR1[soft,ping-restart] received, process restarting
Feb 09 22:18:35 server[605]: Restart pause, 300 second(s)
# SERVER PAUSES FOR 300sec

... and so on and so forth ad infinitum


Server config:

user                    openvpn
group                   openvpn
chroot                  /var/lib/openvpn
cd                      /var/lib/openvpn
tmp-dir                 state

verb                    3
status                  state/status.log  60

port                    1194
proto                   udp4

secret                  /etc/openvpn/private/shared.key 0
persist-key

dev                     tun
persist-tun
ifconfig                172.29.0.1 172.29.0.2
keepalive               10 120
compress
ncp-disable
cipher                  AES-256-CBC
auth                    SHA256
replay-persist          state/rpstate

route                   x.x.x.x y.y.y.y

Client config:

user                    openvpn
group                   openvpn
chroot                  /var/lib/openvpn
tmp-dir                 state
verb                    3

remote                  server 1194 udp4

secret                  /etc/openvpn/private/shared.key 1
persist-key

dev                     tun
persist-tun
ifconfig                172.29.0.2 172.29.0.1
keepalive               10 120
compress
ncp-disable
cipher                  AES-256-CBC
auth                    SHA256

route                   x.x.x.x y.y.y.y

Change History (3)

comment:1 Changed 3 years ago by nils.toedtmann

Another observation: even though the documentation for keepalive states "The timeout argument will be twice as long on the server side", in my case it is clear from the server logs that it does not double the timeout in keepalive 10 120. Instead, it uses a straight ping-restart 120.

Again, I suspect that this is due to static key mode?

comment:2 Changed 3 years ago by nils.toedtmann

Aherm ... could someone with sufficient access please remove the IP address from the log? I stupidly left it in 2 places, and I can't seem to edit it myself. Apologies

comment:3 Changed 3 years ago by Selva Nair

Description: modified (diff)

Sounds somewhat similar to #1010, which we somehow failed to address. The fix proposed there (i.e., make the backoff conditional on having a remote) may take care of this one too?

As for --keepalive x y, it's shorthand for --ping x --ping-restart y, except in OpenVPN client-server setups, where it's supposed to be used on the server side. There the server pushes it to the client and doubles y for itself. That doesn't apply to your setup.
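The expansion described above can be sketched as follows. The function and its return shape are illustrative only; the values follow the man page's description of keepalive.

```python
# Illustrative expansion of `keepalive interval timeout` (names are made up).

def expand_keepalive(interval, timeout, server_mode=False):
    """Return (local directives, pushed directives) for a keepalive setting."""
    if server_mode:
        # Client-server setup: the server doubles the timeout for itself and
        # pushes the undoubled values to its clients.
        local = {"ping": interval, "ping-restart": 2 * timeout}
        pushed = {"ping": interval, "ping-restart": timeout}
    else:
        # Point-to-point (e.g. static key): plain shorthand, nothing is pushed.
        local = {"ping": interval, "ping-restart": timeout}
        pushed = {}
    return local, pushed

print(expand_keepalive(10, 120))                   # the setup in this ticket
print(expand_keepalive(10, 60, server_mode=True))  # server ends up with ping-restart 120
```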
