Opened 2 years ago

Closed 2 years ago

#1021 closed Bug / Defect (notabug)

Collision of parallel Key Renegotiations (based on reneg-sec)

Reported by: sumpfralle Owned by: Steffan Karger
Priority: major Milestone: release 2.3.4
Component: Crypto Version: OpenVPN 2.3.4 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords: reneg-sec
Cc:

Description

We (a local wireless community) are running multiple OpenVPN servers with approximately 50 clients each.

We noticed in our logs that occasionally many of the clients repeatedly triggered a reconnection after a period of exactly one hour.

Specifically, the following situation would occur (with times in %H:%M):

  • 08:22 - the openvpn process on the server is restarted
  • 09:22 - 45 clients reconnected (for no obvious reason)
  • 10:22 - 42 clients reconnected
  • 11:22 - 40 clients reconnected
  • ...

I suspected that this period could be based on the Key Renegotiation period (reneg-sec is 3600 by default). Thus I reduced this parameter to 180s on the server.
After the openvpn server restart I found the above reconnect events of most clients exactly three minutes after the process restart (and another three minutes later and so on).

Thus the reconnect events are most likely tied to the Key Renegotiation period.
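
The hourly pattern above is what a fixed interval would predict. A minimal sketch in Python, assuming every client reconnects at the moment of the server restart and then renegotiates at a fixed reneg-sec interval (the function name is hypothetical and not part of OpenVPN):

```python
from datetime import datetime, timedelta

def expected_reneg_times(restart, reneg_sec, count):
    """Return the first `count` renegotiation times after `restart`,
    assuming a fixed interval of `reneg_sec` seconds."""
    return [restart + timedelta(seconds=reneg_sec * i) for i in range(1, count + 1)]

# Arbitrary date; only the time of day (08:22) is taken from the report.
restart = datetime(2018, 1, 1, 8, 22)
for t in expected_reneg_times(restart, 3600, 3):
    print(t.strftime("%H:%M"))
```

With reneg-sec set to 180 the same function yields events three minutes apart, which matches the second observation.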

Our OpenVPN clients use a slightly specific configuration (reduced to the TLS and timing-related options):

tls-exit
single-session
hand-window 30

I could imagine that the reduced hand-window timeout and the limitation to a single TLS connect attempt before exiting (tls-exit) could make the situation worse for us than for other setups.

For now we will try to break up the periodic parallel Key Renegotiation of all clients at the same time (e.g. after each service restart) by specifying a reneg-pkts limit. This could help to spread the timing of the Key Renegotiation for all clients evenly within a shorter amount of time.
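
To illustrate why a packet limit could spread the timing, here is a rough Python sketch. It assumes that a renegotiation fires as soon as either the reneg-sec or the reneg-pkts limit is reached and that each client has a constant packet rate. The rates and the packet limit are hypothetical values chosen for illustration, not measurements from our network:

```python
def seconds_until_reneg(reneg_sec, reneg_pkts, pkts_per_sec):
    """Time until the first limit is hit, assuming a constant packet rate."""
    return min(reneg_sec, reneg_pkts / pkts_per_sec)

# Hypothetical per-client packet rates (packets per second).
for rate in (50, 200, 1000):
    print(rate, seconds_until_reneg(3600, 500_000, rate))
```

Under this simplified model only clients with enough traffic renegotiate early; low-traffic clients would still hit the reneg-sec limit together.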

Additionally, it would probably be helpful to increase the hand-window limit on our clients (even though these updates will take some months/years for us).

I am a bit confused that the clients give up so quickly instead of delaying the Key Renegotiation a bit (but this thought is probably too naive).

I could imagine that the unrelatedly proposed patch from ticket #865 (adding a random delay to the renegotiation) would be very helpful for our case.

The issue is visible with the following versions (all on Debian amd64):

  • v2.2.1
  • v2.3.4

It is not clear whether the issue also exists in 2.4.x, since that version dynamically reduces "reneg-bytes" because older clients are susceptible to SWEET32. Thus I am not certain whether the problem persists up to now.

Change History (6)

comment:1 Changed 2 years ago by selvanair

The patch proposed in #865 has been merged to 2.5 git master. Used as --reneg-sec max [min] to get a random reneg interval between min and max (min defaults to 90% of max). IIRC, only the server needs to use 2.5 for this to be effective.
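
As a rough illustration of that option's semantics (not OpenVPN's actual implementation), the following Python sketch picks an interval between min and max, with min defaulting to 90% of max. The function name and the uniform distribution are assumptions made for this example:

```python
import random

def pick_reneg_interval(max_sec, min_sec=None, rng=random):
    """Pick a renegotiation interval in [min_sec, max_sec] seconds.

    min_sec defaults to 90% of max_sec. The uniform draw is an
    assumption of this sketch, not a statement about OpenVPN.
    """
    if min_sec is None:
        min_sec = max_sec * 9 // 10
    if min_sec > max_sec:
        raise ValueError("min_sec must not exceed max_sec")
    return rng.randint(min_sec, max_sec)

print(pick_reneg_interval(3600))  # some value between 3240 and 3600
```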

Although upgrading to 2.4 will not help here[*], 2.2.1 and even 2.3.4 are very old and moving to 2.4.x is highly recommended.

[*] Except when reneg-bytes gets automatically reduced when a deprecated low block size cipher is in use (BF-CBC) as you noted -- but that would be a wrong "fix".

comment:2 Changed 2 years ago by sumpfralle

Thank you for your quick response!

> The patch proposed in #865 has been merged to 2.5 git master.

This is great!

> Although upgrading to 2.4 will not help here[*], 2.2.1 and even 2.3.4 are very old and moving to 2.4.x is highly recommended.

Yes - this is not a problem. These are servers with Debian Wheezy (oldoldstable, close to EOL) and Debian Jessie (oldstable) - they will be upgraded in time.

Regarding the problematic collisions of parallel key renegotiations that we experienced: is this the expected behavior (including periodic re-connections of clients)?

Maybe the random renegotiation delay of #865 should be the default behavior in order to deal with waves of clients connecting at the same time (i.e. infrastructure clients - not roadwarriors)?

Or is our setup just way too exotic?

comment:3 Changed 2 years ago by selvanair

Owner: changed from Steffan Karger to selvanair
Status: changed from new to assigned

> Regarding the problematic collisions of parallel key renegotiations that we experienced: is this the expected behavior (including periodic re-connections of clients)?

Not sure what you mean by collisions. Periodic reneg is a security feature, and it's better to keep a reasonable reneg interval like an hour or two instead of switching it off. But that should not cause "reconnections", only TLS renegotiation and re-authentication -- the tunnel is not torn down and the data flow through the tunnel is preserved during this process. You will see "TLS soft-reset" in the logs. This makes it possible to have a long-running tunnel with the security of short-lived keys.

If clients are failing the reneg and then ping timeout and reconnect, that would be a more serious concern and should be resolved.

The interval for each client starts from the connection time. So, in general, not all clients would reneg at the same time. But in setups with a large number of long-running clients, if/when the server has to be restarted, all clients will reconnect at the same time, which causes all future renegs to happen at almost the same time. I think that's what you observed.

Anyway, the feature to add jitter (#865) is on by default in 2.5. So this should be less of a concern going forward.
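
A small Python simulation can show the effect. It assumes 50 clients that all reconnect at t=0 after a server restart, and compares a fixed 3600 s interval with a jittered interval drawn uniformly from 3240 to 3600 s (90% to 100% of max). The helper name, the 10-second bucket size and the uniform draw are assumptions of this sketch:

```python
import random
from collections import Counter

def peak_renegs(intervals, bucket_sec=10):
    """Largest number of renegotiations falling into one time bucket."""
    buckets = Counter(interval // bucket_sec for interval in intervals)
    return max(buckets.values())

rng = random.Random(0)  # fixed seed so the run is repeatable
clients = 50            # roughly the per-server client count from the report

# All clients reconnect at t=0, so the interval equals the reneg time.
fixed = [3600] * clients
jittered = [rng.randint(3240, 3600) for _ in range(clients)]

print("fixed peak:", peak_renegs(fixed))
print("jittered peak:", peak_renegs(jittered))
```

With the fixed interval all 50 renegs land in the same bucket; with jitter they are spread over a window of about six minutes.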

comment:4 Changed 2 years ago by selvanair

Owner: changed from selvanair to Steffan Karger

Not sure how the owner changed -- maybe I pressed some button by mistake. Trying to revert.

comment:5 Changed 2 years ago by sumpfralle

Thank you for taking the time to respond!

Indeed, I mixed up the harmless warning messages related to the key renegotiation with a problematic reconnection event.

The current behavior (periodic key renegotiations happening at the same time for all/many clients) thus causes just a load peak and not a real connection problem.

My bad - I am very sorry for the noise.

Nevertheless, I am glad that the introduction of default jitter (#865) will spread the renegotiations more evenly.

Thank you for your time!

Please close this ticket - probably as "invalid" or "not a bug".
(it looks like I cannot do this on my own)

comment:6 Changed 2 years ago by Gert Döring

Resolution: notabug
Status: changed from assigned to closed