Opened 3 years ago
Closed 3 years ago
#1434 closed Bug / Defect (fixed)
Client-disconnect script env does not have common_name
Reported by: | tct | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | release 2.5.5 |
Component: | Generic / unclassified | Version: | OpenVPN git master branch (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | |
Cc: |
Description (last modified by )
Under certain circumstances, --client-disconnect
has the wrong value for common_name
in the env
.
In this case --username-as-common-name
is in use and username
is not the same as common_name
.
Possibly Related to: #167
What actually takes place is as follows:
- Client connects normally - UDP -
username
is assigned tocommon_name
- Client renegotiates normally.
- Client disconnects normally.
This leads the server supplying common_name
not username
to --client-disconnect
, every time.
I ran a single standard server and a single standard client.
I used a simple script to monotonically dis/enable routing to the server, over a period of 10mins.
The client uses --reneg-sec 270
, so the client has always done a renegotiate when the routing becomes disabled. Then the server does --ping-timeout
for the client.
Then the server runs --client-disconnect
with the wrong value for common_name
, every time.
The server ran for ~20 hours and generated this error 130 times.
Change History (16)
comment:3 Changed 3 years ago by
This is a shorter log from client-connect to disconnect.
In this case only a float is seen not a renegotiation. Followed by a timeout.
I have removed some excess noise:
2021-10-18 20:33:55 us=744310 core_01_b/10.20.1.2:36433 MULTI_sva: pool returned IPv4=10.171.22.201, IPv6=(Not enabled) 2021-10-18 20:33:55 us=744480 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/v31a/ccd/core_01_b <EXOK> * EasyTLS-client-connect => vars loaded => dyn opts loaded => CN: core_01_b => conntrac: tallied => conntrac: registered => tls key serial: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5 => fixed_md_file loaded => metadata -> x509 serial match => Key is not locked to hwaddr => connection allowed => temp-files deleted 2021-10-18 20:33:55 us=766992 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /tmp/openvpn_cc_785fbac066fdc39090b1703aedf641e.tmp 2021-10-18 20:33:55 us=767238 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: timers and/or timeouts modified 2021-10-18 20:33:55 us=767272 core_01_b/10.20.1.2:36433 MULTI: Learn: 10.171.22.201 -> core_01_b/10.20.1.2:36433 2021-10-18 20:33:55 us=767295 core_01_b/10.20.1.2:36433 MULTI: primary virtual IP for core_01_b/10.20.1.2:36433: 10.171.22.201 2021-10-18 20:33:55 us=767322 core_01_b/10.20.1.2:36433 Data Channel: using negotiated cipher 'CHACHA20-POLY1305' 2021-10-18 20:33:55 us=767356 core_01_b/10.20.1.2:36433 Data Channel MTU parms [ L:1535 D:1450 EF:35 EB:406 ET:0 EL:3 AF:14/122 ] 2021-10-18 20:33:55 us=767436 core_01_b/10.20.1.2:36433 Outgoing Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key 2021-10-18 20:33:55 us=767462 core_01_b/10.20.1.2:36433 Incoming Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key 2021-10-18 20:33:55 us=767535 core_01_b/10.20.1.2:36433 SENT CONTROL [core_01_b]: 'PUSH_REPLY,topology subnet,route-gateway 10.171.22.1,route 10.171.22.1,route 10.66.67.101,comp-lzo no,explicit-exit-notify 1,route 10.66.67.101,ping 10,ping-restart 150,ifconfig 10.171.22.201 255.255.255.0,peer-id 0,cipher CHACHA20-POLY1305,key-derivation tls-ekm' (status=1) 2021-10-18 20:34:54 us=268173 Float requested for peer 0 to 10.20.2.2:36433 2021-10-18 20:34:54 us=268315 peer 0 (core_01_b) floated from 10.20.1.2:36433 to [AF_INET6]::ffff:10.20.2.2:36433 2021-10-18 20:37:11 us=19054 Control Channel: using tls-crypt-v2 key 2021-10-18 20:37:11 us=19146 Outgoing Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key 2021-10-18 20:37:11 us=19175 Outgoing Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication 2021-10-18 20:37:11 us=19193 Incoming Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key 2021-10-18 20:37:11 us=19214 Incoming Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication 2021-10-18 20:37:48 us=434569 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity) 2021-10-18 20:37:48 us=434652 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed 2021-10-18 20:37:48 us=434687 core_01_b/10.20.2.2:36433 TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1 2021-10-18 20:39:03 us=457918 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity) 2021-10-18 20:39:03 us=458029 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed 2021-10-18 20:39:16 us=526467 10.20.2.2:41196 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity) 2021-10-18 20:39:16 us=526562 10.20.2.2:41196 TLS Error: TLS handshake failed 2021-10-18 20:39:16 us=526818 10.20.2.2:41196 SIGUSR1[soft,tls-error] received, client-instance restarting 2021-10-18 20:39:58 us=222685 core_01_b/10.20.2.2:36433 [UNDEF] Inactivity timeout (--ping-restart), restarting 2021-10-18 20:39:58 us=222783 core_01_b/10.20.2.2:36433 SIGUSR1[soft,ping-restart] received, client-instance restarting <EXOK> * EasyTLS-client-disconnect => vars loaded => CN: => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: record not found: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: release_lock => conn_trac_disconnect FAIL => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB==core_01_b==10.171.22.201 => conntrac: unregistered => conntrac: RESET => conntrac: release_lock => conn_trac_disconnect recovered => disconnect completed
comment:4 follow-up: 5 Changed 3 years ago by
JJK's patch in #160 doesn't look right --- its trying to get the common name from the tls session and set it in env each time the script is called. But the tls session may be closed/reset when the disconnect script is executed. In that case there is no common_name in the session as its already wiped by then. And, setting an env var as null is same as unsetting it. You could get "UNDEF" in there if tls_common_name() is called with "false" but that doesn't help.
I think one approach to ensure that the "current" value of common-name is in the env would be to just set it in env right away whenever it changes --- possibly this could be done in set_common_name() itself. Then it is there when scripts are called.
Note that this does not mean the common_name will equal the username at all points in time. cert_verify() sets it to the CN in the certificate and is replaced by username (if --username-as-common_name is in use) only when verify_user_pass() succeeds.
comment:5 Changed 3 years ago by
JJK's patch does not work and masks the problem further, I include the details only for reference.
Also, thanks for the verify_auth_user_pass()
tip! :)
comment:6 Changed 3 years ago by
I ran a similar test for JJK's patch and generated 44 errors in 16hours.
The test made it clear that JJK's patch will not set common_name
at all after the client renegotiates and then times out. If the client disconnects using --explicit-exit-notify
then JJK's patch works, as expected.
The reason to record these findings it because I have thoroughly tested the results and these are the steps required to reproduce the issues.
Summary:
- Without JJK's patch, all that is required to trigger the original problem is for the client to renegotiate and then disconnect. The disconnect can be a time out or explicit, the problem always occurs, regardless.
- With JJK's patch, the client must renegotiate and then time out to trigger the second problem.
For absolute clarity:
- The client must renegotiate in all circumstances.
- I do not believe floating has any effect on the original issue here.
However, float probably has some effect, if JJK's patch is applied.
comment:7 Changed 3 years ago by
If anybody simply wants a solution then use Easy-TLS.
easytls-client-disconnect.sh
has the following code, which catches and corrects for both errors (and others) above.
conntrac_record
uses the current common_name
, where as conntrac_alt_record
uses username
explicitly.
# Disconnect common_name conn_trac_disconnect "${conntrac_record}" "${EASYTLS_CONN_TRAC}" || { case $? in 2) # Not fatal because errors are expected #160 update_status "conn_trac_disconnect FAIL" conntrac_fail=1 log_env=1 ;; 1) # Fatal because these are usage errors [ $FATAL_CONN_TRAC ] && { ENABLE_KILL_PPID=1 die "CONNTRAC_DISCONNECT_FILE_ERROR" 99 } update_status "conn_trac_disconnect ERROR" conntrac_error=1 log_env=1 ;; *) # Absolutely fatal ENABLE_KILL_PPID=1 die "CONNTRAC_DISCONNECT_UNKNOWN" 98 ;; esac } # If the first failed for number two then try again .. if [ $conntrac_fail ] then # Disconnect username conn_trac_disconnect "${conntrac_alt_rec}" "${EASYTLS_CONN_TRAC}" || { case $? in 2) # fatal later - because errors could happen #160 update_status "conn_trac_disconnect A-FAIL" conntrac_alt_fail=1 log_env=1 ;; 1) # Fatal because these are usage errors [ $FATAL_CONN_TRAC ] && { ENABLE_KILL_PPID=1 die "CONNTRAC_DISCONNECT_ALT_FILE_ERROR" 99 } update_status "conn_trac_disconnect A-ERROR" conntrac_alt_error=1 log_env=1 ;; *) # Absolutely fatal ENABLE_KILL_PPID=1 die "CONNTRAC_DISCONNECT_UNKNOWN" 98 ;; esac } fi
comment:8 follow-up: 10 Changed 3 years ago by
Milestone: | → release 2.6 |
---|---|
Version: | → OpenVPN git master branch (Community Ed) |
Thanks for making this so easy to reproduce.
Can we close the #160 bug then (because that ticket is way too long now :-) )?
... and let's see if we can find the right spot in our code before 2.6.0 to fix this.
comment:9 Changed 3 years ago by
Description: | modified (diff) |
---|
comment:10 Changed 3 years ago by
I am hoisted by my own petard!
I would prefer that people use Easy-TLS to solve the problem ;-)
Phase 3: Profit!
comment:11 Changed 3 years ago by
Description: | modified (diff) |
---|
comment:12 follow-up: 13 Changed 3 years ago by
Below is a patch that should fix this. Please test.
I shouldn't theorize before the test results are in, but theory that's often wrong is what I do for a living. So:
(all of the below assumes username-as-common_name
is in use)
The common_name
is set in env in two occasions:
(i) while calling auth-user-pass-verify
(ii) when client-connect
script is called.
In (i) it will be the CN
taken from the client certificate, if any, or "" ("UNDEF" when read). In (ii) it will have the value replaced by username
.
The disconnect script gets whatever was set the last time.
The problem is that, during a reneg, only (i) happens not (ii), and that always leads to overwriting the common_name
by CN
from cert after a reneg. When the disconnect script runs after a reneg what it sees is that CN
. If disconnect happens before a reneg it will get the username
. The value in the session variable is always correct, its only the env that is not updated.
Setting common_name
in env again just before calling the disconnect script (the patch in 160) will not fix this. That'll just unset common_name
as by then the tls session is freed, so is common_name
. The fix below is to set the common_name
in env at the point where it changes, every time.
diff --git a/src/openvpn/ssl_verify.c b/src/openvpn/ssl_verify.c index b745b3c7..50ac349d 100644 --- a/src/openvpn/ssl_verify.c +++ b/src/openvpn/ssl_verify.c @@ -101,6 +101,8 @@ set_common_name(struct tls_session *session, const char *common_name) /* FIXME: Last alloc will never be freed */ session->common_name = string_alloc(common_name, NULL); } + /* update common name in env */ + setenv_str(session->opt->es, "common_name", common_name); }
PS:
In reality, the common_name
is locked after first handshake, and its double reset during reneg wouldn't be required if auth-user-pass-verify
can handle common_name = CN
in the first round and common_name = username
during renegs. But that would be a regression and not the intended behaviour in my view.
comment:13 Changed 3 years ago by
Replying to Selva Nair:
Below is a patch that should fix this. Please test.
Thanks for the patch. Tested and it works.
I ran my test for 6 hours and generated 400 disconnect events of various nature. The test generated zero errors. Previous tests, without the patch, would have generated ~50 errors.
comment:15 Changed 3 years ago by
commit fa5ab2438ad2d8a12eaf43e2cdd8b4294299c175 (master)
commit a2412bf4a6bb6ac7a6f26128d00fe81b0fa4a18e (release/2.5)
commit 644bcb87e76f0c56ec5b6cb205771a00fc28e598 (release/2.4)
Author: Selva Nair <selva.nair@…>
Date: Fri Oct 22 20:07:05 2021 -0400
Ensure the current common_name is in the environment for scripts
Fix by adding common_name to env whenever it changes.
"so obvious" :-)
(We might remove the other parts where common_name is added to the env in master, as a matter of "why do it twice", but that's more cleanup than bugfix - from quick reading, that would be set_verify_user_pass_env() and link_socket_set_outgoing_addr()->link_socket_connection_initiated() [what a mess...])
comment:16 Changed 3 years ago by
Milestone: | release 2.6 → release 2.5.5 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
The commit noted above went into 2.5.5 release and will be in an eventual 2.4.12 release.
So this ticket can be closed - bug found, fixed, released :-) - thanks to all involved.
If JJK's patch is applied then the behaviour becomes slightly different but the problem is not resolved.
In this case, the client does the following:
username
is assigned tocommon_name
Float normally.Not requirednormallyby time-out only.This leads to the server not supplying any value to
common_name
for--client-disconnect
.