Opened 10 months ago

Closed 6 months ago

#1434 closed Bug / Defect (fixed)

Client-disconnect script env does not have common_name

Reported by: tct Owned by:
Priority: major Milestone: release 2.5.5
Component: Generic / unclassified Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc:

Description (last modified by tct)

Under certain circumstances, --client-disconnect has the wrong value for common_name in the env.

In this case --username-as-common-name is in use and username is not the same as common_name.

Follow-up to #160 #1370

Possibly Related to: #167

What actually takes place is as follows:

  1. Client connects normally - UDP - username is assigned to common_name
  2. Client renegotiates normally.
  3. Client disconnects normally.

This leads the server supplying common_name not username to --client-disconnect, every time.

I ran a single standard server and a single standard client.
I used a simple script to monotonically dis/enable routing to the server, over a period of 10mins.
The client uses --reneg-sec 270, so the client has always done a renegotiate when the routing becomes disabled. Then the server does --ping-timeout for the client.
Then the server runs --client-disconnect with the wrong value for common_name, every time.

The server ran for ~20 hours and generated this error 130 times.

Change History (16)

comment:1 Changed 10 months ago by tct

If JJK's patch is applied then the behaviour becomes slightly different but the problem is not resolved.

In this case, the client does the following:

  1. Connect normally - UDP - username is assigned to common_name
  2. Renegotiate normally.
  3. Float normally. Not required
  4. Disconnect normally by time-out only.

This leads to the server not supplying any value to common_name for --client-disconnect.

Last edited 10 months ago by tct (previous) (diff)

comment:2 Changed 10 months ago by tct

Comment Too noisy - deleted in favour of the following comment

Last edited 10 months ago by tct (previous) (diff)

comment:3 Changed 10 months ago by tct

This is a shorter log from client-connect to disconnect.

In this case only a float is seen not a renegotiation. Followed by a timeout.

I have removed some excess noise:

2021-10-18 20:33:55 us=744310 core_01_b/10.20.1.2:36433 MULTI_sva: pool returned IPv4=10.171.22.201, IPv6=(Not enabled)
2021-10-18 20:33:55 us=744480 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/v31a/ccd/core_01_b
<EXOK> * EasyTLS-client-connect => vars loaded => dyn opts loaded => CN: core_01_b => conntrac: tallied => conntrac: registered => tls key serial: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5 => fixed_md_file loaded => metadata -> x509 serial match => Key is not locked to hwaddr => connection allowed => temp-files deleted

2021-10-18 20:33:55 us=766992 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /tmp/openvpn_cc_785fbac066fdc39090b1703aedf641e.tmp
2021-10-18 20:33:55 us=767238 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: timers and/or timeouts modified
2021-10-18 20:33:55 us=767272 core_01_b/10.20.1.2:36433 MULTI: Learn: 10.171.22.201 -> core_01_b/10.20.1.2:36433
2021-10-18 20:33:55 us=767295 core_01_b/10.20.1.2:36433 MULTI: primary virtual IP for core_01_b/10.20.1.2:36433: 10.171.22.201
2021-10-18 20:33:55 us=767322 core_01_b/10.20.1.2:36433 Data Channel: using negotiated cipher 'CHACHA20-POLY1305'
2021-10-18 20:33:55 us=767356 core_01_b/10.20.1.2:36433 Data Channel MTU parms [ L:1535 D:1450 EF:35 EB:406 ET:0 EL:3 AF:14/122 ]
2021-10-18 20:33:55 us=767436 core_01_b/10.20.1.2:36433 Outgoing Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key
2021-10-18 20:33:55 us=767462 core_01_b/10.20.1.2:36433 Incoming Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key
2021-10-18 20:33:55 us=767535 core_01_b/10.20.1.2:36433 SENT CONTROL [core_01_b]: 'PUSH_REPLY,topology subnet,route-gateway 10.171.22.1,route 10.171.22.1,route 10.66.67.101,comp-lzo no,explicit-exit-notify 1,route 10.66.67.101,ping 10,ping-restart 150,ifconfig 10.171.22.201 255.255.255.0,peer-id 0,cipher CHACHA20-POLY1305,key-derivation tls-ekm' (status=1)
2021-10-18 20:34:54 us=268173 Float requested for peer 0 to 10.20.2.2:36433
2021-10-18 20:34:54 us=268315 peer 0 (core_01_b) floated from 10.20.1.2:36433 to [AF_INET6]::ffff:10.20.2.2:36433
2021-10-18 20:37:11 us=19054 Control Channel: using tls-crypt-v2 key
2021-10-18 20:37:11 us=19146 Outgoing Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
2021-10-18 20:37:11 us=19175 Outgoing Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
2021-10-18 20:37:11 us=19193 Incoming Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
2021-10-18 20:37:11 us=19214 Incoming Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication

2021-10-18 20:37:48 us=434569 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:37:48 us=434652 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed
2021-10-18 20:37:48 us=434687 core_01_b/10.20.2.2:36433 TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1

2021-10-18 20:39:03 us=457918 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:39:03 us=458029 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed
2021-10-18 20:39:16 us=526467 10.20.2.2:41196 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:39:16 us=526562 10.20.2.2:41196 TLS Error: TLS handshake failed
2021-10-18 20:39:16 us=526818 10.20.2.2:41196 SIGUSR1[soft,tls-error] received, client-instance restarting
2021-10-18 20:39:58 us=222685 core_01_b/10.20.2.2:36433 [UNDEF] Inactivity timeout (--ping-restart), restarting
2021-10-18 20:39:58 us=222783 core_01_b/10.20.2.2:36433 SIGUSR1[soft,ping-restart] received, client-instance restarting
<EXOK> * EasyTLS-client-disconnect => vars loaded => CN:  => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: record not found: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: release_lock => conn_trac_disconnect FAIL => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB==core_01_b==10.171.22.201 => conntrac: unregistered => conntrac: RESET => conntrac: release_lock => conn_trac_disconnect recovered => disconnect completed
Last edited 10 months ago by tct (previous) (diff)

comment:4 Changed 10 months ago by Selva Nair

JJK's patch in #160 doesn't look right --- its trying to get the common name from the tls session and set it in env each time the script is called. But the tls session may be closed/reset when the disconnect script is executed. In that case there is no common_name in the session as its already wiped by then. And, setting an env var as null is same as unsetting it. You could get "UNDEF" in there if tls_common_name() is called with "false" but that doesn't help.

I think one approach to ensure that the "current" value of common-name is in the env would be to just set it in env right away whenever it changes --- possibly this could be done in set_common_name() itself. Then it is there when scripts are called.

Note that this does not mean the common_name will equal the username at all points in time. cert_verify() sets it to the CN in the certificate and is replaced by username (if --username-as-common_name is in use) only when verify_user_pass() succeeds.

Last edited 10 months ago by Selva Nair (previous) (diff)

comment:5 in reply to:  4 Changed 10 months ago by tct

JJK's patch does not work and masks the problem further, I include the details only for reference.

Also, thanks for the verify_auth_user_pass() tip! :)

Last edited 10 months ago by tct (previous) (diff)

comment:6 Changed 10 months ago by tct

I ran a similar test for JJK's patch and generated 44 errors in 16hours.

The test made it clear that JJK's patch will not set common_name at all after the client renegotiates and then times out. If the client disconnects using --explicit-exit-notify then JJK's patch works, as expected.

The reason to record these findings it because I have thoroughly tested the results and these are the steps required to reproduce the issues.

Summary:

  1. Without JJK's patch, all that is required to trigger the original problem is for the client to renegotiate and then disconnect. The disconnect can be a time out or explicit, the problem always occurs, regardless.
  1. With JJK's patch, the client must renegotiate and then time out to trigger the second problem.

For absolute clarity:

  1. The client must renegotiate in all circumstances.
  1. I do not believe floating has any effect on the original issue here.
    However, float probably has some effect, if JJK's patch is applied.
Last edited 10 months ago by tct (previous) (diff)

comment:7 Changed 10 months ago by tct

If anybody simply wants a solution then use Easy-TLS.

easytls-client-disconnect.sh has the following code, which catches and corrects for both errors (and others) above.

conntrac_record uses the current common_name, where as conntrac_alt_record uses username explicitly.

# Disconnect common_name
conn_trac_disconnect "${conntrac_record}" "${EASYTLS_CONN_TRAC}" || {
	case $? in
	2)	# Not fatal because errors are expected #160
		update_status "conn_trac_disconnect FAIL"
		conntrac_fail=1
		log_env=1
	;;
	1)	# Fatal because these are usage errors
		[ $FATAL_CONN_TRAC ] && {
			ENABLE_KILL_PPID=1
			die "CONNTRAC_DISCONNECT_FILE_ERROR" 99
			}
		update_status "conn_trac_disconnect ERROR"
		conntrac_error=1
		log_env=1
	;;
	*)	# Absolutely fatal
		ENABLE_KILL_PPID=1
		die "CONNTRAC_DISCONNECT_UNKNOWN" 98
	;;
	esac
	}

# If the first failed for number two then try again ..
if [ $conntrac_fail ]
then
	# Disconnect username
	conn_trac_disconnect "${conntrac_alt_rec}" "${EASYTLS_CONN_TRAC}" || {
		case $? in
		2)	# fatal later - because errors could happen #160
			update_status "conn_trac_disconnect A-FAIL"
			conntrac_alt_fail=1
			log_env=1
		;;
		1)	# Fatal because these are usage errors
			[ $FATAL_CONN_TRAC ] && {
				ENABLE_KILL_PPID=1
				die "CONNTRAC_DISCONNECT_ALT_FILE_ERROR" 99
				}
			update_status "conn_trac_disconnect A-ERROR"
			conntrac_alt_error=1
			log_env=1
		;;
		*)	# Absolutely fatal
			ENABLE_KILL_PPID=1
			die "CONNTRAC_DISCONNECT_UNKNOWN" 98
		;;
		esac
		}
fi

comment:8 Changed 10 months ago by Gert Döring

Milestone: release 2.6
Version: OpenVPN git master branch (Community Ed)

Thanks for making this so easy to reproduce.

Can we close the #160 bug then (because that ticket is way too long now :-) )?

... and let's see if we can find the right spot in our code before 2.6.0 to fix this.

comment:9 Changed 10 months ago by tct

Description: modified (diff)

comment:10 in reply to:  8 Changed 10 months ago by tct

I am hoisted by my own petard!

I would prefer that people use Easy-TLS to solve the problem ;-)

Phase 3: Profit!

comment:11 Changed 10 months ago by tct

Description: modified (diff)

comment:12 Changed 10 months ago by Selva Nair

Below is a patch that should fix this. Please test.

I shouldn't theorize before the test results are in, but theory that's often wrong is what I do for a living. So:

(all of the below assumes username-as-common-name is in use)

The common_name is set in env in two occasions:
(i) while calling auth-user-pass-verify
(ii) when client-connect script is called.

In (i) it will be the CN taken from the client certificate, if any, or "" ("UNDEF" when read). In (ii) it will have the value replaced by username.

The disconnect script gets whatever was set the last time.

The problem is that, during a reneg, only (i) happens not (ii), and that always leads to overwriting the common_name by CN from cert after a reneg. When the disconnect script runs after a reneg what it sees is that CN. If disconnect happens before a reneg it will get the username. The value in the session variable is always correct, its only the env that is not updated.

Setting common_name in env again just before calling the disconnect script (the patch in 160) will not fix this. That'll just unset common_name as by then the tls session is freed, so is common_name. The fix below is to set the common_name in env at the point where it changes, every time.

diff --git a/src/openvpn/ssl_verify.c b/src/openvpn/ssl_verify.c
index b745b3c7..50ac349d 100644
--- a/src/openvpn/ssl_verify.c
+++ b/src/openvpn/ssl_verify.c
@@ -101,6 +101,8 @@ set_common_name(struct tls_session *session, const char *common_name)
         /* FIXME: Last alloc will never be freed */
         session->common_name = string_alloc(common_name, NULL);
     }
+    /* update common name in env */
+    setenv_str(session->opt->es, "common_name", common_name);
 }

PS:
In reality, the common_name is locked after first handshake, and its double reset during reneg wouldn't be required if auth-user-pass-verify can handle common_name = CN in the first round and common_name = username during renegs. But that would be a regression and not the intended behaviour in my view.

Last edited 10 months ago by Selva Nair (previous) (diff)

comment:13 in reply to:  12 Changed 10 months ago by tct

Replying to Selva Nair:

Below is a patch that should fix this. Please test.

Thanks for the patch. Tested and it works.

I ran my test for 6 hours and generated 400 disconnect events of various nature. The test generated zero errors. Previous tests, without the patch, would have generated ~50 errors.

comment:14 Changed 10 months ago by Selva Nair

Thanks for testing. Will send in a patch.

comment:15 Changed 10 months ago by Gert Döring

commit fa5ab2438ad2d8a12eaf43e2cdd8b4294299c175 (master)
commit a2412bf4a6bb6ac7a6f26128d00fe81b0fa4a18e (release/2.5)
commit 644bcb87e76f0c56ec5b6cb205771a00fc28e598 (release/2.4)

Author: Selva Nair <selva.nair@…>
Date: Fri Oct 22 20:07:05 2021 -0400

Ensure the current common_name is in the environment for scripts


Fix by adding common_name to env whenever it changes.

"so obvious" :-)

(We might remove the other parts where common_name is added to the env in master, as a matter of "why do it twice", but that's more cleanup than bugfix - from quick reading, that would be set_verify_user_pass_env() and link_socket_set_outgoing_addr()->link_socket_connection_initiated() [what a mess...])

comment:16 Changed 6 months ago by Gert Döring

Milestone: release 2.6release 2.5.5
Resolution: fixed
Status: newclosed

The commit noted above went into 2.5.5 release and will be in an eventual 2.4.12 release.

So this ticket can be closed - bug found, fixed, released :-) - thanks to all involved.

Note: See TracTickets for help on using tickets.