Opened 2 years ago

Last modified 2 years ago

#1434 closed Bug / Defect

Client-disconnect script env does not have common_name — at Version 9

Reported by: tct Owned by:
Priority: major Milestone: release 2.5.5
Component: Generic / unclassified Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc:

Description (last modified by tct)

Under certain circumstances, --client-disconnect has the wrong value for common_name in the env.

In this case --username-as-common-name is in use and username is not the same as common_name.

This ticket is a follow-up to #160 #1370

What actually takes place is as follows:

  1. Client connects normally - UDP - username is assigned to common_name
  2. Client renegotiates normally.
  3. Client disconnects normally.

This leads the server supplying common_name not username to --client-disconnect, every time.

I ran a single standard server and a single standard client.
I used a simple script to monotonically dis/enable routing to the server, over a period of 10mins.
The client uses --reneg-sec 270, so the client has always done a renegotiate when the routing becomes disabled. Then the server does --ping-timeout for the client.
Then the server runs --client-disconnect with the wrong value for common_name, every time.

The server ran for ~20 hours and generated this error 130 times.

Change History (9)

comment:1 Changed 2 years ago by tct

If JJK's patch is applied then the behaviour becomes slightly different but the problem is not resolved.

In this case, the client does the following:

  1. Connect normally - UDP - username is assigned to common_name
  2. Renegotiate normally.
  3. Float normally. Not required
  4. Disconnect normally by time-out only.

This leads to the server not supplying any value to common_name for --client-disconnect.

Last edited 2 years ago by tct (previous) (diff)

comment:2 Changed 2 years ago by tct

Comment Too noisy - deleted in favour of the following comment

Last edited 2 years ago by tct (previous) (diff)

comment:3 Changed 2 years ago by tct

This is a shorter log from client-connect to disconnect.

In this case only a float is seen not a renegotiation. Followed by a timeout.

I have removed some excess noise:

2021-10-18 20:33:55 us=744310 core_01_b/10.20.1.2:36433 MULTI_sva: pool returned IPv4=10.171.22.201, IPv6=(Not enabled)
2021-10-18 20:33:55 us=744480 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/v31a/ccd/core_01_b
<EXOK> * EasyTLS-client-connect => vars loaded => dyn opts loaded => CN: core_01_b => conntrac: tallied => conntrac: registered => tls key serial: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5 => fixed_md_file loaded => metadata -> x509 serial match => Key is not locked to hwaddr => connection allowed => temp-files deleted

2021-10-18 20:33:55 us=766992 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: reading client specific options from: /tmp/openvpn_cc_785fbac066fdc39090b1703aedf641e.tmp
2021-10-18 20:33:55 us=767238 core_01_b/10.20.1.2:36433 OPTIONS IMPORT: timers and/or timeouts modified
2021-10-18 20:33:55 us=767272 core_01_b/10.20.1.2:36433 MULTI: Learn: 10.171.22.201 -> core_01_b/10.20.1.2:36433
2021-10-18 20:33:55 us=767295 core_01_b/10.20.1.2:36433 MULTI: primary virtual IP for core_01_b/10.20.1.2:36433: 10.171.22.201
2021-10-18 20:33:55 us=767322 core_01_b/10.20.1.2:36433 Data Channel: using negotiated cipher 'CHACHA20-POLY1305'
2021-10-18 20:33:55 us=767356 core_01_b/10.20.1.2:36433 Data Channel MTU parms [ L:1535 D:1450 EF:35 EB:406 ET:0 EL:3 AF:14/122 ]
2021-10-18 20:33:55 us=767436 core_01_b/10.20.1.2:36433 Outgoing Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key
2021-10-18 20:33:55 us=767462 core_01_b/10.20.1.2:36433 Incoming Data Channel: Cipher 'CHACHA20-POLY1305' initialized with 256 bit key
2021-10-18 20:33:55 us=767535 core_01_b/10.20.1.2:36433 SENT CONTROL [core_01_b]: 'PUSH_REPLY,topology subnet,route-gateway 10.171.22.1,route 10.171.22.1,route 10.66.67.101,comp-lzo no,explicit-exit-notify 1,route 10.66.67.101,ping 10,ping-restart 150,ifconfig 10.171.22.201 255.255.255.0,peer-id 0,cipher CHACHA20-POLY1305,key-derivation tls-ekm' (status=1)
2021-10-18 20:34:54 us=268173 Float requested for peer 0 to 10.20.2.2:36433
2021-10-18 20:34:54 us=268315 peer 0 (core_01_b) floated from 10.20.1.2:36433 to [AF_INET6]::ffff:10.20.2.2:36433
2021-10-18 20:37:11 us=19054 Control Channel: using tls-crypt-v2 key
2021-10-18 20:37:11 us=19146 Outgoing Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
2021-10-18 20:37:11 us=19175 Outgoing Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
2021-10-18 20:37:11 us=19193 Incoming Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
2021-10-18 20:37:11 us=19214 Incoming Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication

2021-10-18 20:37:48 us=434569 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:37:48 us=434652 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed
2021-10-18 20:37:48 us=434687 core_01_b/10.20.2.2:36433 TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1

2021-10-18 20:39:03 us=457918 core_01_b/10.20.2.2:36433 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:39:03 us=458029 core_01_b/10.20.2.2:36433 TLS Error: TLS handshake failed
2021-10-18 20:39:16 us=526467 10.20.2.2:41196 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2021-10-18 20:39:16 us=526562 10.20.2.2:41196 TLS Error: TLS handshake failed
2021-10-18 20:39:16 us=526818 10.20.2.2:41196 SIGUSR1[soft,tls-error] received, client-instance restarting
2021-10-18 20:39:58 us=222685 core_01_b/10.20.2.2:36433 [UNDEF] Inactivity timeout (--ping-restart), restarting
2021-10-18 20:39:58 us=222783 core_01_b/10.20.2.2:36433 SIGUSR1[soft,ping-restart] received, client-instance restarting
<EXOK> * EasyTLS-client-disconnect => vars loaded => CN:  => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: record not found: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB====10.171.22.201 => conntrac: release_lock => conn_trac_disconnect FAIL => conntrac: acquire_lock => conntrac: tallied => conntrac pattern: 638a99cd17661fcf1b8e41165f259d486dae443a9e3bf7879069f4ffcc35e7a5=817D664B73EF39A1A788E84E685DC5EB==core_01_b==10.171.22.201 => conntrac: unregistered => conntrac: RESET => conntrac: release_lock => conn_trac_disconnect recovered => disconnect completed
Last edited 2 years ago by tct (previous) (diff)

comment:4 Changed 2 years ago by Selva Nair

JJK's patch in #160 doesn't look right --- its trying to get the common name from the tls session and set it in env each time the script is called. But the tls session may be closed/reset when the disconnect script is executed. In that case there is no common_name in the session as its already wiped by then. And, setting an env var as null is same as unsetting it. You could get "UNDEF" in there if tls_common_name() is called with "false" but that doesn't help.

I think one approach to ensure that the "current" value of common-name is in the env would be to just set it in env right away whenever it changes --- possibly this could be done in set_common_name() itself. Then it is there when scripts are called.

Note that this does not mean the common_name will equal the username at all points in time. cert_verify() sets it to the CN in the certificate and is replaced by username (if --username-as-common_name is in use) only when verify_user_pass() succeeds.

Last edited 2 years ago by Selva Nair (previous) (diff)

comment:5 in reply to:  4 Changed 2 years ago by tct

JJK's patch does not work and masks the problem further, I include the details only for reference.

Also, thanks for the verify_auth_user_pass() tip! :)

Last edited 2 years ago by tct (previous) (diff)

comment:6 Changed 2 years ago by tct

I ran a similar test for JJK's patch and generated 44 errors in 16hours.

The test made it clear that JJK's patch will not set common_name at all after the client renegotiates and then times out. If the client disconnects using --explicit-exit-notify then JJK's patch works, as expected.

The reason to record these findings it because I have thoroughly tested the results and these are the steps required to reproduce the issues.

Summary:

  1. Without JJK's patch, all that is required to trigger the original problem is for the client to renegotiate and then disconnect. The disconnect can be a time out or explicit, the problem always occurs, regardless.
  1. With JJK's patch, the client must renegotiate and then time out to trigger the second problem.

For absolute clarity:

  1. The client must renegotiate in all circumstances.
  1. I do not believe floating has any effect on the original issue here.
    However, float probably has some effect, if JJK's patch is applied.
Last edited 2 years ago by tct (previous) (diff)

comment:7 Changed 2 years ago by tct

If anybody simply wants a solution then use Easy-TLS.

easytls-client-disconnect.sh has the following code, which catches and corrects for both errors (and others) above.

conntrac_record uses the current common_name, where as conntrac_alt_record uses username explicitly.

# Disconnect common_name
conn_trac_disconnect "${conntrac_record}" "${EASYTLS_CONN_TRAC}" || {
	case $? in
	2)	# Not fatal because errors are expected #160
		update_status "conn_trac_disconnect FAIL"
		conntrac_fail=1
		log_env=1
	;;
	1)	# Fatal because these are usage errors
		[ $FATAL_CONN_TRAC ] && {
			ENABLE_KILL_PPID=1
			die "CONNTRAC_DISCONNECT_FILE_ERROR" 99
			}
		update_status "conn_trac_disconnect ERROR"
		conntrac_error=1
		log_env=1
	;;
	*)	# Absolutely fatal
		ENABLE_KILL_PPID=1
		die "CONNTRAC_DISCONNECT_UNKNOWN" 98
	;;
	esac
	}

# If the first failed for number two then try again ..
if [ $conntrac_fail ]
then
	# Disconnect username
	conn_trac_disconnect "${conntrac_alt_rec}" "${EASYTLS_CONN_TRAC}" || {
		case $? in
		2)	# fatal later - because errors could happen #160
			update_status "conn_trac_disconnect A-FAIL"
			conntrac_alt_fail=1
			log_env=1
		;;
		1)	# Fatal because these are usage errors
			[ $FATAL_CONN_TRAC ] && {
				ENABLE_KILL_PPID=1
				die "CONNTRAC_DISCONNECT_ALT_FILE_ERROR" 99
				}
			update_status "conn_trac_disconnect A-ERROR"
			conntrac_alt_error=1
			log_env=1
		;;
		*)	# Absolutely fatal
			ENABLE_KILL_PPID=1
			die "CONNTRAC_DISCONNECT_UNKNOWN" 98
		;;
		esac
		}
fi

comment:8 Changed 2 years ago by Gert Döring

Milestone: release 2.6
Version: OpenVPN git master branch (Community Ed)

Thanks for making this so easy to reproduce.

Can we close the #160 bug then (because that ticket is way too long now :-) )?

... and let's see if we can find the right spot in our code before 2.6.0 to fix this.

comment:9 Changed 2 years ago by tct

Description: modified (diff)
Note: See TracTickets for help on using tickets.