Opened 2 years ago

Last modified 20 months ago

#1471 new Bug / Defect

State stuck at AUTH after reneg failure and recovery

Reported by: Selva Nair Owned by:
Priority: major Milestone:
Component: Generic / unclassified Version: OpenVPN 2.5.7 (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc: plaisthos

Description

(client version 2.5.7)

After a successful connection, if a subsequent reneg fails, a new TLS session gets established, but the state as reported by management interface never proceeds beyond "AUTH". The connection is, however, up and working.

For a management client that connects mid-session this is confusing if not impossible to handle without some hackery like parsing log history to check how it got there.

Here is a snippet from interaction with the management interface of a client that is initially in state = CONNECTED. reneg-sec is set to 60 for debugging.

state
1658699592,CONNECTED,SUCCESS,10.9.0.4,192.z.x.y,1194,,,26xx:x:x:x::1002
log on
>LOG:1658699711,,TLS: soft reset sec=60/60 bytes=2640/-1 pkts=24/0

Just before this point I start a new connection with same common-name which causes the server to drop this session. The next reneg fails and new session starts.

>LOG:1658699771,N,TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
>LOG:1658699771,N,TLS Error: TLS handshake failed
>LOG:1658699771,,TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1
>LOG:1658699786,,MANAGEMENT: >STATE:1658699786,WAIT,,,,,,
>STATE:1658699786,WAIT,,,,,,
>LOG:1658699786,,MANAGEMENT: >STATE:1658699786,AUTH,,,,,,
>STATE:1658699786,AUTH,,,,,,
>LOG:1658699786,,TLS: Initial packet from [AF_INET]192.z.x.y:1194, sid=f37f1560 944674f0
>LOG:1658699786,,VERIFY OK: depth=1, C=CA, ST=ON, L=Toronto, O=x, OU=x, CN=x, emailAddress=x@y
>LOG:1658699786,,VERIFY OK: nsCertType=SERVER
>LOG:1658699786,,VERIFY OK: depth=0, C=CA, ST=ON, L=Toronto, O=x, OU=x, CN=ec-384r1, name=server, emailAddress=x@y
>LOG:1658699786,,Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 384 bit EC, curve secp384r1, signature: RSA-SHA256
>LOG:1658699787,,PUSH: Received control message: 'PUSH_REPLY,route 192.x.y.0 255.255.255.0,explicit-exit-notify 1,tun-ipv6,tun-ipv6,route-gateway 10.9.0.1,topology subnet,ping 30,ping-restart 120,ifconfig-ipv6 26x:x:x:x::1002/64 26x:x:x:x::1,ifconfig 10.9.0.4 255.255.255.0,peer-id 0,cipher AES-256-GCM'
>LOG:1658699787,,Pushed option removed by filter: 'route 192.168.0.0 255.255.255.0'

At this point state = AUTH and it never changes. The client does not seem to treat this as a new connection(as do_up() is already done?).

Subsequent renegs succeed as seen below.

>LOG:1658699846,,TLS: soft reset sec=60/60 bytes=0/-1 pkts=0/0
>LOG:1658699846,,VERIFY OK: depth=1, C=CA, ST=ON, L=Toronto, O=x, OU=x, CN=x, emailAddress=x@y
>LOG:1658699846,,VERIFY OK: nsCertType=SERVER
>LOG:1658699846,,VERIFY OK: depth=0, C=CA, ST=ON, L=Toronto, O=x, OU=x, CN=ec-384r1, name=server, emailAddress=x@y
>LOG:1658699846,,Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
>LOG:1658699846,,Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
>LOG:1658699846,,Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 384 bit EC, curve secp384r1, signature: RSA-SHA256
>LOG:1658699907,,TLS: soft reset sec=61/60 bytes=0/-1 pkts=0/0

and on and on. State not changing to CONNECTED looks like a bug to me.

Change History (3)

comment:1 Changed 2 years ago by Gert Döring

Cc: plaisthos added
Version: OpenVPN 2.5.7 (Community Ed)

I wonder if this is "has been broken forever" or due to the TLS changes going into 2.5.0.

Could you re-test with 2.4.12 and git master?

I agree that it looks like a bug ("if the client is happy and packets are passing, state is CONNECTED, not AUTH"). But it's possible that this is 2.5-only, or "all the versions, ever".

comment:2 Changed 23 months ago by Selva Nair

Could you re-test with 2.4.12 and git master?

Sorry for the delay -- trac never notified of this response.
2.4.12 and git master behave the same. Looks like this has been like this for long.

May be a rare occurrence as this requires another client to replace this one during a short span between ping-restart and 0 sec before next reneg. Then server rejects reneg, and client does a move_session after TLS handshake time-out. A quick solution appears to be to use a short enough ping-restart so that a restart would trigger first.

comment:3 Changed 20 months ago by Selva Nair

Just "discovered" #345 which looks the same except there is no user-auth in config there so the state is stuck at WAIT instead of AUTH. As suspected, this is a longstanding issue.

The logs in #345 are not useful, but the above shows the client does not process the reconnection after failed reneg as a new one and thus fails to proceed to initialization_completed.

I do not understand reneg and move_session enough to dig further: @plaisthos any thoughts?

Note: See TracTickets for help on using tickets.