Opened 7 years ago
Last modified 7 years ago
#847 new Bug / Defect
Long lived crypto "state" client lock-up
Reported by: | David Sommerseth | Owned by: | Steffan Karger |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Crypto | Version: | OpenVPN 2.3.10 (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | |
Cc: | tct, Gert Döring |
Description
Configuration setup
Client
[root@localhost ~]# openvpn --client --remote 192.168.122.1 --ca sample/sample-keys/ca.crt --key sample/sample-keys/client.key --cert sample/sample-keys/client.crt --verb 4 --dev tun --auth-user-pass --explicit-exit-notify --auth-nocache
Server
[root@localhost ~]# openvpn --dev tun --ca sample/sample-keys/ca.crt --key sample/sample-keys/server.key --cert sample/sample-keys/server.crt --dh sample/sample-keys/dh2048.pem --server 10.8.0.0 255.255.255.0 --script-security 3 --auth-user-pass-verify .auth.sh via-env --verb 7 --reneg-sec 10
What happens
The client asks for username/password, connects and running ping on the client against 10.8.0.1 works as expected.
A renegotiation is forced (due to --reneg-sec 10
). At this point do not enter any credentials until the server have logged:
Sat Feb 25 00:34:55 2017 us=576093 Test-Client/192.168.122.9:1194 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity) Sat Feb 25 00:34:55 2017 us=576099 Test-Client/192.168.122.9:1194 TLS Error: TLS handshake failed Sat Feb 25 00:34:55 2017 us=576104 Test-Client/192.168.122.9:1194 TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1
When this stage is completed, enter the proper username/password credentials. The client will re-connect properly (might need a couple of attempts, due to some client time-outs). But from this point of the server will no longer accept packets on the DATA channel from the client. Right before the rejection happens, this can be seen in the server log:
Sat Feb 25 00:35:06 2017 us=972360 Test-Client/192.168.122.9:1194 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key Sat Feb 25 00:35:06 2017 us=972379 Test-Client/192.168.122.9:1194 Data Channel Encrypt: CIPHER KEY: 3d4f9143 557d88ff 51cf05f7 4052abd8 93126d2e f2c1826c 1a7888a2 ef3cceb8 Sat Feb 25 00:35:06 2017 us=972401 Test-Client/192.168.122.9:1194 Data Channel Encrypt: CIPHER block_size=16 iv_size=12 Sat Feb 25 00:35:06 2017 us=972424 Test-Client/192.168.122.9:1194 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key Sat Feb 25 00:35:06 2017 us=972449 Test-Client/192.168.122.9:1194 Data Channel Decrypt: CIPHER KEY: e8c6b960 88e078da ce001f18 8f465bb0 bc82adfa 63118cd3 7d29d6b1 70273ae4 Sat Feb 25 00:35:06 2017 us=972465 Test-Client/192.168.122.9:1194 Data Channel Decrypt: CIPHER block_size=16 iv_size=12 Sat Feb 25 00:35:06 2017 us=972540 Test-Client/192.168.122.9:1194 UDPv4 WRITE [247] to [AF_INET]192.168.122.9:1194: P_CONTROL_V1 kid=0 [ 5 ] pid=6 DATA len=221
And after that, the following log lines appears for each packet the client tries to send over the VPN tunnel
Sat Feb 25 00:35:06 2017 us=972653 Test-Client/192.168.122.9:1194 AEAD Decrypt error: cipher final failed
The bad thing about this is that this is persistent if the client stops its openvpn process (even with --explicit-exit-notify
) and starts a completely fresh session. Restarting the server side re-enables the client again.
Change History (8)
comment:1 Changed 7 years ago by
Cc: | tct added |
---|
comment:2 Changed 7 years ago by
comment:3 Changed 7 years ago by
Priority: | blocker → major |
---|---|
Summary: | Long lived AEAD "state" client lock-up → Long lived crypto "state" client lock-up |
Version: | git master branch → 2.3.10 |
Just managed to reproduce, and this is not NCP-related. The same happens when running 2.3.10 server vs 2.3.10 client, it just reports Authenticate/Decrypt packet error: packet HMAC authentication failed
instead of AEAD Decrypt error: cipher final failed
.
So this seems to be a long-standing thing, which requires diving deep into the openvpn connection setup logic, and not something new that lots people will run into. I'm reducing the priority - there's more pressing stuff to dive into first.
comment:4 Changed 7 years ago by
Cc: | Gert Döring added |
---|
comment:5 Changed 7 years ago by
Just ran into this issue as well - OpenVPN had been up for 94 days, with a small number of clients (very low churn rate - i.e. teardown/setup every 3-4 days) - suddenly logging on the server:
" ... 1194 AEAD Decrypt error: cipher final failed"
As above - restarting client didn't help, only restarting the server...
So guess that's a "/me too"
-Karl
comment:6 Changed 7 years ago by
I have the same problem since i upgraded Centos 7.2 to Centos 7.3 and OpenVPN 2.3.10 to OpenVPN 2.4.1. This problem pops up randomly on random clients. After server reboot it's ok for some time (few hours?). When problem occurs, user get error: "Authenticate/Decrypt? packet error: packet HMAC authentication failed" and on server side there is error: "AEAD Decrypt error: cipher final failed". I am going to downgrade back to OpenVPN 2.3.10 for now.
comment:8 Changed 7 years ago by
There might be two different issue here.
One is the NCP bug on TLS reconnects, fixed today in master and release/2.4 (only relevant on 2.4<->2.4 connections, and when this hits you see a comment in the server log about "server has already generated data channel keys"), see trac #887. This one is only relevant on 2.4.0/2.4.1/2.4.2 servers in combinations with clients that bind to a fixed UDP port (= not using nobind
) - and fixed in 2.4.3. Workaround is nobind
(on the client) or ncp-disable
(either side).
The other one is much more rare, and since it can happen 2.3<->2.3, downgrading isn't going to help with that one.
This looks NCP-related. Can you test if this still happens if (at least) one of the sides specifies
--ncp-disable
?I've tried reproducing the behaviour, but didn't manage to get "AEAD Decrypt error: cipher final failed", I did manage to get "Key [AF_INET]10.1.1.2:36499 [0] not initialized (yet), dropping packet." though. One reconnect later the connection recovered in my test.
I'll investigate further.