Opened 7 years ago
Closed 7 years ago
#879 closed Bug / Defect (fixed)
Two issues when reconnecting with 2.4.1
Reported by: | sbehrens | Owned by: | Steffan Karger |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Generic / unclassified | Version: | OpenVPN 2.4.1 (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | extra_frame MTU cipher reconnect NCP |
Cc: | Steffan Karger |
Description (last modified by )
Theses two issues happen only sometimes. The full server log is attached. Both issues might be related?
Issue # 1:
extra_frame is -72 larger after each connection attempt and after a few times packets are discarded with "TCP/UDP packet too large on write".
Some of the lines of the server log file:
MULTI: multi_create_instance called Data Channel MTU parms [ L:1621 D:1450 EF:121 EB:406 ET:0 EL:3 ] TLS: Username/Password authentication succeeded for username 'foo' [CN SET] Data Channel MTU parms [ L:1549 D:1450 EF:49 EB:406 ET:0 EL:3 ] TLS: Username/Password authentication succeeded for username 'foo' [CN SET] Data Channel MTU parms [ L:1477 D:1450 EF:-23 EB:406 ET:0 EL:3 ] TLS: Username/Password authentication succeeded for username 'foo' [CN SET] Data Channel MTU parms [ L:1405 D:1405 EF:-95 EB:406 ET:0 EL:3 ] TLS: Username/Password authentication succeeded for username 'foo' [CN SET] Data Channel MTU parms [ L:1333 D:1333 EF:-167 EB:406 ET:0 EL:3 ] TCP/UDP packet too large on write to [AF_INET]1.2.3.4:1194 (tried=1419,max=1333)
(A small ping still works.)
Issue # 2:
Sometimes the PUSH_REPLY does not include the "cipher AES-256-GCM" which would switch the client away from the default BF-CBC to AES-256-GCM. Client and server use mismatching ciphers afterwards.
When tls_session_generate_data_channel_keys() is called via tls_session_update_crypto_params() (which also adds the -72 bytes to extra_frame and dumps the "Data Channel MTU parms") the client and server both use the same cipher AES-256-GCM, but sometimes key_method_2_write() or key_method_2_read() seem to be the one who call tls_session_generate_data_channel_keys() and then the server uses AES-256-GCM but the client uses BF-CBC.
Example for when client + server both establish AES-256-GCM:
TLS: Username/Password authentication succeeded for username 'foo' [CN SET] TLS: move_session: dest=TM_ACTIVE src=TM_UNTRUSTED reinit_src=1 TLS: tls_multi_process: untrusted session promoted to semi-trusted Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384 PUSH: Received control message: 'PUSH_REQUEST' SENT CONTROL [foo]: 'PUSH_REPLY,ping 45,route 3.2.1.69,route 3.2.1.72,route 3.2.1.44,route 10.144.0.1,topology net30,ifconfig 10.144.0.6 10.144.0.5,peer-id 0,cipher AES-256-GCM' (status=1) Data Channel MTU parms [ L:1333 D:1333 EF:-167 EB:406 ET:0 EL:3 ] Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Example for when client uses BF-CBC but server uses AES-256-GCM:
TLS: Username/Password authentication succeeded for username 'foo' [CN SET] Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key TLS: move_session: dest=TM_ACTIVE src=TM_UNTRUSTED reinit_src=1 TLS: tls_multi_process: untrusted session promoted to semi-trusted Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384 PUSH: Received control message: 'PUSH_REQUEST' PUSH: client wants to negotiate cipher (NCP), but server has already generated data channel keys, ignoring client request SENT CONTROL [foo]: 'PUSH_REPLY,ping 45,route 3.2.1.69,route 3.2.1.72,route 3.2.1.44,route 10.144.0.1,topology net30,ifconfig 10.144.0.6 10.144.0.5,peer-id 0' (status=1) AEAD Decrypt error: cipher final failed
Attachments (2)
Change History (17)
Changed 7 years ago by
Attachment: | openvpn-log.txt added |
---|
comment:1 Changed 7 years ago by
Cc: | Steffan Karger added |
---|
@syzzer: this seems to be your ballpark...
@sbehrens: are you doing anything unusal on the server, like, any non-standard options, scripts, plugins?
comment:2 Changed 7 years ago by
A script to verify the user/password is allowed to log in. The config looks normal to me. This is the config (with some lines removed which shouldn't matter):
server 10.144.0.0 255.255.0.0
float
duplicate-cn
script-security 3
auth-user-pass-verify auth.pl via-env
up up.sh
verify-client-cert none
username-as-common-name
ca ca.drive....com.crt
cert server.crt
key server.key
dh dh2048.pem
tls-auth tls-auth.key
comment:3 Changed 7 years ago by
Sorry, after taking a closer look, these two things don't look related anymore. Shall I split this single ticket into two?
The "TCP/UDP packet too large on write" is reproducible just by reconnecting. extra_flags grows by -72 with each new connection and SMB fails to operate after three reconnects.
The second issue looks like being related to the "TLS: soft reset sec=0 bytes=23248/-1 pkts=237/0" thing, when the state is reset. I'm currently running the server with verb=9 and reneg-sec 180 and in fact after the reset, disconnect and reconnect the BF-CBC on the client happens.
comment:4 Changed 7 years ago by
I have hit the same issue with the shrinking MTU sizes. I've sent a patch for it to the mailing list (available also at https://sourceforge.net/p/openvpn/mailman/message/35823479/) and also opened a pull request on GitHub? for discussion about the patch: https://github.com/OpenVPN/openvpn/pull/87
comment:5 Changed 7 years ago by
Owner: | set to Steffan Karger |
---|---|
Status: | new → accepted |
Thanks for the clear bug report and the suggested patch. And sorry for taking long to respond, life has been a bit busy...
The analysis makes sense, and logic-wise the resulting patch does too. I'll look into this further soon.
In the mean while, sbehrens, it would be great to hear from you if this patch works properly for you too.
comment:6 Changed 7 years ago by
This patch does work properly for me and fixes the 1st issue. The client is able to reconnect multiple times. Grepping for 'Data Channel MTU parms' on the server shows "[ L:1549 D:1450 EF:49 EB:406 ET:0 EL:3 ]" all the time. The extra_flags no longer change on a reconnect.
Thanks!
comment:7 Changed 7 years ago by
Description: | modified (diff) |
---|
(Tuned the markup of the description a little.)
comment:8 Changed 7 years ago by
It look me a while to reproduce the issue, because I have nobind
in all my client configs. That causes the client to reconnect from a different port after a SIGUSR1 or process restart, and causes the server to initialize a fresh state instead of trying to reuse the old state. So as a quick workaround for those running 2.4.0 or 2.4.1 on the server: add nobind
to the client configs.
As for the fix - I don't think the approach is correct. Because a client might actually reconnect after a config file change, and we *do* need to recompute the MTU in that case. So the right fix would rather be to restore the pre-adjustment MTU when a client reconnects. Code-wise, I still need to figure out how to properly do that.
Changed 7 years ago by
Attachment: | 0001-XXX-Restore-pre-NCP-frame-parameters-for-new-session.patch added |
---|
comment:9 Changed 7 years ago by
Attached a proof-of-concept patch that restores the pre-NCP frame parameters for new sessions. This resolves the issue in my tests. Could you please test if this works for you too?
comment:10 follow-up: 12 Changed 7 years ago by
Looks like this also solves the problem for me. I do have a question about the patch, but that is mostly due to not being very familiar with the OpenVPN internals. Would overwriting the frame pointer leak that object on connection reuse or are there other references to it that make sure it can be cleaned up?
comment:11 Changed 7 years ago by
0001-XXX-Restore-pre-NCP-frame-parameters-for-new-session.patch fixes the 1st issue for me. OpenVPN clients are able to reconnect multiple times.
comment:12 Changed 7 years ago by
Thanks both for testing and reporting back!
Replying to dbussink:
I do have a question about the patch, but that is mostly due to not being very familiar with the OpenVPN internals. Would overwriting the frame pointer leak that object on connection reuse or are there other references to it that make sure it can be cleaned up?
It's not a pointer, I'm overwriting the whole struct frame (which has some numbers in it to keep track of the maximum packet sized at the various places in the OpenVPN codebase).
The patch is sent to the list for inclusion in 2.4.2.
comment:13 follow-up: 14 Changed 7 years ago by
Oh, sbehrens, could you create a separate ticket for the other issue? That makes it a bit easier to track that issue.
comment:14 Changed 7 years ago by
Replying to syzzer:
Oh, sbehrens, could you create a separate ticket for the other issue? That makes it a bit easier to track that issue.
ticket:887 is the new one for the second issue.
The first issue is resolved for me. Thanks!
comment:15 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Thanks!
Closing this ticket; the patch has been applied to the following branches:
commit 9900e023bcc49964d33e6f22c2b6223f8932acf8 (master)
commit 03d01f4f69cfc6768343b9f0f2dde2049e4882d2 (release/2.4)
Author: Steffan Karger
Date: Mon May 8 23:44:38 2017 +0200
Restore pre-NCP frame parameters for new sessions
server log, kill -USR2 output, OS and --version output