Opened 7 years ago
Closed 7 years ago
#903 closed Bug / Defect (fixed)
OpenVPN client upgrade to 2.4 is unable to stay connected
Reported by: | Exchizz | Owned by: | Gert Döring |
---|---|---|---|
Priority: | minor | Milestone: | release 2.4.5 |
Component: | Generic / unclassified | Version: | OpenVPN 2.4.0 (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | regression ncp v2.3-v2.4 |
Cc: | Steffan Karger |
Description
Hi,
I upgraded my debian from wheezy to stretch yesterday. After I did the upgrade, my OpenVPN client has been unable to stay connected to my OpenVPN server.
My OpenVPN successfully establishes the connection, but then it starts writing:
Key [AF_INET]<server ip> [0] not initialized (yet), dropping packet.
I compared the example client.conf with my client-config and it turned out I was missing "cipher AES-256-CBC". After I added that, It only shows the error three times before timing out and reconnecting.
I've tried to add --ncp-disable, but it still throws errors and reconnects.
I've tried to downgrade to 2.3(from Jessie repo) with success. It's able to establish and keep the VPN tunnel open.
Logs from server and client v2{3,4} + configs are attached.
\Exchizz
Attachments (1)
Change History (10)
Changed 7 years ago by
Attachment: | logs-configs.tar.gz added |
---|
comment:1 follow-up: 5 Changed 7 years ago by
Cc: | Steffan Karger added |
---|
--verb 7
is slightly overdoing it, a reasonable log level for most problems is 3 or 4 - and please do not attached .tar.gz, because that means instead of "look at the log" I have to download it, extract it, return to the browser, ...
The weird thing in your client logs is this:
Fri Jun 16 17:33:53 2017 us=413390 PUSH: Received control message: 'PUSH_REPLY'
... so the server is sending back a reply to the PUSH_REQUEST, but there is *nothing* in it - which leads to "the client is not setting up its keys" (@syzzer: we might want to check whether we can init the keys on an empty PUSH_REPLY).
In the 2.3 code base, this is harmless, because then the client will "just use whatever it has been configured" - in the 2.4 code base, the configured --cipher
*could* be modified by the server, so we need to wait after PUSH_REPLY to set up keys.
I guess the problem stems from here, in the server log:
Fri Jun 16 18:16:19 2017 us=172586 newcastle.mmmi-lab/10.80.12.13:40718 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/clients/newcastle.mmmi-lab Fri Jun 16 18:16:19 2017 us=172696 newcastle.mmmi-lab/10.80.12.13:40718 MULTI: no dynamic or static remote --ifconfig address is available for newcastle.mmmi-lab/10.80.12.13:40718
... and there is something like a push-reset
in the newcastle.mmmi-lab file.
It might as well be due to the highly untypical server setup you are running, not
using --server
or --server-bridge
, which will set up things so "at least a few commands" will be pushed towards the client, but having --mode server
here, with no IP pool, etc.
So maybe you actually need to put something pushed in the newcastle.mmmi-lab file,
like this:
push "echo hello"
(effectively a no-op)
comment:2 follow-up: 3 Changed 7 years ago by
Keywords: | regression ncp v2.3-v2.4 added; Stretch upgrade not initialized (yet) dropping packet removed |
---|---|
Milestone: | release 2.4.0 → release 2.4.5 |
Version: | git master branch → 2.4.0 |
I was the one asking for --verb 7
, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.
Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.
Based on the " Key [AF_INET]... [0] not initialized (yet), dropping packet" warnings, I believe the commit below is somewhat related to this issue.
commit 97894360fa537945e07fd6a85d0659e094b693a5 Author: Steffan Karger <steffan@karger.me> Date: Tue Jun 28 23:33:55 2016 +0200 Add client-side support for cipher negotiation Based on the 'IV_NCP=2' mechanism described in http://permalink.gmane.org/gmane.network.openvpn.devel/9385. This is the first patch of a set that adds support for cipher negotiation. Follow-up patches will add ways to restrict or disable the mechanism, and add server-side support.
Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.
Further, it seems the issue is related to ks->crypto_options.key_ctx_bi.initialized
not being set; which happens in generate_key_expansion()
[ssl.c:1776].
I believe there is some code paths not being called due to the empty push message, which then "forgets" to call generate_key_expansion()
.
comment:3 follow-up: 6 Changed 7 years ago by
Replying to dazo:
I was the one asking for
--verb 7
, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.
Indeed :-) - but even then, verb 3 or 4 is sufficient.
Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.
Yes. In 2.4, if --client
or --pull
is used, crypto parameters are set up on PUSH_REPLY, because there is nothing else we can do - the crypto material can only be used once, and before PUSH_REPLY, we do not know yet whether the server wants to send us a different --cipher
. So we delay initialization until we know.
Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.
This happens whenever a server does not send a PUSH_REPLY (which would kill OpenVPN 3 as well, as we decided that "this is mandatory part of the client setup", otherwise NCP cannot work).
Further, it seems the issue is related to
ks->crypto_options.key_ctx_bi.initialized
not being set; which happens ingenerate_key_expansion()
[ssl.c:1776].
I believe there is some code paths not being called due to the empty push message, which then "forgets" to call
generate_key_expansion()
.
The magic bits happen in tls_session_update_crypto_params()
, which must only be called once.
On the client side, the call in question happens in init.c in do_deferred_options()
, and that is only called from do_up()
if options were seen...
bool do_up(struct context *c, bool pulled_options, unsigned int option_types_found) { if (!c->c2.do_up_ran) { reset_coarse_timers(c); if (pulled_options && option_types_found) { if (!do_deferred_options(c, option_types_found)) { msg(D_PUSH_ERRORS, "ERROR: Failed to apply push options"); return false; } }
empty PUSH_REPLY -> nothing set in "option_types_found" -> no do_deferred_options() -> no valid crypto settings.
(As as side note: in p2p_mode, do_up() is called right away, while in --pull mode, it's called from push.c after receiving PUSH_MSG_REPLY)
Now, everything in do_deferred_options()
is checking whether the relevant option bit is set at all, except for the NCP stuff - which just checks c->options.pull
, so it should be fairly safe to just remove the && option_types_found
condition from do_up()
. It's just a shortcut saving a few microseconds... but here, it backfires.
As a workaround to the original poster, adding --ncp-disable
to the client(!) should work as well - then it should not advertise NCP, and not wait for the PUSH_REPLY. Or, just do not use --client
at all if no PUSH_REPLY is ever expected, and use --tls-client
instead.
comment:4 follow-up: 7 Changed 7 years ago by
Seems there is no work left to be done for me here anymore - I can only concur with cron2's analysis. The configuration is a bit odd, but it should still work. The proposed fix of removing && option_types_found
looks correct to me. Will you send a patch?
comment:5 Changed 7 years ago by
Replying to cron2:
--verb 7
is slightly overdoing it, a reasonable log level for most problems is 3 or 4 - and please do not attached .tar.gz, because that means instead of "look at the log" I have to download it, extract it, return to the browser, ...
The weird thing in your client logs is this:
Fri Jun 16 17:33:53 2017 us=413390 PUSH: Received control message: 'PUSH_REPLY'
... so the server is sending back a reply to the PUSH_REQUEST, but there is *nothing* in it - which leads to "the client is not setting up its keys" (@syzzer: we might want to check whether we can init the keys on an empty PUSH_REPLY).
In the 2.3 code base, this is harmless, because then the client will "just use whatever it has been configured" - in the 2.4 code base, the configured
--cipher
*could* be modified by the server, so we need to wait after PUSH_REPLY to set up keys.
I guess the problem stems from here, in the server log:
Fri Jun 16 18:16:19 2017 us=172586 newcastle.mmmi-lab/10.80.12.13:40718 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/clients/newcastle.mmmi-lab Fri Jun 16 18:16:19 2017 us=172696 newcastle.mmmi-lab/10.80.12.13:40718 MULTI: no dynamic or static remote --ifconfig address is available for newcastle.mmmi-lab/10.80.12.13:40718... and there is something like a
push-reset
in the newcastle.mmmi-lab file.
It might as well be due to the highly untypical server setup you are running, not
using--server
or--server-bridge
, which will set up things so "at least a few commands" will be pushed towards the client, but having--mode server
here, with no IP pool, etc.
So maybe you actually need to put something pushed in the newcastle.mmmi-lab file,
like this:
push "echo hello"(effectively a no-op)
Adding push "echo hello" to newcastle.mmmi-lab works with OpenVPN 2.4. The client is able to stay connected without throwing any errors at me, thanks :)
comment:6 Changed 7 years ago by
Replying to cron2:
Replying to dazo:
I was the one asking for
--verb 7
, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.
Indeed :-) - but even then, verb 3 or 4 is sufficient.
Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.
Yes. In 2.4, if
--client
or--pull
is used, crypto parameters are set up on PUSH_REPLY, because there is nothing else we can do - the crypto material can only be used once, and before PUSH_REPLY, we do not know yet whether the server wants to send us a different--cipher
. So we delay initialization until we know.
Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.
This happens whenever a server does not send a PUSH_REPLY (which would kill OpenVPN 3 as well, as we decided that "this is mandatory part of the client setup", otherwise NCP cannot work).
Further, it seems the issue is related to
ks->crypto_options.key_ctx_bi.initialized
not being set; which happens ingenerate_key_expansion()
[ssl.c:1776].
I believe there is some code paths not being called due to the empty push message, which then "forgets" to call
generate_key_expansion()
.
The magic bits happen in
tls_session_update_crypto_params()
, which must only be called once.
On the client side, the call in question happens in init.c in
do_deferred_options()
, and that is only called fromdo_up()
if options were seen...
bool do_up(struct context *c, bool pulled_options, unsigned int option_types_found) { if (!c->c2.do_up_ran) { reset_coarse_timers(c); if (pulled_options && option_types_found) { if (!do_deferred_options(c, option_types_found)) { msg(D_PUSH_ERRORS, "ERROR: Failed to apply push options"); return false; } }empty PUSH_REPLY -> nothing set in "option_types_found" -> no do_deferred_options() -> no valid crypto settings.
(As as side note: in p2p_mode, do_up() is called right away, while in --pull mode, it's called from push.c after receiving PUSH_MSG_REPLY)
Now, everything in
do_deferred_options()
is checking whether the relevant option bit is set at all, except for the NCP stuff - which just checksc->options.pull
, so it should be fairly safe to just remove the&& option_types_found
condition fromdo_up()
. It's just a shortcut saving a few microseconds... but here, it backfires.
As a workaround to the original poster, adding
--ncp-disable
to the client(!) should work as well - then it should not advertise NCP, and not wait for the PUSH_REPLY.
Adding --ncp-disable does not work, I get the same error.
Or, just do not use
--client
at all if no PUSH_REPLY is ever expected, and use--tls-client
instead.
--tls-client works :> No errors and the tunnel stays open, thanks !
comment:7 Changed 7 years ago by
Owner: | set to Gert Döring |
---|---|
Status: | new → accepted |
Replying to syzzer:
Seems there is no work left to be done for me here anymore
Isn't this nice, for a change? ;-)
- I can only concur with cron2's analysis. The configuration is a bit odd, but it should still work. The proposed fix of removing
&& option_types_found
looks correct to me. Will you send a patch?
I'll send a patch, tomorrow-ish...
comment:8 Changed 7 years ago by
Patch has been sent to the list...
https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg14856.html
comment:9 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Patch has been applied to the master and release/2.4 branch.
commit bd230079d98bfe6aec70b7aedefdffcdbd0e56da (master)
commit e82f7005256f77a63a3191ab7fef67e0cf0a9d02 (release/2.4)
Author: Gert Doering
Date: Sun Jun 18 11:22:44 2017 +0200
Fix edge case with clients failing to set up cipher on empty PUSH_REPLY.
so, the next release (2.4.3, due June 21st) should have it. Now I have no idea how the Debian policy on "backporting bug fixes that are not security critical" is, but I know that Samuli builds debian packages for our releases, so @Exchizz could use that one.
Logs + configs