Opened 7 years ago

Closed 7 years ago

#903 closed Bug / Defect (fixed)

OpenVPN client upgrade to 2.4 is unable to stay connected

Reported by: Exchizz Owned by: Gert Döring
Priority: minor Milestone: release 2.4.5
Component: Generic / unclassified Version: OpenVPN 2.4.0 (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords: regression ncp v2.3-v2.4
Cc: Steffan Karger

Description

Hi,

I upgraded my debian from wheezy to stretch yesterday. After I did the upgrade, my OpenVPN client has been unable to stay connected to my OpenVPN server.

My OpenVPN successfully establishes the connection, but then it starts writing:
Key [AF_INET]<server ip> [0] not initialized (yet), dropping packet.

I compared the example client.conf with my client-config and it turned out I was missing "cipher AES-256-CBC". After I added that, It only shows the error three times before timing out and reconnecting.

I've tried to add --ncp-disable, but it still throws errors and reconnects.

I've tried to downgrade to 2.3(from Jessie repo) with success. It's able to establish and keep the VPN tunnel open.

Logs from server and client v2{3,4} + configs are attached.

\Exchizz

Attachments (1)

logs-configs.tar.gz (47.4 KB) - added by Exchizz 7 years ago.
Logs + configs

Download all attachments as: .zip

Change History (10)

Changed 7 years ago by Exchizz

Attachment: logs-configs.tar.gz added

Logs + configs

comment:1 Changed 7 years ago by Gert Döring

Cc: Steffan Karger added

--verb 7 is slightly overdoing it, a reasonable log level for most problems is 3 or 4 - and please do not attached .tar.gz, because that means instead of "look at the log" I have to download it, extract it, return to the browser, ...

The weird thing in your client logs is this:

Fri Jun 16 17:33:53 2017 us=413390 PUSH: Received control message: 'PUSH_REPLY'

... so the server is sending back a reply to the PUSH_REQUEST, but there is *nothing* in it - which leads to "the client is not setting up its keys" (@syzzer: we might want to check whether we can init the keys on an empty PUSH_REPLY).

In the 2.3 code base, this is harmless, because then the client will "just use whatever it has been configured" - in the 2.4 code base, the configured --cipher *could* be modified by the server, so we need to wait after PUSH_REPLY to set up keys.

I guess the problem stems from here, in the server log:

Fri Jun 16 18:16:19 2017 us=172586 newcastle.mmmi-lab/10.80.12.13:40718 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/clients/newcastle.mmmi-lab
Fri Jun 16 18:16:19 2017 us=172696 newcastle.mmmi-lab/10.80.12.13:40718 MULTI: no dynamic or static remote --ifconfig address is available for newcastle.mmmi-lab/10.80.12.13:40718

... and there is something like a push-reset in the newcastle.mmmi-lab file.

It might as well be due to the highly untypical server setup you are running, not
using --server or --server-bridge, which will set up things so "at least a few commands" will be pushed towards the client, but having --mode server here, with no IP pool, etc.

So maybe you actually need to put something pushed in the newcastle.mmmi-lab file,
like this:

push "echo hello"

(effectively a no-op)

comment:2 Changed 7 years ago by David Sommerseth

Keywords: regression ncp v2.3-v2.4 added; Stretch upgrade not initialized (yet) dropping packet removed
Milestone: release 2.4.0release 2.4.5
Version: git master branch2.4.0

I was the one asking for --verb 7, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.

Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.

Based on the " Key [AF_INET]... [0] not initialized (yet), dropping packet" warnings, I believe the commit below is somewhat related to this issue.

commit 97894360fa537945e07fd6a85d0659e094b693a5
Author: Steffan Karger <steffan@karger.me>
Date:   Tue Jun 28 23:33:55 2016 +0200

    Add client-side support for cipher negotiation
    
    Based on the 'IV_NCP=2' mechanism described in
    http://permalink.gmane.org/gmane.network.openvpn.devel/9385.
    
    This is the first patch of a set that adds support for cipher negotiation.
    Follow-up patches will add ways to restrict or disable the mechanism, and
    add server-side support.

Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.

Further, it seems the issue is related to ks->crypto_options.key_ctx_bi.initialized not being set; which happens in generate_key_expansion() [ssl.c:1776].

I believe there is some code paths not being called due to the empty push message, which then "forgets" to call generate_key_expansion().

Last edited 7 years ago by David Sommerseth (previous) (diff)

comment:3 in reply to:  2 ; Changed 7 years ago by Gert Döring

Replying to dazo:

I was the one asking for --verb 7, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.

Indeed :-) - but even then, verb 3 or 4 is sufficient.

Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.

Yes. In 2.4, if --client or --pull is used, crypto parameters are set up on PUSH_REPLY, because there is nothing else we can do - the crypto material can only be used once, and before PUSH_REPLY, we do not know yet whether the server wants to send us a different --cipher. So we delay initialization until we know.

Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.

This happens whenever a server does not send a PUSH_REPLY (which would kill OpenVPN 3 as well, as we decided that "this is mandatory part of the client setup", otherwise NCP cannot work).

Further, it seems the issue is related to ks->crypto_options.key_ctx_bi.initialized not being set; which happens in generate_key_expansion() [ssl.c:1776].

I believe there is some code paths not being called due to the empty push message, which then "forgets" to call generate_key_expansion().

The magic bits happen in tls_session_update_crypto_params(), which must only be called once.

On the client side, the call in question happens in init.c in do_deferred_options(), and that is only called from do_up() if options were seen...

bool
do_up(struct context *c, bool pulled_options, unsigned int option_types_found)
{
    if (!c->c2.do_up_ran)
    {
        reset_coarse_timers(c);

        if (pulled_options && option_types_found)
        {
            if (!do_deferred_options(c, option_types_found))
            {
                msg(D_PUSH_ERRORS, "ERROR: Failed to apply push options");
                return false;
            }
        }

empty PUSH_REPLY -> nothing set in "option_types_found" -> no do_deferred_options() -> no valid crypto settings.

(As as side note: in p2p_mode, do_up() is called right away, while in --pull mode, it's called from push.c after receiving PUSH_MSG_REPLY)

Now, everything in do_deferred_options() is checking whether the relevant option bit is set at all, except for the NCP stuff - which just checks c->options.pull, so it should be fairly safe to just remove the && option_types_found condition from do_up(). It's just a shortcut saving a few microseconds... but here, it backfires.

As a workaround to the original poster, adding --ncp-disable to the client(!) should work as well - then it should not advertise NCP, and not wait for the PUSH_REPLY. Or, just do not use --client at all if no PUSH_REPLY is ever expected, and use --tls-client instead.

comment:4 Changed 7 years ago by Steffan Karger

Seems there is no work left to be done for me here anymore - I can only concur with cron2's analysis. The configuration is a bit odd, but it should still work. The proposed fix of removing && option_types_found looks correct to me. Will you send a patch?

comment:5 in reply to:  1 Changed 7 years ago by Exchizz

Replying to cron2:

--verb 7 is slightly overdoing it, a reasonable log level for most problems is 3 or 4 - and please do not attached .tar.gz, because that means instead of "look at the log" I have to download it, extract it, return to the browser, ...

The weird thing in your client logs is this:

Fri Jun 16 17:33:53 2017 us=413390 PUSH: Received control message: 'PUSH_REPLY'

... so the server is sending back a reply to the PUSH_REQUEST, but there is *nothing* in it - which leads to "the client is not setting up its keys" (@syzzer: we might want to check whether we can init the keys on an empty PUSH_REPLY).

In the 2.3 code base, this is harmless, because then the client will "just use whatever it has been configured" - in the 2.4 code base, the configured --cipher *could* be modified by the server, so we need to wait after PUSH_REPLY to set up keys.

I guess the problem stems from here, in the server log:

Fri Jun 16 18:16:19 2017 us=172586 newcastle.mmmi-lab/10.80.12.13:40718 OPTIONS IMPORT: reading client specific options from: /etc/openvpn/clients/newcastle.mmmi-lab
Fri Jun 16 18:16:19 2017 us=172696 newcastle.mmmi-lab/10.80.12.13:40718 MULTI: no dynamic or static remote --ifconfig address is available for newcastle.mmmi-lab/10.80.12.13:40718

... and there is something like a push-reset in the newcastle.mmmi-lab file.

It might as well be due to the highly untypical server setup you are running, not
using --server or --server-bridge, which will set up things so "at least a few commands" will be pushed towards the client, but having --mode server here, with no IP pool, etc.

So maybe you actually need to put something pushed in the newcastle.mmmi-lab file,
like this:

push "echo hello"

(effectively a no-op)

Adding push "echo hello" to newcastle.mmmi-lab works with OpenVPN 2.4. The client is able to stay connected without throwing any errors at me, thanks :)

comment:6 in reply to:  3 Changed 7 years ago by Exchizz

Replying to cron2:

Replying to dazo:

I was the one asking for --verb 7, as we debugged this a bit on #openvpn, as I have a hunch this is related to NCP.

Indeed :-) - but even then, verb 3 or 4 is sufficient.

Notice that the v2.4 client logs lacks the "Data Channel {Decrypt,Encrypt}" lines, which confirms the crypto parameters for the data channel.

Yes. In 2.4, if --client or --pull is used, crypto parameters are set up on PUSH_REPLY, because there is nothing else we can do - the crypto material can only be used once, and before PUSH_REPLY, we do not know yet whether the server wants to send us a different --cipher. So we delay initialization until we know.

Also, this happens when connecting a v2.4 client to a v2.3 server. Another artefact with this configuration is that it is without IP addresses; not sure how relevant that is.

This happens whenever a server does not send a PUSH_REPLY (which would kill OpenVPN 3 as well, as we decided that "this is mandatory part of the client setup", otherwise NCP cannot work).

Further, it seems the issue is related to ks->crypto_options.key_ctx_bi.initialized not being set; which happens in generate_key_expansion() [ssl.c:1776].

I believe there is some code paths not being called due to the empty push message, which then "forgets" to call generate_key_expansion().

The magic bits happen in tls_session_update_crypto_params(), which must only be called once.

On the client side, the call in question happens in init.c in do_deferred_options(), and that is only called from do_up() if options were seen...

bool
do_up(struct context *c, bool pulled_options, unsigned int option_types_found)
{
    if (!c->c2.do_up_ran)
    {
        reset_coarse_timers(c);

        if (pulled_options && option_types_found)
        {
            if (!do_deferred_options(c, option_types_found))
            {
                msg(D_PUSH_ERRORS, "ERROR: Failed to apply push options");
                return false;
            }
        }

empty PUSH_REPLY -> nothing set in "option_types_found" -> no do_deferred_options() -> no valid crypto settings.

(As as side note: in p2p_mode, do_up() is called right away, while in --pull mode, it's called from push.c after receiving PUSH_MSG_REPLY)

Now, everything in do_deferred_options() is checking whether the relevant option bit is set at all, except for the NCP stuff - which just checks c->options.pull, so it should be fairly safe to just remove the && option_types_found condition from do_up(). It's just a shortcut saving a few microseconds... but here, it backfires.

As a workaround to the original poster, adding --ncp-disable to the client(!) should work as well - then it should not advertise NCP, and not wait for the PUSH_REPLY.

Adding --ncp-disable does not work, I get the same error.

Or, just do not use --client at all if no PUSH_REPLY is ever expected, and use --tls-client instead.

--tls-client works :> No errors and the tunnel stays open, thanks !

comment:7 in reply to:  4 Changed 7 years ago by Gert Döring

Owner: set to Gert Döring
Status: newaccepted

Replying to syzzer:

Seems there is no work left to be done for me here anymore

Isn't this nice, for a change? ;-)

  • I can only concur with cron2's analysis. The configuration is a bit odd, but it should still work. The proposed fix of removing && option_types_found looks correct to me. Will you send a patch?

I'll send a patch, tomorrow-ish...

comment:9 Changed 7 years ago by Gert Döring

Resolution: fixed
Status: acceptedclosed

Patch has been applied to the master and release/2.4 branch.

commit bd230079d98bfe6aec70b7aedefdffcdbd0e56da (master)
commit e82f7005256f77a63a3191ab7fef67e0cf0a9d02 (release/2.4)
Author: Gert Doering
Date: Sun Jun 18 11:22:44 2017 +0200

Fix edge case with clients failing to set up cipher on empty PUSH_REPLY.

so, the next release (2.4.3, due June 21st) should have it. Now I have no idea how the Debian policy on "backporting bug fixes that are not security critical" is, but I know that Samuli builds debian packages for our releases, so @Exchizz could use that one.

Note: See TracTickets for help on using tickets.