Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#827 closed Patch submission (fixed)

openvpn-client@.service fails permanently, if there are DNS issues

Reported by: dvzrv Owned by: David Sommerseth
Priority: major Milestone: release 2.4.1
Component: Generic / unclassified Version: OpenVPN 2.4.0 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords: systemd
Cc: eworm

Description

As noted downstream in the Arch Linux bug tracker (https://bugs.archlinux.org/task/52654), openvpn-client@.service fails permanently if there are DNS issues during startup (which can happen if you move between networks with the client or are offline in the beginning).

This renders the tunnel useless and forces unprivileged users to reboot to get a connection in a new (working) network.

As openvpn-client@.service is supposed to be a long-running service (that "just works"), it should be restarted as often as possible.
"Restart=on-abnormal" comes to the rescue here (https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart)

Steps to reproduce:

  1. Configure and enable working openvpn client connection (relying on domain name resolution for the server in use) on system.
  2. Start computer in a network with misconfigured DNS.
  3. Wait until the service permanently fails.
  4. Change to network with working DNS (no reconnect attempts).

Attachments (4)

systemd-restart.patch (314 bytes) - added by dvzrv 7 years ago.
restart patch for openvpn-client@.service
openvpn-client-without_systemd.log (15.7 KB) - added by dvzrv 7 years ago.
log of openvpn client without systemd (no DNS)
openvpn-client-with_systemd.log (25.9 KB) - added by dvzrv 7 years ago.
log of openvpn client with systemd (no DNS)
openvpn-client-with_systemd-status.log (1.6 KB) - added by dvzrv 7 years ago.
systemctl status of openvpn client after failing (no DNS)


Change History (14)

Changed 7 years ago by dvzrv

Attachment: systemd-restart.patch added

restart patch for openvpn-client@.service

comment:1 Changed 7 years ago by David Sommerseth

I have no doubt that this approach will work; it is probably one we should seriously consider applying as well (at least in openvpn-server@).

But this currently feels more like a workaround than a proper fix to this issue. I am not convinced an OpenVPN client should exit if it can't resolve DNS queries. The man page states:

       --resolv-retry n
              If hostname resolve fails for --remote, retry
              resolve for n seconds before failing.

              Set n to "infinite" to retry indefinitely.

              By   default,   --resolv-retry   infinite  is
              enabled.  You can disable by setting n=0.
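
As a rough illustration of the semantics quoted above, the retry behaviour can be sketched like this. It is not OpenVPN's implementation: the function name, the 5-second pause between attempts, and the injectable resolver, sleep and clock parameters are all assumptions made for the example. None stands in for "infinite", and 0 means a single attempt.

```python
import socket
import time


def resolve_with_retry(host, retry_seconds=None, interval=5.0,
                       resolver=socket.gethostbyname,
                       sleep=time.sleep, clock=time.monotonic):
    """Resolve host, retrying on failure (hypothetical helper).

    retry_seconds=None models "--resolv-retry infinite";
    retry_seconds=0 gives up after the first failed attempt.
    """
    deadline = None if retry_seconds is None else clock() + retry_seconds
    while True:
        try:
            return resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if deadline is not None and clock() >= deadline:
                raise  # retry window exhausted, report the failure
            sleep(interval)  # assumed pause between attempts
```

With the default of None the call never gives up on its own, which is why a client with "resolv-retry infinite" would not be expected to exit merely because DNS is unavailable.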

It would be valuable to see configurations where this fails together with log files with --verb 4. This will help us see how this can be improved. It would also be valuable to know if this issue is also present if the OpenVPN client is started from the command line directly, or only if it is via systemd.

comment:2 Changed 7 years ago by dvzrv

"resolv-retry infinite" was set in the configuration used in the test case mentioned above.

I'm suspecting this will be a systemd issue, but I can try and provide information for both cases in a bit!
Just at work now, so give me a couple of hours.

Changed 7 years ago by dvzrv

log of openvpn client without systemd (no DNS)

Changed 7 years ago by dvzrv

log of openvpn client with systemd (no DNS)

Changed 7 years ago by dvzrv

systemctl status of openvpn client after failing (no DNS)

comment:3 Changed 7 years ago by dvzrv

Okay, I think I found out why things behave the way they do (and, as I suspected, they only behave this way under systemd).

First off: openvpn-{client,server}@.service are of Type=notify (https://www.freedesktop.org/software/systemd/man/systemd.service.html#Type=), which means that they are supposed to send a notification once they are fully set up.
Any service that does not finish starting within a certain amount of time is stopped and marked as failed by timeout. That is what happens in this particular case, as startup takes longer than the default 90s timeout defined by TimeoutStartSec (https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=) and/or DefaultTimeoutStartSec (https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultTimeoutStartSec=).

openvpn-{client,server}@.service will not be able to send the signal needed for Type=notify to be successful in a case where it can not resolve the hostname or doesn't have network access and thus the service fails permanently.

This can be fixed in two ways:

  1. Make openvpn-{client,server}@.service "Type=simple" (not so smart)
  2. Proceed as previously mentioned by adding "Restart=on-abnormal" to openvpn-{client,server}@.service

comment:4 Changed 7 years ago by David Sommerseth

Cc: eworm added
Keywords: service removed

Right, this makes sense, but I disagree with your conclusion. Using Restart=on-abnormal in this case is a workaround. And we will not revert to Type=simple either, even though that works too.

We need to look into moving the sd_notify() call earlier than when OpenVPN completes the initialization, which for the client side means after a successful connection. We also have a somewhat related issue when using p2p with static keys as well.

I'm adding eworm on Cc to this ticket, who has already contributed a few patches improving the systemd integration.
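
To make the idea of signalling readiness before the connection attempt concrete, here is a minimal sketch. OpenVPN itself is written in C and calls sd_notify(); the Python below only mimics the underlying datagram protocol (a message sent to the socket named in the NOTIFY_SOCKET environment variable). The run_client function and its prepare/connect callbacks are hypothetical placeholders, not OpenVPN code.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a notification such as "READY=1" to the service manager.

    Returns False when NOTIFY_SOCKET is unset (for example when the
    process was not started by systemd), True once the datagram is sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def run_client(prepare, connect) -> None:
    """Hypothetical startup order: report readiness before connecting."""
    prepare()             # local initialisation (placeholder callback)
    sd_notify("READY=1")  # systemd now considers the unit started
    connect()             # may wait on DNS for longer than TimeoutStartSec
```

With this ordering the start timeout only covers local initialisation, so a slow or failing name lookup in connect() no longer causes the unit to be marked as failed.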

comment:5 Changed 7 years ago by dvzrv

@dazo: Well, I'd much rather have a workaround than not be able to connect in certain circumstances ;-)
Restart=on-abnormal at least assures that the process will be restarted as soon as it hits the timeout. This is probably also a good idea, if something else goes wrong.

I'd like you to consider the following case though:
Given, that you start a unit as a dependency of openvpn-client@.service, would your proposed solution not mean, that openvpn-client@.service would be marked as successfully started up (using READY=1 with sd_notify(), although not connected)?
In that case any unit depending on it would start after that on false grounds assuming that its dependencies are met and probably fail.
I'm not sure that sd_notify() is flexible enough to handle this case, as it is dependent on TimeoutStartSec.

So, another possible thing to do would be to set TimeoutStartSec=infinity, which gets around the timeout issue altogether and allows openvpn to function as usual.
Would there be any reason not to do it (in my head it currently makes a lot of sense)?
I can test this in a couple of hours and provide some log output.

comment:6 Changed 7 years ago by David Sommerseth

Milestone: release 2.4.1
Owner: set to David Sommerseth
Status: new → accepted

I've just submitted a patch to the mailing list which should resolve this issue.

If you have a chance, please test this new patch and report back.

https://sourceforge.net/p/openvpn/mailman/message/35624370/
Message-Id: <20170124232344.7825-1-davids@…>

Author: David Sommerseth
Date:   Tue Jan 24 23:16:24 2017 +0100

    systemd: Move the READY=1 signalling to an earlier point
    
    Currently, OpenVPN will first tell systemd it is ready once the
    log will be appended with "Initialization Sequence Completed".
    This turns out to cause some issues several places.
    
    First, it adds challenges if --chroot is used in the configuration;
    this we already fixed.  Secondly, it will cause havoc on static key
    p2p mode configurations where the log line above will not happen
    before either sides have completed establishing a connection.  And
    thirdly, if a client configuration fails to establish a connection
    within 90 seconds it will also fail.  For the third case this may
    not need to be a critical issue, as the host just needs to get
    an Internet access established first - which in some scenarios may
    take much longer than those 90 seconds after the OpenVPN client
    configuration tries to start.
    
    The approach this patch takes is to consider OpenVPN ready once
    all the initial preparations and configurations have happened - but
    before a connection to a remote side have been attempted.  This
    also removes the need for specially handling the --chroot scenario.
    
    The final "Initialization Sequence Completed" message update is
    kept (though slightly simplified) to indicate we're in a good
    state - even though that update will still not be visible
    if --chroot is used (as before this patch).
    
    Trac: #827, #801
    Signed-off-by: David Sommerseth

comment:7 Changed 7 years ago by David Sommerseth

And to answer some questions:

Q: Given, that you start a unit as a dependency of openvpn-client@.service, would your proposed solution not mean, that openvpn-client@.service would be marked as successfully started up (using READY=1 with sd_notify(), although not connected)?

A: Yes. IMO, it should not primarily be systemd's task to decide whether a service is running based on whether it has a connection to a remote site. Whether a connection is established, how long to wait before aborting, etc., should primarily be OpenVPN's task to evaluate, as OpenVPN knows how such connections should work, while systemd does not. So let systemd manage the overall process without caring what the process does internally, and let the OpenVPN process take care of its own business.

Q: In that case any unit depending on it would start after that on false grounds assuming that its dependencies are met and probably fail.

A: That's a fair point. And yes, this can make dependent units fail if they assume that systemd manages the VPN connection itself. Currently there is no good approach to that, but I believe it should be managed outside of the openvpn-{client,server}@.service unit files. We could introduce another set of unit files (unless NetworkManager already has that?) which can carry such a feature, such as a service which polls a running OpenVPN configuration to see whether the tunnel is established.

This has the advantage of separating OpenVPN process management from OpenVPN connection management, because a running OpenVPN process is no guarantee that you have a valid VPN connection. Conversely, an OpenVPN process that has not yet reported "READY=1" back to systemd is not necessarily having issues; it might just need much more time than systemd thinks it should have. And since VPNs depend on a functional Internet connection, which works "somehow" in billions (if not trillions or more) of different scenarios, we can't pick a realistic timeout value that will satisfy most users. Well, we could set it to 30 minutes, or even 10 minutes, but if such an issue hits during boot on a critical server and you have limited possibilities to understand why the boot is still lagging after 8 minutes, that would not be a good experience.

So: Keep process management and connectivity management as two separate issues, and handle dependencies for these separate areas independently. Their use cases are quite different.
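
As one possible shape for such a separate connectivity check, the sketch below parses a single line of OpenVPN state output. It assumes, and nothing in this ticket says so, that the client is run with its management interface enabled and that the output of its "state" command is a comma-separated line whose first two fields are a Unix timestamp and a state name, with CONNECTED meaning the tunnel is up. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class VpnState(NamedTuple):
    timestamp: int  # Unix time reported by the client
    name: str       # state name, e.g. "RESOLVE" or "CONNECTED"
    detail: str     # optional descriptive field, may be empty


def parse_state_line(line: str) -> Optional[VpnState]:
    """Parse one comma-separated state line; return None if malformed."""
    fields = line.strip().split(",")
    if len(fields) < 2 or not fields[0].isdigit():
        return None
    detail = fields[2] if len(fields) > 2 else ""
    return VpnState(int(fields[0]), fields[1], detail)


def tunnel_is_up(line: str) -> bool:
    """Treat only the CONNECTED state as an established tunnel."""
    state = parse_state_line(line)
    return state is not None and state.name == "CONNECTED"
```

A polling service built on something like tunnel_is_up() could then gate "VPN connected" dependencies without changing how systemd supervises the OpenVPN process itself.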

Q: So, another possible thing to do would be to set TimeoutStartSec=infinity, which gets around the timeout issue altogether and allows openvpn to function as usual.
Would there be any reason not to do it (in my head it currently makes a lot of sense)?

A: TimeoutStartSec=infinity is a very bad solution for OpenVPN. An example: if we don't modify OpenVPN to call sd_notify() at an earlier point, running systemctl start openvpn-client@CONFIG will not return to the command prompt until the "Initialization Sequence Completed" log line is reached. In the worst case this can cause havoc on a booting system, where you will never reach a login prompt until OpenVPN has managed to get a connection, which makes it even harder to debug the system when issues appear. This is by far the worst approach; even Type=simple is far better!

comment:8 Changed 7 years ago by eworm

I am pretty sure I have users that will complain. :-P

On the other hand I have to admit that your line of argumentation is valid.

Systemd has a mechanism for what we want: network-online.target is active when a network connection has been established. So given we have a connection openvpn-client@example.service we want a target openvpn-client-online@example.target that is active when the vpn connection is up... No idea how to implement it, though...

comment:9 Changed 7 years ago by David Sommerseth

Resolution: fixed
Status: accepted → closed

Patch applied. Closing this, as this should resolve the issue described in the initial description.

Regarding eworm's comment:8, we are looking into a better solution along the lines of a separate openvpn-client-online@CONFIG unit file, to make "VPN connected" dependencies possible. This is currently targeted for OpenVPN v2.5.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ACK.  Tested this lightly with both server and client profiles,
and this seems to work very well.  Users having keys under
/home will probably complain.  But I do not think system wide
configurations should depend on keys in users' home directories.

Your patch has been applied to the following branches

commit 76096c605fcac4815674b6ae76ac1f31f03a8186  (master)
commit ba3ccaf92d379f8a2efad80cee7dc2806088f421  (release/2.4)
Author: Christian Hesse
Date:   Tue Dec 27 23:18:32 2016 +0100

     systemd: Add more security feature for systemd units

     Signed-off-by: Christian Hesse <mail@eworm.de>
     Acked-by: David Sommerseth <davids@openvpn.net>
     Message-Id: <20161227221832.610-1-list@eworm.de>
     URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13743.html
     Signed-off-by: David Sommerseth <davids@openvpn.net>


- --
kind regards,

David Sommerseth

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYiPPRAAoJEIbPlEyWcf3ymHAQAIuy8UXELqs4blJCs0IfHtOg
nALf1ALs+fvhQp7he+Z+wxmDT+QFL3mli53RXKZZGG8y+boWgRFkJ11PX4YzsCO9
KSJAWqbDRW+n9pp8uZ7q6D+uLeiO6ziKhEm6Jl2zln4FtTxi3pW8jUcONEdNwBtP
XSzJ7vdKAuiKGp8GvDm+uWpyjRJvpsClKynPvh2sN0bY6ERlkLDYKUMy37+1vDfc
HV6k1Rdwq7L4fVxbx9P6YWF/iKX3yuqTK1Qs99ZaZ7N4UE+QAYtuOtzDbAb+ISrN
fs7kOim2H5OZ8qfgYTosE6bRLP4mTEqgKX7zjYUMPWWZZgvrW3H9E4v/nGWupZxD
NaLIE+WELGDVZRnHMZRkVYCbK6zcPYyvfrehc2O/cKKNAtsIQLDe8NdiirpWsbRB
7+e/UCgIyvtPTxaf1FoVD0ENdyfGXT3EYRIuCsPA6Uk9C5vu3JpAhBnxhleaynE6
Vkh29k3lxBi/CHKsIa2DYYLqumrkwlI9TJ6FHSt7xR4zYkA92BZb2UKZGDkuCss7
g5wNrnxuswIaAgllsa7YSKkVSS41DhlUZG+flFnFLbodME5HatUJWokAqJzOzC7L
vH1SMR+AQGKRwV1YkEJYRsxYfKJFByViipoqHkStq70Bl7Qql2hGXS+qh8cC/bLK
WX35Il88WHshyCmcqI+d
=2/Wt
-----END PGP SIGNATURE-----

https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13954.html
Message-Id: 20170125185203.10467-1-davids@…

comment:10 Changed 7 years ago by eworm

You pasted the wrong message... This one is relevant:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I took the liberty to improve the commit message a little bit,
to make a few things clearer.


Your patch has been applied to the following branches

commit e83a8684f0a0d944e9d53cdad2b543cfd1b6fbae  (master)
commit 041fd6488434b5df01f86dd873b536a2b690ee13  (release/2.4)
Author: David Sommerseth
Date:   Wed Jan 25 00:23:44 2017 +0100

     systemd: Move the READY=1 signalling to an earlier point

     Trac: #827, #801
     Signed-off-by: David Sommerseth <dav...@openvpn.net>
     Acked-by: Gert Doering <g...@greenie.muc.de>
     Acked-by: Christian Hesse <m...@eworm.de>
     Message-Id: <20170124232344.7825-1-dav...@openvpn.net>
     URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13945.html
     Signed-off-by: David Sommerseth <dav...@openvpn.net>


- --
kind regards,

David Sommerseth

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYiPJxAAoJEIbPlEyWcf3yxH4QALLTpJeE7AYS5N+54cRbswIM
Qjw/rKIU5WUe7hf6R//H8HmsWWbhCddYMYojl1XRcLglUdGFm+fTWkTUIUAtEZIy
F9IeTQNVRBYKXoJ5pVWs+9MfCKrdP7tuaUOouqmpuaU3EUwOTwXb9jfDhdNmiQFP
bbE3LC5uv4/BMUK/nPWX/9bqfiUhRRhoOIgDeLBaz2opvBNhLzx4wZV5EPXIsHG0
i1/BKmaU4nlBtVdC0CnL3fuKo1rjoAM+gPqZK1RbZISOBuXXuSJPx9hbNwaY1tU9
Vnvzd4iGmDdjBmxRkjgCwtYcwGGSPR5CCTa0xttqBFAj9Ljp2ZkQECPMWHWbY2D5
3/nj1RhY//YV6iWQD4dgq/Xtnfhp5c9bW00vl8t3h7psiO1BbkrRyQBMvGX7McwS
BvrgXhL0b8E5YLYM45ZQfYeCeNHXlZRlAIVEV0eC6ZHvFKqkVlhsjDHMzwztgbyF
tydEp8vDQtpixIMccUDPtYdWx5OXEje/sR0394tymYwCoMlak4lF4ED2PXdzzkHB
Lf41uTgoGUFhw3qbEaIhhTvbRCYq/AfSi7ojn46zhVqy6YPUG6F5WC4oHRlrvB63
OgGDtN2+F4loKXCOnbMPGM0eT3TKx11ft/fwnCJuVH6RVW3hybanGEXr3wkvWy6C
B/PEIq+CSxztTxxY49OI
=dvhT
-----END PGP SIGNATURE-----

The mail archive link is correct, though.
