#827 closed Patch submission (fixed)
openvpn-client@.service fails permanently, if there are DNS issues
Reported by: | dvzrv | Owned by: | David Sommerseth |
---|---|---|---|
Priority: | major | Milestone: | release 2.4.1 |
Component: | Generic / unclassified | Version: | OpenVPN 2.4.0 (Community Ed) |
Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | systemd |
Cc: | eworm |
Description
As noted downstream in the Arch Linux bug tracker (https://bugs.archlinux.org/task/52654), openvpn-client@.service fails permanently if there are DNS issues during startup (which can happen if you move between networks with the client or are offline at startup).
This renders the tunnel useless and forces unprivileged users to reboot to get a connection in a new (working) network.
As openvpn-client@.service is supposed to be a long-running service (that "just works"), it should be restarted as often as possible.
"Restart=on-abnormal" comes to the rescue here (https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart)
Steps to reproduce:
- Configure and enable working openvpn client connection (relying on domain name resolution for the server in use) on system.
- Start computer in a network with misconfigured DNS.
- Wait until the service permanently fails.
- Change to network with working DNS (no reconnect attempts).
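As a local stopgap, the Restart=on-abnormal setting proposed above could be tried through a drop-in instead of editing the packaged unit. This is a minimal sketch, assuming the standard systemd drop-in directory for template units; the function name write_restart_dropin and the file name restart.conf are made up for illustration and are not part of the attached patch.

```shell
#!/bin/sh
# write_restart_dropin DIR
# Writes a drop-in that adds Restart=on-abnormal to the unit owning DIR.
# DIR is normally /etc/systemd/system/openvpn-client@.service.d (assumption:
# the unit is installed under the name used in this ticket).
write_restart_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-abnormal
EOF
}

# Typical use, as root:
#   write_restart_dropin /etc/systemd/system/openvpn-client@.service.d
#   systemctl daemon-reload
```

Because the drop-in sits on the template unit, it would apply to every openvpn-client@ instance on the machine.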
Attachments (4)
Change History (14)
Changed 8 years ago by
Attachment: | systemd-restart.patch added |
---|
comment:1 Changed 8 years ago by
I have no doubt that this approach will work; it is probably something we should seriously consider applying as well (at least in openvpn-server@).
But this currently feels more like a workaround than a proper fix to this issue. I am not convinced an OpenVPN client should exit if it can't resolve DNS queries. The man page states:
--resolv-retry n If hostname resolve fails for --remote, retry resolve for n seconds before failing. Set n to "infinite" to retry indefinitely. By default, --resolv-retry infinite is enabled. You can disable by setting n=0.
It would be valuable to see configurations where this fails together with log files with --verb 4. This will help us see how this can be improved. It would also be valuable to know if this issue is also present if the OpenVPN client is started from the command line directly, or only if it is via systemd.
comment:2 Changed 8 years ago by
"resolv-retry infinite" was set in the configuration used in the test case mentioned above.
I'm suspecting this will be a systemd issue, but I can try and provide information for both cases in a bit!
Just at work now, so give me a couple of hours.
Changed 8 years ago by
Attachment: | openvpn-client-without_systemd.log added |
---|
log of openvpn client without systemd (no DNS)
Changed 8 years ago by
Attachment: | openvpn-client-with_systemd.log added |
---|
log of openvpn client with systemd (no DNS)
Changed 8 years ago by
Attachment: | openvpn-client-with_systemd-status.log added |
---|
systemctl status of openvpn client after failing (no DNS)
comment:3 Changed 8 years ago by
Okay, I think I found out why things behave the way they do (and, as I suspected, they only behave this way under systemd).
First off: openvpn-{client,server}@.service are of Type=notify (https://www.freedesktop.org/software/systemd/man/systemd.service.html#Type=), which means that they are supposed to send a notification to systemd once they are fully set up.
Any service that does not finish starting within a certain amount of time is stopped and marked as failed by timeout. That is what happens in this particular case: startup takes longer than the default 90s timeout defined by TimeoutStartSec (https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=) and/or DefaultTimeoutStartSec (https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultTimeoutStartSec=).
openvpn-{client,server}@.service cannot send the notification needed for Type=notify to succeed when it cannot resolve the hostname or has no network access, and thus the service fails permanently.
This can be fixed in two ways:
- Make openvpn-{client,server}@.service "Type=simple" (not so smart)
- Proceed as previously mentioned by adding "Restart=on-abnormal" to openvpn-{client,server}@.service
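To check whether a given instance is exposed to this failure mode, the three properties involved can be read from systemd. This is a rough sketch, assuming the KEY=VALUE output format of `systemctl show`; the function name start_timeout_is_fatal is hypothetical.

```shell
#!/bin/sh
# Reads KEY=VALUE lines (as printed by `systemctl show`) on stdin.
# Succeeds if the unit is Type=notify, has a finite start timeout and is
# never restarted, i.e. a single start timeout leaves it failed for good.
start_timeout_is_fatal() {
    svc_type='' timeout='' restart=''
    while IFS='=' read -r key value; do
        case "$key" in
            Type)             svc_type=$value ;;
            TimeoutStartUSec) timeout=$value ;;
            Restart)          restart=$value ;;
        esac
    done
    [ "$svc_type" = notify ] && [ "$timeout" != infinity ] && [ "$restart" = no ]
}

# Typical use (CONFIG is a placeholder for the instance name):
#   systemctl show -p Type -p TimeoutStartUSec -p Restart \
#       openvpn-client@CONFIG.service | start_timeout_is_fatal \
#       && echo "a start timeout will be permanent"
```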
comment:4 Changed 8 years ago by
Cc: | eworm added |
---|---|
Keywords: | service removed |
Right, this makes sense, but I disagree with your conclusion. Using Restart=on-abnormal in this case is a workaround. And we will not revert back to Type=simple either, even though that does work too.
We need to look into moving the sd_notify() call earlier than when OpenVPN completes the initialization, which for the client side means after a successful connection. We also have a somewhat related issue when using p2p with static keys as well.
I'm adding eworm on Cc to this ticket, who has already contributed a few patches improving the systemd integration.
comment:5 Changed 8 years ago by
@dazo: Well, I'd much rather have a workaround than not be able to connect in certain circumstances ;-)
Restart=on-abnormal at least assures that the process will be restarted as soon as it hits the timeout. This is probably also a good idea, if something else goes wrong.
I'd like you to consider the following case though:
Given, that you start a unit as a dependency of openvpn-client@.service, would your proposed solution not mean, that openvpn-client@.service would be marked as successfully started up (using READY=1 with sd_notify(), although not connected)?
In that case any unit depending on it would start after that on false grounds assuming that its dependencies are met and probably fail.
I'm not sure that sd_notify() is flexible enough to handle this case, as it is dependent on TimeoutStartSec.
So, another possible thing to do would be to set TimeoutStartSec=infinity, which gets around the timeout issue altogether and allows openvpn to function as usual.
Would there be any reason not to do it (in my head it currently makes a lot of sense)?
I can test this in a couple of hours and provide some log output.
comment:6 Changed 8 years ago by
Milestone: | → release 2.4.1 |
---|---|
Owner: | set to David Sommerseth |
Status: | new → accepted |
I've just submitted a patch to the mailing list which should resolve this issue.
If you have a chance, please test this new patch and report back.
https://sourceforge.net/p/openvpn/mailman/message/35624370/
Message-Id: <20170124232344.7825-1-davids@…>
Author: David Sommerseth
Date: Tue Jan 24 23:16:24 2017 +0100

systemd: Move the READY=1 signalling to an earlier point

Currently, OpenVPN will first tell systemd it is ready once the log will be appended with "Initialization Sequence Completed". This turns out to cause some issues several places.

First, it adds challenges if --chroot is used in the configuration; this we already fixed. Secondly, it will cause havoc on static key p2p mode configurations where the log line above will not happen before either sides have completed establishing a connection. And thirdly, if a client configuration fails to establish a connection within 90 seconds it will also fail. For the third case this may not need to be a critical issue, as the host just needs to get an Internet access established first - which in some scenarios may take much longer than those 90 seconds after the OpenVPN client configuration tries to start.

The approach this patch takes is to consider OpenVPN ready once all the initial preparations and configurations have happened - but before a connection to a remote side have been attempted. This also removes the need for specially handling the --chroot scenario.

The final "Initialization Sequence Completed" message update is kept (though slightly simplified) to indicate we're in a good state - even though that update will still not be visible if --chroot is used (as before this patch).

Trac: #827, #801
Signed-off-by: David Sommerseth
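The ordering described in the commit message can be illustrated with a small shell sketch. This is not OpenVPN's code (OpenVPN calls sd_notify() from C); it only mimics the sequence using the systemd-notify command-line tool. The function names prepare, connect and start_early_ready and the NOTIFY_CMD variable are invented for the illustration.

```shell
#!/bin/sh
# NOTIFY_CMD defaults to the real systemd-notify tool; it is a variable only
# so the sequence can be exercised outside of a systemd unit.
NOTIFY_CMD="${NOTIFY_CMD:-systemd-notify}"

# Placeholders for the two phases of start-up (hypothetical).
prepare() { echo "prepare: read configuration, set up local resources"; }
connect() { echo "connect: resolve remote and establish the tunnel"; }

start_early_ready() {
    prepare || return 1
    # Report readiness before the slow network phase, so a long DNS or
    # connection delay can no longer trip the start timeout.
    "$NOTIFY_CMD" --ready
    connect || return 1
    # Later status updates remain possible once the tunnel is up.
    "$NOTIFY_CMD" --status="Initialization Sequence Completed"
}
```

With this ordering systemd treats the unit as started as soon as the local preparation is done, which is exactly why a started unit no longer implies an established tunnel.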
comment:7 Changed 8 years ago by
And to answer some questions:
Q: Given, that you start a unit as a dependency of openvpn-client@.service, would your proposed solution not mean, that openvpn-client@.service would be marked as successfully started up (using READY=1 with sd_notify(), although not connected)?
A: Yes. IMO, it should not primarily be systemd's task to decide whether a service is running based on whether it got a connection to a remote site or not. Whether a connection is established, how long to wait before aborting, etc., should primarily be OpenVPN's task to evaluate, as OpenVPN knows how such connections should work while systemd does not. So let systemd manage the overall process without caring what the process does internally, and let the OpenVPN process take care of its own business.
Q: In that case any unit depending on it would start after that on false grounds assuming that its dependencies are met and probably fail.
A: That's a fair point. And yes, this can make unit dependencies fail, if it is believed that systemd manages the VPN connection itself. Currently there are no good approaches to that, but I believe that should be managed outside of the openvpn-{client,server}@.service unit files. We could introduce another set of unit files (unless NetworkManager already got that?) which can carry such a feature; like having a service which can poll a running OpenVPN configuration to see whether the tunnel is established or not.
This has the advantage of separating OpenVPN process management from OpenVPN connection management. Because: A running OpenVPN process is no guarantee that you have a valid VPN connection. That an OpenVPN process has not reported "READY=1" back to systemd is no indication at all that the process is having issues; it might just need much more time than systemd thinks it should have. And since VPNs depend on a functional Internet, which has billions (if not trillions and more) of different scenarios where it works somehow, we can't have a realistic timeout value which will satisfy most users. Well, we could add 30 minutes, or even 10 minutes - but if such an issue hits during a boot on a critical server and you have limited possibilities to understand why the boot is still lagging after 8 minutes, that would not be a really good experience.
So: Keep process management and connectivity management as two separate issues, and handle dependencies for these separate areas independently. Their use cases are quite different.
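One way such a connectivity check could look, kept separate from process management, is to ask the OpenVPN management interface for its current state. This is a sketch only, assuming the client configuration enables the management interface (for example `management 127.0.0.1 7505`, an address and port chosen here for illustration) and that the reply to the `state` command is a comma-separated line whose second field is the state name. The function name tunnel_is_up is hypothetical.

```shell
#!/bin/sh
# Reads the management interface's reply to "state" on stdin.
# Succeeds if any line reports the CONNECTED state in its second field.
tunnel_is_up() {
    while IFS=',' read -r _time state _rest; do
        [ "$state" = CONNECTED ] && return 0
    done
    return 1
}

# Typical use (assumes nc is available and behaves like common netcat
# variants; address and port are the illustrative values named above):
#   printf 'state\nquit\n' | nc 127.0.0.1 7505 | tunnel_is_up \
#       && echo "tunnel established"
```

A separate unit or timer could run a check like this, which keeps "the OpenVPN process is running" and "the VPN is connected" as two independent conditions.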
Q: So, another possible thing to do would be to set TimeoutStartSec=infinity, which gets around the timeout issue altogether and allows openvpn to function as usual.
Would there be any reason not to do it (in my head it currently makes a lot of sense)?
A: TimeoutStartSec=infinity is a very bad solution for OpenVPN. An example: If we don't modify OpenVPN to call sd_notify() at an earlier point, running systemctl start openvpn-client@CONFIG will not return to the command prompt until the "Initialization Sequence Completed" log line is reached. In the worst case this can cause havoc on a booting system, where you will never reach a login prompt until OpenVPN has managed to get a connection, which makes it even harder to debug the system when issues appear. This is by far the worst approach; even Type=simple is far better!
comment:8 Changed 8 years ago by
I am pretty sure I have users that will complain. :-P
On the other hand I have to admit that your line of argumentation is valid.
Systemd has a mechanism for what we want: network-online.target is active when a network connection has been established. So given we have a connection openvpn-client@example.service we want a target openvpn-client-online@example.target that is active when the vpn connection is up... No idea how to implement it, though...
comment:9 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Patch applied. Closing this, as this should resolve the issue described in the initial description.
Regarding eworm's comment:8, we are looking into a better solution: something along the lines of a separate openvpn-client-online@CONFIG unit file, to make "VPN connected" dependencies possible. This is currently targeted for OpenVPN v2.5.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ACK. Tested this lightly with both server and client profiles, and this seems to work very well. Users having keys under /home will probably complain. But I do not think system wide configurations should depend on keys in users' home directories.

Your patch has been applied to the following branches

commit 76096c605fcac4815674b6ae76ac1f31f03a8186 (master)
commit ba3ccaf92d379f8a2efad80cee7dc2806088f421 (release/2.4)
Author: Christian Hesse
Date: Tue Dec 27 23:18:32 2016 +0100

systemd: Add more security feature for systemd units

Signed-off-by: Christian Hesse <mail@eworm.de>
Acked-by: David Sommerseth <davids@openvpn.net>
Message-Id: <20161227221832.610-1-list@eworm.de>
URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13743.html
Signed-off-by: David Sommerseth <davids@openvpn.net>

- --
kind regards,

David Sommerseth
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYiPPRAAoJEIbPlEyWcf3ymHAQAIuy8UXELqs4blJCs0IfHtOg
nALf1ALs+fvhQp7he+Z+wxmDT+QFL3mli53RXKZZGG8y+boWgRFkJ11PX4YzsCO9
KSJAWqbDRW+n9pp8uZ7q6D+uLeiO6ziKhEm6Jl2zln4FtTxi3pW8jUcONEdNwBtP
XSzJ7vdKAuiKGp8GvDm+uWpyjRJvpsClKynPvh2sN0bY6ERlkLDYKUMy37+1vDfc
HV6k1Rdwq7L4fVxbx9P6YWF/iKX3yuqTK1Qs99ZaZ7N4UE+QAYtuOtzDbAb+ISrN
fs7kOim2H5OZ8qfgYTosE6bRLP4mTEqgKX7zjYUMPWWZZgvrW3H9E4v/nGWupZxD
NaLIE+WELGDVZRnHMZRkVYCbK6zcPYyvfrehc2O/cKKNAtsIQLDe8NdiirpWsbRB
7+e/UCgIyvtPTxaf1FoVD0ENdyfGXT3EYRIuCsPA6Uk9C5vu3JpAhBnxhleaynE6
Vkh29k3lxBi/CHKsIa2DYYLqumrkwlI9TJ6FHSt7xR4zYkA92BZb2UKZGDkuCss7
g5wNrnxuswIaAgllsa7YSKkVSS41DhlUZG+flFnFLbodME5HatUJWokAqJzOzC7L
vH1SMR+AQGKRwV1YkEJYRsxYfKJFByViipoqHkStq70Bl7Qql2hGXS+qh8cC/bLK
WX35Il88WHshyCmcqI+d
=2/Wt
-----END PGP SIGNATURE-----
https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13954.html
Message-Id: 20170125185203.10467-1-davids@…
comment:10 Changed 8 years ago by
You pasted the wrong message... This one is relevant:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I took the liberty to improve the commit message a little bit, to make a few things clearer.

Your patch has been applied to the following branches

commit e83a8684f0a0d944e9d53cdad2b543cfd1b6fbae (master)
commit 041fd6488434b5df01f86dd873b536a2b690ee13 (release/2.4)
Author: David Sommerseth
Date: Wed Jan 25 00:23:44 2017 +0100

systemd: Move the READY=1 signalling to an earlier point

Trac: #827, #801
Signed-off-by: David Sommerseth <dav...@openvpn.net>
Acked-by: Gert Doering <g...@greenie.muc.de>
Acked-by: Christian Hesse <m...@eworm.de>
Message-Id: <20170124232344.7825-1-dav...@openvpn.net>
URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13945.html
Signed-off-by: David Sommerseth <dav...@openvpn.net>

- --
kind regards,

David Sommerseth
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYiPJxAAoJEIbPlEyWcf3yxH4QALLTpJeE7AYS5N+54cRbswIM
Qjw/rKIU5WUe7hf6R//H8HmsWWbhCddYMYojl1XRcLglUdGFm+fTWkTUIUAtEZIy
F9IeTQNVRBYKXoJ5pVWs+9MfCKrdP7tuaUOouqmpuaU3EUwOTwXb9jfDhdNmiQFP
bbE3LC5uv4/BMUK/nPWX/9bqfiUhRRhoOIgDeLBaz2opvBNhLzx4wZV5EPXIsHG0
i1/BKmaU4nlBtVdC0CnL3fuKo1rjoAM+gPqZK1RbZISOBuXXuSJPx9hbNwaY1tU9
Vnvzd4iGmDdjBmxRkjgCwtYcwGGSPR5CCTa0xttqBFAj9Ljp2ZkQECPMWHWbY2D5
3/nj1RhY//YV6iWQD4dgq/Xtnfhp5c9bW00vl8t3h7psiO1BbkrRyQBMvGX7McwS
BvrgXhL0b8E5YLYM45ZQfYeCeNHXlZRlAIVEV0eC6ZHvFKqkVlhsjDHMzwztgbyF
tydEp8vDQtpixIMccUDPtYdWx5OXEje/sR0394tymYwCoMlak4lF4ED2PXdzzkHB
Lf41uTgoGUFhw3qbEaIhhTvbRCYq/AfSi7ojn46zhVqy6YPUG6F5WC4oHRlrvB63
OgGDtN2+F4loKXCOnbMPGM0eT3TKx11ft/fwnCJuVH6RVW3hybanGEXr3wkvWy6C
B/PEIq+CSxztTxxY49OI
=dvhT
-----END PGP SIGNATURE-----
The mail archive link is correct, though.
restart patch for openvpn-client@.service