Opened 9 years ago
Closed 20 months ago
#639 closed Bug / Defect (fixed)
non-interruptible loop in windows dns resolution failure
Reported by: | Gert Döring | Owned by: | stipa |
---|---|---|---|
Priority: | major | Milestone: | release 2.6 |
Component: | Generic / unclassified | Version: | OpenVPN git master branch (Community Ed) |
Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | windows signal dns loop |
Cc: | tct |
Description
- win7 VM with no IPv6 (no interface has v6, so win7 disables v6 altogether)
- git master 6417a6f8a0
- connecting with "--proto udp6"
- run from console window
-> result is an endless "DNS resolution fails, retrying" loop (because Windows will refuse to even look up a v6 record if "there is no v6 in the system!"), and neither ctrl-c nor F1...F4 work.
Didn't try from the GUI, was too annoyed at my VM setup... but it should be easy enough to reproduce.
The endless loop happens on Linux as well, but there ctrl-c works. Here's a Linux log:
```
Mon Dec 14 19:42:29 2015 RESOLVE: Cannot resolve host address: v4only.greenie.net:1194 (Name or service not known)
Mon Dec 14 19:42:29 2015 Could not determine IPv4/IPv6 protocol
Mon Dec 14 19:42:29 2015 SIGUSR1[soft,init_instance] received, process restarting
Mon Dec 14 19:42:29 2015 Restart pause, 5 second(s)
Mon Dec 14 19:42:34 2015 Control Channel Authentication: tls-auth using INLINE static key file
Mon Dec 14 19:42:34 2015 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Mon Dec 14 19:42:34 2015 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Mon Dec 14 19:42:34 2015 RESOLVE: Cannot resolve host address: v4only.greenie.net:1194 (Name or service not known)
Mon Dec 14 19:42:34 2015 RESOLVE: Cannot resolve host address: v4only.greenie.net:1194 (Name or service not known)
Mon Dec 14 19:42:34 2015 Could not determine IPv4/IPv6 protocol
Mon Dec 14 19:42:34 2015 SIGUSR1[soft,init_instance] received, process restarting
Mon Dec 14 19:42:34 2015 Restart pause, 5 second(s)
^CMon Dec 14 19:42:35 2015 SIGINT[hard,init_instance] received, process exiting
```
Change History (12)
comment:1 Changed 9 years ago by
Owner: | set to stipa |
---|---|
Status: | new → assigned |
comment:2 Changed 9 years ago by
comment:3 Changed 9 years ago by
Replying to cron2:
> - win7 VM with no IPv6 (no interface has v6, so win7 disables v6 altogether)
> - git master 6417a6f8a0
> - connecting with "--proto udp6"
> - run from console window
> -> result is an endless "DNS resolution fails, retrying" loop (because Windows will refuse to even look up a v6 record if "there is no v6 in the system!"), and neither ctrl-c nor F1...F4 work.
> Didn't try from the GUI, was too annoyed at my VM setup... but it should be easy enough to reproduce.
> The endless loop happens on Linux as well, but there ctrl-c works.
A quick question: is this with Xen, KVM, or something else? I've seen ctrl-c sometimes ignored on Windows 10 even on physical hardware, though a few tries always work.
Please try ctrl-break as well. Although support for it was added together with ctrl-c, when running from a console Windows delivers ctrl-break as a signal, whereas ctrl-c arrives as a key press, like F1..F4.
comment:4 Changed 8 years ago by
The failed-DNS-resolution loop can become non-interruptible even on Linux: as a test, use an unused IP address as the nameserver in resolv.conf and start a connection. It is nearly impossible to break out of it with SIGINT or SIGTERM.
There are a number of places in socket.c where sig_info->signal_received is set to SIGUSR1, overwriting the previous value, which could be SIGTERM or SIGINT (e.g., socket.c:1919, which appears to be the culprit in this case). Note that sig_info here is a pointer to siginfo_static and its members are volatile: they can change whenever a signal handler runs.
Interestingly, it's the hard SIGTERM/SIGINT that is easily lost in this case. A SIGTERM simulated through the management interface isn't noticed until the loop restarts and goes back to init.c, so it survives this blatant overwrite of signal_received in socket.c.
comment:5 Changed 8 years ago by
see also #311 which is the same issue but got forgotten in the meantime *sigh*
comment:6 Changed 5 years ago by
Cc: | tct added |
---|---|
comment:7 Changed 4 years ago by
Milestone: | release 2.4 → release 2.5 |
---|---|
comment:8 Changed 3 years ago by
Milestone: | release 2.5 → release 2.5.3 |
---|---|
This needs to be re-tested and then either closed or fixed.
comment:9 Changed 21 months ago by
I think I've bumped into this with 2.6_beta2 - in certain conditions, OpenVPN just ignores incoming SIGTERM (seems to be "when in SIGUSR1 restart wait").
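A restart pause only honours SIGTERM if it keeps checking for a pending signal while it waits. The sketch below shows one portable way to do that, assuming a handler that records the signal in a flag; `pending_signal`, `on_signal()` and `restart_pause()` are hypothetical names, not OpenVPN's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Hypothetical flag written by the signal handler (0 = nothing pending). */
static volatile sig_atomic_t pending_signal = 0;

/* Handler to install for SIGTERM/SIGINT (via sigaction() or signal()). */
static void on_signal(int sig)
{
    pending_signal = sig;
}

/* Hypothetical restart pause: waits up to `seconds`, but re-checks the
 * pending-signal flag every 100 ms so a hard signal ends the wait early.
 * Returns the pending signal, or 0 if the full pause elapsed. */
static int restart_pause(int seconds)
{
    const struct timespec slice = { 0, 100L * 1000L * 1000L }; /* 100 ms */
    int slices = seconds * 10;

    assert(seconds >= 0);
    for (int i = 0; i < slices && pending_signal == 0; i++) {
        /* May return early with EINTR when a signal arrives; the loop
         * condition then sees the flag and stops waiting. */
        nanosleep(&slice, NULL);
    }
    return (int) pending_signal;
}
```

Polling in short slices keeps the sketch simple and portable; an event loop would more likely wait on a file descriptor or event object that the handler triggers, so the wakeup is immediate.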
comment:10 Changed 21 months ago by
We need a revamp of how signals are implemented: use POSIX signals (sigaction) so that signals can be blocked, a priority order can be enforced, etc.
I worked on this years back but dropped the ball. If there is enough interest in pursuing this approach, I can resurrect it:
See: https://github.com/OpenVPN/openvpn/commit/7e5d775227e6d304ce24d7505da9332f405ee4f3
Or, here is a summary from 2018 (some details may be outdated):
```
Fix signal handling issues
Currently signal_received is directly modified in many places in the code,
leading to loss of signals, low priority signals overwriting higher priority
ones etc.
- Set all signals using functions like register_signal
- Add a function register_signal_si to help setting of signals when only the pointer to the signal_info struct is available.
- Allow only a higher or equal priority signal to overwrite an already registered but yet to be processed signal. The signals in increasing order of priority are SIGUSR2, SIGUSR1, SIGHUP, SIGTERM, SIGINT.
- Use posix signals (sigaction) to properly block signals while in the handler etc.
- Collect windows signals even when management is not available. Currently Windows signals cannot interrupt openvpn_sleep unless the management interface is in use: the latter forces management_event_loop_n_seconds() in place of sleep().
```
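The "use POSIX signals (sigaction) to block signals" item can be sketched as follows, under stated assumptions: all five signals from the summary share one handler whose `sa_mask` blocks the whole set, and main code blocks the same set with `sigprocmask()` around its own check-then-set. The names `pending_signal`, `install_handlers()` and `request_soft_restart()` are hypothetical, and the "hard signals win" rule is a simplification of the priority order, not the patch's logic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pending-signal slot (0 = nothing pending). */
static volatile sig_atomic_t pending_signal = 0;

/* The signals named in the 2018 summary. */
static const int managed_signals[] = { SIGUSR2, SIGUSR1, SIGHUP, SIGTERM, SIGINT };
#define N_MANAGED (sizeof managed_signals / sizeof managed_signals[0])

static void managed_set(sigset_t *set)
{
    sigemptyset(set);
    for (size_t i = 0; i < N_MANAGED; i++) {
        sigaddset(set, managed_signals[i]);
    }
}

/* Simplified rule: hard signals always win, soft ones only fill an
 * empty slot. sa_mask keeps other managed signals out while this runs. */
static void on_signal(int sig)
{
    if (pending_signal == 0 || sig == SIGTERM || sig == SIGINT) {
        pending_signal = sig;
    }
}

static int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    managed_set(&sa.sa_mask); /* block all managed signals inside the handler */
    sa.sa_flags = 0;          /* no SA_RESTART: blocking calls fail with EINTR */
    for (size_t i = 0; i < N_MANAGED; i++) {
        if (sigaction(managed_signals[i], &sa, NULL) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Main-code path (e.g. a resolver error): block managed signals so none
 * can arrive between the check and the assignment and then be lost. */
static void request_soft_restart(void)
{
    sigset_t block, old;

    managed_set(&block);
    sigprocmask(SIG_BLOCK, &block, &old);
    if (pending_signal == 0) {
        pending_signal = SIGUSR1;
    }
    sigprocmask(SIG_SETMASK, &old, NULL); /* blocked signals are delivered here */
}
```

With this in place, a SIGTERM that arrives while a resolver error is being handled is either recorded before the check (and preserved) or held back until the mask is restored (and then recorded by the handler), so it is never silently replaced by SIGUSR1.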
comment:11 Changed 21 months ago by
I guess the patch linked above would be too much, too late for 2.6? What about a band-aid fix that adds signal priority but does not extensively rewrite sig.c?
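A priority-only band-aid of the kind asked about here could look roughly like this, using the order from the 2018 summary (SIGUSR2 < SIGUSR1 < SIGHUP < SIGTERM < SIGINT) and its "higher or equal priority may overwrite" rule. `register_signal_sketch()` and `signal_priority()` are hypothetical illustrations, not OpenVPN's actual `register_signal()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the signal info struct. */
struct signal_info_sketch {
    volatile sig_atomic_t signal_received; /* 0 = nothing pending */
    const char *signal_text;               /* where the signal came from */
};

/* Rank in increasing priority, per the 2018 summary; 0 = none/unknown. */
static int signal_priority(int sig)
{
    switch (sig) {
    case SIGUSR2: return 1;
    case SIGUSR1: return 2;
    case SIGHUP:  return 3;
    case SIGTERM: return 4;
    case SIGINT:  return 5;
    default:      return 0;
    }
}

/* Records `sig` only if it is at least as important as the pending one.
 * Returns 1 if recorded, 0 if ignored in favour of the pending signal. */
static int register_signal_sketch(struct signal_info_sketch *si, int sig,
                                  const char *text)
{
    assert(si != NULL);
    if (signal_priority(sig) >= signal_priority(si->signal_received)) {
        si->signal_received = sig;
        si->signal_text = text;
        return 1;
    }
    return 0;
}
```

If every place that currently assigns `signal_received` directly went through a function like this, the socket.c error path could no longer downgrade a pending SIGTERM/SIGINT to SIGUSR1. On its own it does not remove the race with a handler firing mid-update; that still needs signals blocked around the call.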
comment:12 Changed 20 months ago by
Milestone: | release 2.5.3 → release 2.6 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Discussion on the signal issues was summarized in
https://github.com/OpenVPN/openvpn/issues/205
and they were fixed by a number of patches from Selva Nair that went into 2.6.0, basically doing what was suggested above. The "POSIX sigaction" part is still open, pending review and more time for cross-platform testing.
I am not sure whether we want to backport the signal handling changes to release/2.5 ("it qualifies as a bugfix"), because it's quite a large changeset for cleanly addressing this particular effect (pending signals being overwritten by soft signals on getaddrinfo() errors). Given the lack of sustained yelling, this seems to be mostly an infrequent annoyance, and since *I* opened this particular ticket, I consider it solved sufficiently for my needs: I never run old versions of OpenVPN on Windows.
Thanks, Selva :-)
thanks :)