Opened 7 years ago
Closed 7 years ago
#786 closed Bug / Defect (fixed)
OpenVPN burns 100% CPU polling fds
Reported by: | marcan | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | release 2.4 |
Component: | Generic / unclassified | Version: | OpenVPN git master branch (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | |
Cc: | stipa |
Description
OpenVPN immediately starts burning 100% CPU upon startup. strace shows endless spam like this:
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}]) poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}])
This is clearly wrong: OpenVPN is polling stdin (fd 0), which, since it is redirected from /dev/null when run as a daemon, always returns POLLIN (because /dev/null always returns an EOF condition), but OpenVPN isn't attempting a read or doing anything else with it and just retriggers the polling forever.
This happens with 3 different VPN configs so I don't think it's really specific to a particular config. In fact, something as simple as this triggers the bug:
openvpn --dev tun5 < /dev/null
It can also be triggered by leaving out the /dev/null: then CPU usage is fine until you type anything into stdin, at which point it goes into the endless poll loop again.
Version: 2.4_rc1 (not available in the bugtracker picker yet)
Distro: Gentoo Linux (installed via portage)
Attachments (1)
Change History (8)
comment:1 Changed 7 years ago by
comment:2 Changed 7 years ago by
This is using the portage ebuild, which as you mention doesn't really do anything weird other than apply that cmocka patch. However, a simple ./configure && make
build of the tarball does not have the same symptoms, but building with the configure options that the ebuild uses does.
It seems the culprit configure option is --disable-async-push
, which maps to the (negated) USE flag inotify
. However, oddly enough, --enable-async-push
also causes the problem; the option needs to be omitted entirely for things to work. So regardless of the USE setting, the package is broken on Gentoo, as it always uses one or the other flag.
Looks like the configure script is broken and passing either flag actually behaves like enabling it:
$ ./configure --enable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h #define ENABLE_ASYNC_PUSH 1 $ ./configure --disable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h #define ENABLE_ASYNC_PUSH 1 $ ./configure &>/dev/null && grep ENABLE_ASYNC_PUSH config.h /* #undef ENABLE_ASYNC_PUSH */
Running with --verb 9
does mention fd 0:
Wed Dec 7 05:55:10 2016 us=854515 PO_CTL rwflags=0x0001 ev=0 arg=0x6ab622ef064
I'm attaching the full log.
Changed 7 years ago by
Attachment: | openvpn-loop.log added |
---|
comment:3 Changed 7 years ago by
Cc: | stipa added |
---|
thanks, this makes sense - I'm not configuring with options, and none of our buildslaves would run the "</dev/null" test.
Someone (was that you?) already mentioned on the openvpn-devel list that the --xxx-async-push options for configure do not work as expected, and it seems there is more brokenness there :-(
We'll go and investigate.
Since I'm a bit busy today and tomorrow, I just take the liberty and cc: this to Lev to investigate :-)
commit 0d1a75bfe241466230c41a52c6013494135c5935
Author: Lev Stipakov <lstipakov@…>
Date: Sat Oct 10 19:04:25 2015 +0300
Send push reply right after async auth complete
comment:4 Changed 7 years ago by
+#ifdef ENABLE_ASYNC_PUSH
+ /* arm inotify watcher */
+ event_ctl (c->c2.event_set, c->c2.inotify_fd, EVENT_READ, (void*)&file_shift)
;
+#endif
this very much looks like "if no async push is pending, inotify_fd is 0, and should *not* be armed"... plus the configure mishap.
comment:5 Changed 7 years ago by
Milestone: | → release 2.4 |
---|---|
Priority: | major → blocker |
comment:6 Changed 7 years ago by
Thanks, we have 2 bugs here:
- configure script incorrectly handles --disable-async-push (fixed in https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13414.html)
- inotify armed in client mode (patch is on the list, https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13415.html)
comment:7 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Both these issues should be fixed now.
commit 7084a3993fa35c6fb71abe8aac7b30f442205e2a Author: Lev Stipakov <lstipakov@gmail.com> Date: Wed Dec 7 01:45:51 2016 +0200 Arm inotify only in server mode Async-push is a server side feature and inotify_fd is initialized in server mode. Trac: #786 Signed-off-by: Lev Stipakov <lstipakov@gmail.com> Acked-by: David Sommerseth <davids@openvpn.net> Message-Id: <1481067951-28917-1-git-send-email-lstipakov@gmail.com> URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13415.html Signed-off-by: David Sommerseth <davids@openvpn.net> commit e62eccf025aa60ec268787d2aa4a46310ed1cd60 Author: David Sommerseth <davids@openvpn.net> Date: Tue Dec 6 23:10:51 2016 +0100 Fix wrong configure.ac parsing of --enable-async-push AC_ARG_ENABLE() was used wrong, which led enable_async_push to always be set, regardless if --enable-async-push or --disable-async-push was used. Also spotted the exact same patch when writing this commit message as GitHub PR#70. Trac: #786 Signed-off-by: David Sommerseth <davids@openvpn.net> Acked-by: Lev Stipakov <lstipakov@gmail.com> Message-Id: <1481062251-18349-1-git-send-email-davids@openvpn.net> URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13411.html
They will arrive in the v2.4_rc2 release.
Is this with gentoo specific patches, or built from git master?
I'm asking because I'm testing exactly this on gentoo:
$ sudo src/openvpn/openvpn --dev tun5 </dev/null
Tue Dec 6 21:05:03 2016 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
Tue Dec 6 21:05:03 2016 OpenVPN 2.4_rc1 [git:agibug/211a3a07222ff985+] x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Dec 6 2016
Tue Dec 6 21:05:03 2016 library versions: OpenSSL 1.0.2j 26 Sep 2016, LZO 2.08
Tue Dec 6 21:05:03 2016 * WARNING *: all encryption and authentication features disabled -- all data will be tunnelled as cleartext
Tue Dec 6 21:05:03 2016 TUN/TAP device tun5 opened
Tue Dec 6 21:05:03 2016 Could not determine IPv4/IPv6 protocol. Using AF_INET
Tue Dec 6 21:05:03 2016 UDPv4 link local (bound): [AF_INET][undef]:1194
Tue Dec 6 21:05:03 2016 UDPv4 link remote: [AF_UNSPEC]
... and there it sits, not using significant CPU time (as in "it does not even show up in top").
Strace shows it polling fd=4 and fd=3
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
... with an 1s timeout, which is the expected interval for internal timers.
--verb 9
added to the command line confirmsthis:
Tue Dec 6 21:07:40 2016 us=485781 TIMER: coarse timer wakeup 1 seconds
Tue Dec 6 21:07:40 2016 us=485793 PO_CTL rwflags=0x0001 ev=4 arg=0x006b8b60
Tue Dec 6 21:07:40 2016 us=485800 PO_CTL rwflags=0x0001 ev=3 arg=0x006b8a88
Tue Dec 6 21:07:40 2016 us=485814 I/O WAIT TR|Tw|SR|Sw [1/171044]
... so: I have no idea why it's adding 0 to your fd sets. It obviously should not do that, but "it's not doing it for me".
(Looking at openvpn-2.4_rc1-r1.ebuild I can't see anything that would explain it either - the only patch is related to cmocka testing, and that does not touch actual openvpn code)
Can you re-run your most basic example with
--verb 9
and attach the output (break with ctrl-c when it starts looping :) )? It might give some clue what is happening.