Opened 8 years ago
Closed 8 years ago
#786 closed Bug / Defect (fixed)
OpenVPN burns 100% CPU polling fds
| Reported by: | marcan | Owned by: | |
|---|---|---|---|
| Priority: | blocker | Milestone: | release 2.4 |
| Component: | Generic / unclassified | Version: | OpenVPN git master branch (Community Ed) |
| Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | |
| Cc: | stipa |
Description
OpenVPN immediately starts burning 100% CPU upon startup. strace shows endless spam like this:
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}])
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}])
This is clearly wrong: OpenVPN is polling stdin (fd 0), which, since it is redirected from /dev/null when run as a daemon, always returns POLLIN (because /dev/null always returns an EOF condition), but OpenVPN isn't attempting a read or doing anything else with it and just retriggers the polling forever.
This happens with 3 different VPN configs so I don't think it's really specific to a particular config. In fact, something as simple as this triggers the bug:
openvpn --dev tun5 < /dev/null
It can also be triggered by leaving out the /dev/null: then CPU usage is fine until you type anything into stdin, at which point it goes into the endless poll loop again.
Version: 2.4_rc1 (not available in the bugtracker picker yet)
Distro: Gentoo Linux (installed via portage)
Attachments (1)
Change History (8)
comment:1 Changed 8 years ago by
comment:2 Changed 8 years ago by
This is using the portage ebuild, which as you mention doesn't really do anything weird other than apply that cmocka patch. However, a simple ./configure && make build of the tarball does not have the same symptoms, but building with the configure options that the ebuild uses does.
It seems the culprit configure option is --disable-async-push, which maps to the (negated) USE flag inotify. However, oddly enough, --enable-async-push also causes the problem; the option needs to be omitted entirely for things to work. So regardless of the USE setting, the package is broken on Gentoo, as it always uses one or the other flag.
Looks like the configure script is broken and passing either flag actually behaves like enabling it:
$ ./configure --enable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h #define ENABLE_ASYNC_PUSH 1 $ ./configure --disable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h #define ENABLE_ASYNC_PUSH 1 $ ./configure &>/dev/null && grep ENABLE_ASYNC_PUSH config.h /* #undef ENABLE_ASYNC_PUSH */
Running with --verb 9 does mention fd 0:
Wed Dec 7 05:55:10 2016 us=854515 PO_CTL rwflags=0x0001 ev=0 arg=0x6ab622ef064
I'm attaching the full log.
Changed 8 years ago by
| Attachment: | openvpn-loop.log added |
|---|
comment:3 Changed 8 years ago by
| Cc: | stipa added |
|---|
thanks, this makes sense - I'm not configuring with options, and none of our buildslaves would run the "</dev/null" test.
Someone (was that you?) already mentioned on the openvpn-devel list that the --xxx-async-push options for configure do not work as expected, and it seems there is more brokenness there :-(
We'll go and investigate.
Since I'm a bit busy today and tomorrow, I just take the liberty and cc: this to Lev to investigate :-)
commit 0d1a75bfe241466230c41a52c6013494135c5935
Author: Lev Stipakov <lstipakov@…>
Date: Sat Oct 10 19:04:25 2015 +0300
Send push reply right after async auth complete
comment:4 Changed 8 years ago by
+#ifdef ENABLE_ASYNC_PUSH
+ /* arm inotify watcher */
+ event_ctl (c->c2.event_set, c->c2.inotify_fd, EVENT_READ, (void*)&file_shift)
;
+#endif
this very much looks like "if no async push is pending, inotify_fd is 0, and should *not* be armed"... plus the configure mishap.
comment:5 Changed 8 years ago by
| Milestone: | → release 2.4 |
|---|---|
| Priority: | major → blocker |
comment:6 Changed 8 years ago by
Thanks, we have 2 bugs here:
- configure script incorrectly handles --disable-async-push (fixed in https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13414.html)
- inotify armed in client mode (patch is on the list, https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13415.html)
comment:7 Changed 8 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
Both these issues should be fixed now.
commit 7084a3993fa35c6fb71abe8aac7b30f442205e2a
Author: Lev Stipakov <lstipakov@gmail.com>
Date: Wed Dec 7 01:45:51 2016 +0200
Arm inotify only in server mode
Async-push is a server side feature and inotify_fd is
initialized in server mode.
Trac: #786
Signed-off-by: Lev Stipakov <lstipakov@gmail.com>
Acked-by: David Sommerseth <davids@openvpn.net>
Message-Id: <1481067951-28917-1-git-send-email-lstipakov@gmail.com>
URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13415.html
Signed-off-by: David Sommerseth <davids@openvpn.net>
commit e62eccf025aa60ec268787d2aa4a46310ed1cd60
Author: David Sommerseth <davids@openvpn.net>
Date: Tue Dec 6 23:10:51 2016 +0100
Fix wrong configure.ac parsing of --enable-async-push
AC_ARG_ENABLE() was used wrong, which led enable_async_push to
always be set, regardless if --enable-async-push or --disable-async-push
was used.
Also spotted the exact same patch when writing this commit message as
GitHub PR#70.
Trac: #786
Signed-off-by: David Sommerseth <davids@openvpn.net>
Acked-by: Lev Stipakov <lstipakov@gmail.com>
Message-Id: <1481062251-18349-1-git-send-email-davids@openvpn.net>
URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13411.html
They will arrive in the v2.4_rc2 release.

Is this with gentoo specific patches, or built from git master?
I'm asking because I'm testing exactly this on gentoo:
$ sudo src/openvpn/openvpn --dev tun5 </dev/null
Tue Dec 6 21:05:03 2016 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
Tue Dec 6 21:05:03 2016 OpenVPN 2.4_rc1 [git:agibug/211a3a07222ff985+] x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Dec 6 2016
Tue Dec 6 21:05:03 2016 library versions: OpenSSL 1.0.2j 26 Sep 2016, LZO 2.08
Tue Dec 6 21:05:03 2016 * WARNING *: all encryption and authentication features disabled -- all data will be tunnelled as cleartext
Tue Dec 6 21:05:03 2016 TUN/TAP device tun5 opened
Tue Dec 6 21:05:03 2016 Could not determine IPv4/IPv6 protocol. Using AF_INET
Tue Dec 6 21:05:03 2016 UDPv4 link local (bound): [AF_INET][undef]:1194
Tue Dec 6 21:05:03 2016 UDPv4 link remote: [AF_UNSPEC]
... and there it sits, not using significant CPU time (as in "it does not even show up in top").
Strace shows it polling fd=4 and fd=3
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
... with an 1s timeout, which is the expected interval for internal timers.
--verb 9added to the command line confirmsthis:
Tue Dec 6 21:07:40 2016 us=485781 TIMER: coarse timer wakeup 1 seconds
Tue Dec 6 21:07:40 2016 us=485793 PO_CTL rwflags=0x0001 ev=4 arg=0x006b8b60
Tue Dec 6 21:07:40 2016 us=485800 PO_CTL rwflags=0x0001 ev=3 arg=0x006b8a88
Tue Dec 6 21:07:40 2016 us=485814 I/O WAIT TR|Tw|SR|Sw [1/171044]
... so: I have no idea why it's adding 0 to your fd sets. It obviously should not do that, but "it's not doing it for me".
(Looking at openvpn-2.4_rc1-r1.ebuild I can't see anything that would explain it either - the only patch is related to cmocka testing, and that does not touch actual openvpn code)
Can you re-run your most basic example with
--verb 9and attach the output (break with ctrl-c when it starts looping :) )? It might give some clue what is happening.