Opened 3 years ago

Closed 3 years ago

#786 closed Bug / Defect (fixed)

OpenVPN burns 100% CPU polling fds

Reported by: marcan Owned by:
Priority: blocker Milestone: release 2.4
Component: Generic / unclassified Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc: stipa

Description

OpenVPN immediately starts burning 100% CPU upon startup. strace shows endless spam like this:

poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}])
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=0, events=POLLIN|POLLPRI}], 2, 1056) = 1 ([{fd=0, revents=POLLIN}])

This is clearly wrong: OpenVPN is polling stdin (fd 0), which, since it is redirected from /dev/null when run as a daemon, always returns POLLIN (because /dev/null always returns an EOF condition), but OpenVPN isn't attempting a read or doing anything else with it and just retriggers the polling forever.

This happens with 3 different VPN configs so I don't think it's really specific to a particular config. In fact, something as simple as this triggers the bug:

openvpn --dev tun5 < /dev/null

It can also be triggered by leaving out the /dev/null: then CPU usage is fine until you type anything into stdin, at which point it goes into the endless poll loop again.

Version: 2.4_rc1 (not available in the bugtracker picker yet)
Distro: Gentoo Linux (installed via portage)

Attachments (1)

openvpn-loop.log (17.2 KB) - added by marcan 3 years ago.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 3 years ago by Gert Döring

Is this with gentoo specific patches, or built from git master?

I'm asking because I'm testing exactly this on gentoo:

$ sudo src/openvpn/openvpn --dev tun5 </dev/null
Tue Dec 6 21:05:03 2016 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
Tue Dec 6 21:05:03 2016 OpenVPN 2.4_rc1 [git:agibug/211a3a07222ff985+] x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Dec 6 2016
Tue Dec 6 21:05:03 2016 library versions: OpenSSL 1.0.2j 26 Sep 2016, LZO 2.08
Tue Dec 6 21:05:03 2016 * WARNING *: all encryption and authentication features disabled -- all data will be tunnelled as cleartext
Tue Dec 6 21:05:03 2016 TUN/TAP device tun5 opened
Tue Dec 6 21:05:03 2016 Could not determine IPv4/IPv6 protocol. Using AF_INET
Tue Dec 6 21:05:03 2016 UDPv4 link local (bound): [AF_INET][undef]:1194
Tue Dec 6 21:05:03 2016 UDPv4 link remote: [AF_UNSPEC]

... and there it sits, not using significant CPU time (as in "it does not even show up in top").

Strace shows it polling fd=4 and fd=3

poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)
poll([{fd=4, events=POLLIN|POLLPRI}, {fd=3, events=POLLIN|POLLPRI}], 2, 1121) = 0 (Timeout)

... with an 1s timeout, which is the expected interval for internal timers. --verb 9 added to the command line confirms
this:

Tue Dec 6 21:07:40 2016 us=485781 TIMER: coarse timer wakeup 1 seconds
Tue Dec 6 21:07:40 2016 us=485793 PO_CTL rwflags=0x0001 ev=4 arg=0x006b8b60
Tue Dec 6 21:07:40 2016 us=485800 PO_CTL rwflags=0x0001 ev=3 arg=0x006b8a88
Tue Dec 6 21:07:40 2016 us=485814 I/O WAIT TR|Tw|SR|Sw [1/171044]

... so: I have no idea why it's adding 0 to your fd sets. It obviously should not do that, but "it's not doing it for me".

(Looking at openvpn-2.4_rc1-r1.ebuild I can't see anything that would explain it either - the only patch is related to cmocka testing, and that does not touch actual openvpn code)

Can you re-run your most basic example with --verb 9 and attach the output (break with ctrl-c when it starts looping :) )? It might give some clue what is happening.

comment:2 Changed 3 years ago by marcan

This is using the portage ebuild, which as you mention doesn't really do anything weird other than apply that cmocka patch. However, a simple ./configure && make build of the tarball does not have the same symptoms, but building with the configure options that the ebuild uses does.

It seems the culprit configure option is --disable-async-push, which maps to the (negated) USE flag inotify. However, oddly enough, --enable-async-push also causes the problem; the option needs to be omitted entirely for things to work. So regardless of the USE setting, the package is broken on Gentoo, as it always uses one or the other flag.

Looks like the configure script is broken and passing either flag actually behaves like enabling it:

$ ./configure --enable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h
#define ENABLE_ASYNC_PUSH 1
$ ./configure --disable-async-push &>/dev/null && grep ENABLE_ASYNC_PUSH config.h 
#define ENABLE_ASYNC_PUSH 1
$ ./configure &>/dev/null && grep ENABLE_ASYNC_PUSH config.h 
/* #undef ENABLE_ASYNC_PUSH */

Running with --verb 9 does mention fd 0:

Wed Dec  7 05:55:10 2016 us=854515 PO_CTL rwflags=0x0001 ev=0 arg=0x6ab622ef064

I'm attaching the full log.

Changed 3 years ago by marcan

Attachment: openvpn-loop.log added

comment:3 Changed 3 years ago by Gert Döring

Cc: stipa added

thanks, this makes sense - I'm not configuring with options, and none of our buildslaves would run the "</dev/null" test.

Someone (was that you?) already mentioned on the openvpn-devel list that the --xxx-async-push options for configure do not work as expected, and it seems there is more brokenness there :-(

We'll go and investigate.

Since I'm a bit busy today and tomorrow, I just take the liberty and cc: this to Lev to investigate :-)

commit 0d1a75bfe241466230c41a52c6013494135c5935
Author: Lev Stipakov <lstipakov@…>
Date: Sat Oct 10 19:04:25 2015 +0300

Send push reply right after async auth complete

comment:4 Changed 3 years ago by Gert Döring

+#ifdef ENABLE_ASYNC_PUSH
+ /* arm inotify watcher */
+ event_ctl (c->c2.event_set, c->c2.inotify_fd, EVENT_READ, (void*)&file_shift)
;
+#endif

this very much looks like "if no async push is pending, inotify_fd is 0, and should *not* be armed"... plus the configure mishap.

comment:5 Changed 3 years ago by Gert Döring

Milestone: release 2.4
Priority: majorblocker

comment:6 Changed 3 years ago by stipa

Thanks, we have 2 bugs here:

comment:7 Changed 3 years ago by David Sommerseth

Resolution: fixed
Status: newclosed

Both these issues should be fixed now.

commit 7084a3993fa35c6fb71abe8aac7b30f442205e2a
Author: Lev Stipakov <lstipakov@gmail.com>
Date:   Wed Dec 7 01:45:51 2016 +0200

    Arm inotify only in server mode
    
    Async-push is a server side feature and inotify_fd is
    initialized in server mode.
    
    Trac: #786
    Signed-off-by: Lev Stipakov <lstipakov@gmail.com>
    Acked-by: David Sommerseth <davids@openvpn.net>
    Message-Id: <1481067951-28917-1-git-send-email-lstipakov@gmail.com>
    URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13415.html
    Signed-off-by: David Sommerseth <davids@openvpn.net>

commit e62eccf025aa60ec268787d2aa4a46310ed1cd60
Author: David Sommerseth <davids@openvpn.net>
Date:   Tue Dec 6 23:10:51 2016 +0100

    Fix wrong configure.ac parsing of --enable-async-push
    
    AC_ARG_ENABLE() was used wrong, which led enable_async_push to
    always be set, regardless if --enable-async-push or --disable-async-push
    was used.
    
    Also spotted the exact same patch when writing this commit message as
    GitHub PR#70.
    
    Trac: #786
    Signed-off-by: David Sommerseth <davids@openvpn.net>
    Acked-by: Lev Stipakov <lstipakov@gmail.com>
    Message-Id: <1481062251-18349-1-git-send-email-davids@openvpn.net>
    URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg13411.html

They will arrive in the v2.4_rc2 release.

Note: See TracTickets for help on using tickets.