Opened 2 months ago

Last modified 2 months ago

#1390 new Bug / Defect

mlock without raising lock limit may result in DoS for the entire system

Reported by: zugschlus Owned by:
Priority: minor Milestone:
Component: Generic / unclassified Version: OpenVPN 2.5.0 (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc:

Description

Hi,

when mlock is enabled without appropropriately raising the Memlock limit, the OpenVPN daemon might encounter an out of memory situation. This is documented and expected.

However, the daemon can report this situation up to (but probably not limited to) 13000 times a second, at the emergency severity. With most syslog daemons being configured to send emergency messages to everybody logged in, this situation is impossible to fix. A simple user error thus results in a DoS that can only be remedied by a hard shutdown (I experienced this on an embedded system that didn't even respond to MagicSysRq? even tough it was supposed to).

I know this is a clear user error, and I am not even sure whether it was OpenVPN itself or some other part of the system that went havoc in the situation. On the OpenVPN side, would it be possible to rate limit the out of memory message? Additionally, I suspect that after a failed malloc, the daemon's state is pretty much toast anyway, so it would probably not hurt to just terminate?

Thanks to the joy of automation, the faulty configuration was rolled out to all my OpenVPN clients and can therefore confirm that the issue happens on x86_64 and the ARM architecture, and both with OpenVPN 2.5.1, the version in Debian unstable, as well as 2.4.7 on Debian stable.

Thanks for your consideration.

Greetings
Marc

Log Entries:

Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: TLS: soft reset sec=3600/3600 bytes=420861444/-1 pkts=534573/0
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: OpenSSL: error:07064041:memory buffer routines:BUF_MEM_grow:malloc failure
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: OpenSSL: error:14161044:SSL routines:state_machine:internal error
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: TLS_ERROR: BIO read tls_read_plaintext error
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: TLS Error: TLS object -> incoming plaintext read error
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: TLS Error: TLS handshake failed
Mar 6 16:13:34 drop ovpn-zg2-client[3196285]: TLS: move_session: dest=TM_LAME_DUCK src=TM_ACTIVE reinit_src=1
Mar 6 16:13:34 drop t of memory [3196285]
Mar 6 16:13:34 drop t of memory [3196285]
Mar 6 16:13:34 drop t of memory [3196285]
Mar 6 16:13:34 drop t of memory [3196285]
Mar 6 16:13:34 drop t of memory [3196285]
(many thousands more not copied).

Change History (9)

comment:1 Changed 2 months ago by Gert Döring

We *should* report "out of memory" (with "ou"!) only once, and then abort cleanly.

This looks like we hit the OOM in the "clean shutdown" handler, and then end up stuck in a loop. This is bad.

Maybe we need to introduce some extra flag "we are in emergency shutdown already" so if *another* OOM happens, we just do a sleep(5); exit(99); (the sleep to avoid your init system of choice to do an immediate restart and "more log lines" repetition)

comment:2 Changed 2 months ago by Gert Döring

maybe check for the maximum amount of available "mlockable" memory (getrlimit(RLIMIT_MLOCK)) and if it's less than 50M, refuse --mlock...

Need to understand this better.

comment:3 Changed 2 months ago by Gert Döring

[Service] LimitMEMLOCK=100M per systemd
equivalent to "ulimit -l"

comment:4 Changed 2 months ago by Gert Döring

I'm not sure how to fix the OOM-logs-like-hell loop, but the "check if we have enough mlock()-able memory" part is actually fairly easy.

Patch is out for review

https://patchwork.openvpn.net/patch/1608/
https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg21615.html

comment:5 Changed 2 months ago by tincantech

Also:
https://community.openvpn.net/openvpn/ticket/1059

FTR, I remember that 89MB was required to load EC cert+key

comment:6 Changed 2 months ago by Gert Döring

New v2 patch with "... and use setrlimit() to increase the limit to 100M"

https://patchwork.openvpn.net/patch/1609/

comment:7 Changed 2 months ago by Gert Döring

Patch breaks OpenSolaris? (no RLIMIT_MEMLOCKED). Grrr. v3...

https://patchwork.openvpn.net/patch/1614/

comment:8 Changed 2 months ago by Gert Döring

Now that is... interesting. Our OOM handler does this

/*
 * Fail memory allocation.  Don't use msg() because it tries
 * to allocate memory as part of its operation.
 */
void
out_of_memory(void)
{
    fprintf(stderr, PACKAGE_NAME ": Out of Memory\n");
    exit(1);
}

and most memory-allocating things call

check_malloc_return(const void *p)
{
    if (!p)
    {
        out_of_memory();
    }
}

so I do wonder where this message came from. The only hit for "t of memory" in lowercase in our sources is in cryptoapi.c, which is windows only.

@Zugschlus: was this logged to stdout -> systemd -> journal -> syslog, or did openvpn actively log to syslog? It feels like this might have come out of the crypto library, in which case we really can't do anything about it (except "bump up the memory limits").

comment:9 Changed 2 months ago by Gert Döring

Patch has been applied to the master branch.

commit b0bff5590152fbcc5b4d47c18817838fd00c58c3
Author: Gert Doering
Date: Wed Mar 10 13:48:08 2021 +0100

Require at least 100MB of mlock()-able memory if --mlock is used.

... and I think that's the best we can do here. If my interpretation is right, the error message above (and the loop) comes from the bowels of the crypto library, and we can't influence their error handling.

So - 2.6 should not have this problem anymore.

For 2.5 (and earlier), at least we have improved documentation :-)

Note: See TracTickets for help on using tickets.