#1436 closed Bug / Defect (fixed)
master + OpenSSL 3.0.0 build: signal handling is broken
Reported by: | Gert Döring | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | release 2.6 |
Component: | Generic / unclassified | Version: | OpenVPN git master branch (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | |
Cc: | plaisthos, MaxF |
Description
Testing "master + OpenSSL 3.0.0", specifically d67658feeab4742b9b6f57806ba8e93c8eec75b8
Running server instance on CLI, noticing "ah, need different options", press ctrl-c. Nothing happens. Press ctrl-z, kill %1. Nothing happens.
2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204 ^C^C^C^C ^C ^Z [1]+ Stopped bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0" gentoo ~/t_server # kill %1 [1]+ Stopped bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0" gentoo ~/t_server # fg bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0"
(this is on Gentoo, with OpenSSL 3.0.0)
"top" states that openvpn consumes 100% of one CPU core.
"strace" claims openvpn does nothing (= looping inside).
Running from within GDB is a bit more complicated (as GDB will trap keyboard interrupts) but killing -INT from outside, then continuieing in GDB and then ctrl-c/where leads to this:
Program received signal SIGTERM, Terminated. 0x00007ffff7929c96 in clock_nanosleep () from /lib64/libc.so.6 (gdb) cont Continuing. ^C Program received signal SIGINT, Interrupt. 0x00007ffff782d890 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0 (gdb) where #0 0x00007ffff782d890 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0 #1 0x00007ffff7c52939 in CRYPTO_THREAD_write_lock () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #2 0x00007ffff7c2095b in evp_keymgmt_util_clear_operation_cache () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #3 0x00007ffff7c29a25 in EVP_PKEY_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #4 0x00007ffff7d02ea1 in x509_pubkey_ex_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #5 0x00007ffff7b0c921 in ossl_asn1_template_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #6 0x00007ffff7b0c5e8 in ossl_asn1_item_embed_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #7 0x00007ffff7b0c921 in ossl_asn1_template_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #8 0x00007ffff7b0c5e8 in ossl_asn1_item_embed_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #9 0x00007ffff7b0c845 in ASN1_item_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3 #10 0x00007ffff7ec5bdc in ssl_cert_clear_certs () from /home/openssl-3.0.0/lib64/libssl.so.3 #11 0x00007ffff7ec5c89 in ssl_cert_free () from /home/openssl-3.0.0/lib64/libssl.so.3 #12 0x00007ffff7ed5f7a in SSL_CTX_free () from /home/openssl-3.0.0/lib64/libssl.so.3 #13 0x00005555555d4521 in tls_ctx_free (ctx=ctx@entry=0x7fffffffde38) at ssl_openssl.c:151 #14 0x0000555555576b68 in key_schedule_free (ks=ks@entry=0x7fffffffddc8, free_ssl_ctx=free_ssl_ctx@entry=true) at init.c:2586 #15 0x000055555557b3a0 in do_close_free_key_schedule (free_ssl_ctx=true, c=0x7fffffffd600) at init.c:3554 #16 close_instance (c=c@entry=0x7fffffffd600) at init.c:4397 #17 0x000055555557b898 in close_context (c=0x7fffffffd600, sig=<optimized out>, flags=4) at init.c:4590 #18 0x000055555557bd86 in init_instance (c=c@entry=0x7fffffffd600, env=env@entry=0x555555633330, flags=flags@entry=4) at init.c:4364 #19 0x000055555557cf31 in init_instance_handle_signals (c=0x7fffffffd600, env=0x555555633330, flags=4) at init.c:4047 #20 0x00005555555979d4 in tunnel_point_to_point (c=0x7fffffffd600) at openvpn.c:67 #21 openvpn_main (argc=7, argv=0x7fffffffe5d8) at openvpn.c:283
which smells "something in OpenSSL gets confused big time", but maybe it's us, feeding in some incorrect stuff.
Interesting enough, we seem to print our "signal received, terminating" message only after cleaning up the CTX stuff - so, nothing ever printed.
"cont"'ing after the trace above leads to 100% CPU again, still in CRYPTO_THREAD_write_lock()
land.
Change History (5)
comment:1 Changed 3 years ago by
comment:2 follow-up: 3 Changed 3 years ago by
2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204
Is that the end of the server log when ctr-C is attempted? Is it stuck somewhere for not progressing to "Initialization Sequence Completed"?
comment:3 Changed 3 years ago by
Cc: | MaxF added |
---|
Replying to Selva Nair:
2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204
Is that the end of the server log when ctr-C is attempted? Is it stuck somewhere for not progressing to "Initialization Sequence Completed"?
I ctrl-c'ed before the first incoming connection, and it's a "--mode tls-server" setup, not "--server". So that part is fine.
If i increase verb to 9, I get
2021-10-25 17:54:10 us=874270 setsockopt(IPV6_V6ONLY=0) 2021-10-25 17:54:10 us=874292 Listening for incoming TCP connection on [AF_INET6][undef]:51204 ^C2021-10-25 17:54:12 us=327728 PID packet_id_free 2021-10-25 17:54:12 us=327830 PID packet_id_free 2021-10-25 17:54:12 us=327855 PID packet_id_free 2021-10-25 17:54:12 us=327867 PID packet_id_free 2021-10-25 17:54:12 us=327901 PID packet_id_free 2021-10-25 17:54:12 us=327918 PID packet_id_free 2021-10-25 17:54:12 us=327932 PID packet_id_free 2021-10-25 17:54:12 us=327944 PID packet_id_free
It might be related to the erroneous free() that MaxF just posted a patch to...
... testing...
And lo and behold :-)
2021-10-25 17:55:34 us=497127 setsockopt(IPV6_V6ONLY=0) 2021-10-25 17:55:34 us=497174 Listening for incoming TCP connection on [AF_INET6][undef]:51204 ^C2021-10-25 17:55:35 us=777885 PID packet_id_free 2021-10-25 17:55:35 us=777988 PID packet_id_free 2021-10-25 17:55:35 us=778015 PID packet_id_free 2021-10-25 17:55:35 us=778043 PID packet_id_free 2021-10-25 17:55:35 us=778090 PID packet_id_free 2021-10-25 17:55:35 us=778114 PID packet_id_free 2021-10-25 17:55:35 us=778126 PID packet_id_free 2021-10-25 17:55:35 us=778152 PID packet_id_free 2021-10-25 17:55:35 us=778281 TCP/UDP: Closing socket 2021-10-25 17:55:35 us=778347 Closing TUN/TAP interface 2021-10-25 17:55:35 us=778374 net_addr_v4_del: 10.204.9.1 dev tap99 2021-10-25 17:55:35 us=778511 sitnl_send: checking for received messages 2021-10-25 17:55:35 us=778545 sitnl_send: rtnl: received 36 bytes 2021-10-25 17:55:35 us=778569 net_addr_v6_del: fd00:abcd:204:9::1/64 dev tap99 2021-10-25 17:55:35 us=778669 sitnl_send: checking for received messages 2021-10-25 17:55:35 us=778703 sitnl_send: rtnl: received 36 bytes 2021-10-25 17:55:35 us=787500 PID packet_id_free 2021-10-25 17:55:35 us=787549 SIGINT[hard,init_instance] received, process exiting
So "just a simple double free", then...
comment:4 Changed 3 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Closing this, as the "obvious" way I found how to break it is fixed. So it wasn't "signal handling" but "memory handling" after all...
I might revisit if I find new ways to break this :-)
comment:5 Changed 3 years ago by
Yeah, that dh patch was buggy and triggered only if dh file is in use, I guess.
I have run both server and client numerous times with OpenSSL 3 master (3.1.0-ish) (never 3.0.0, though) as a part of testing my provider and have not seen this. I don't have any of the 'deprecation patches' applied, but that shouldn't matter, I guess.
Could you post the config used to trigger this?
BTW, one could send signal to the process itself from gdb using the gdb command signal: 'signal SIGTERM' etc.