Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#1436 closed Bug / Defect (fixed)

master + OpenSSL 3.0.0 build: signal handling is broken

Reported by: Gert Döring Owned by:
Priority: critical Milestone: release 2.6
Component: Generic / unclassified Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords:
Cc: plaisthos, MaxF

Description

Testing "master + OpenSSL 3.0.0", specifically d67658feeab4742b9b6f57806ba8e93c8eec75b8

Running server instance on CLI, noticing "ah, need different options", press ctrl-c. Nothing happens. Press ctrl-z, kill %1. Nothing happens.

2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204
^C^C^C^C
^C


^Z
[1]+  Stopped                 bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0"
gentoo ~/t_server # kill %1

[1]+  Stopped                 bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0"
gentoo ~/t_server # fg
bin/openvpn --cd tap-tcp-p2p --config server.conf --tls-cipher "DEFAULT:@SECLEVEL=0"

(this is on Gentoo, with OpenSSL 3.0.0)

"top" states that openvpn consumes 100% of one CPU core.

"strace" claims openvpn does nothing (= looping inside).

Running from within GDB is a bit more complicated (as GDB will trap keyboard interrupts) but killing -INT from outside, then continuieing in GDB and then ctrl-c/where leads to this:

Program received signal SIGTERM, Terminated.
0x00007ffff7929c96 in clock_nanosleep () from /lib64/libc.so.6
(gdb) cont
Continuing.

^C
Program received signal SIGINT, Interrupt.
0x00007ffff782d890 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007ffff782d890 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00007ffff7c52939 in CRYPTO_THREAD_write_lock () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#2  0x00007ffff7c2095b in evp_keymgmt_util_clear_operation_cache () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#3  0x00007ffff7c29a25 in EVP_PKEY_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#4  0x00007ffff7d02ea1 in x509_pubkey_ex_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#5  0x00007ffff7b0c921 in ossl_asn1_template_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#6  0x00007ffff7b0c5e8 in ossl_asn1_item_embed_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#7  0x00007ffff7b0c921 in ossl_asn1_template_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#8  0x00007ffff7b0c5e8 in ossl_asn1_item_embed_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#9  0x00007ffff7b0c845 in ASN1_item_free () from /home/openssl-3.0.0/lib64/libcrypto.so.3
#10 0x00007ffff7ec5bdc in ssl_cert_clear_certs () from /home/openssl-3.0.0/lib64/libssl.so.3
#11 0x00007ffff7ec5c89 in ssl_cert_free () from /home/openssl-3.0.0/lib64/libssl.so.3
#12 0x00007ffff7ed5f7a in SSL_CTX_free () from /home/openssl-3.0.0/lib64/libssl.so.3
#13 0x00005555555d4521 in tls_ctx_free (ctx=ctx@entry=0x7fffffffde38) at ssl_openssl.c:151
#14 0x0000555555576b68 in key_schedule_free (ks=ks@entry=0x7fffffffddc8, free_ssl_ctx=free_ssl_ctx@entry=true) at init.c:2586
#15 0x000055555557b3a0 in do_close_free_key_schedule (free_ssl_ctx=true, c=0x7fffffffd600) at init.c:3554
#16 close_instance (c=c@entry=0x7fffffffd600) at init.c:4397
#17 0x000055555557b898 in close_context (c=0x7fffffffd600, sig=<optimized out>, flags=4) at init.c:4590
#18 0x000055555557bd86 in init_instance (c=c@entry=0x7fffffffd600, env=env@entry=0x555555633330, flags=flags@entry=4) at init.c:4364
#19 0x000055555557cf31 in init_instance_handle_signals (c=0x7fffffffd600, env=0x555555633330, flags=4) at init.c:4047
#20 0x00005555555979d4 in tunnel_point_to_point (c=0x7fffffffd600) at openvpn.c:67
#21 openvpn_main (argc=7, argv=0x7fffffffe5d8) at openvpn.c:283

which smells "something in OpenSSL gets confused big time", but maybe it's us, feeding in some incorrect stuff.

Interesting enough, we seem to print our "signal received, terminating" message only after cleaning up the CTX stuff - so, nothing ever printed.

"cont"'ing after the trace above leads to 100% CPU again, still in CRYPTO_THREAD_write_lock() land.

Change History (5)

comment:1 Changed 2 years ago by Selva Nair

I have run both server and client numerous times with OpenSSL 3 master (3.1.0-ish) (never 3.0.0, though) as a part of testing my provider and have not seen this. I don't have any of the 'deprecation patches' applied, but that shouldn't matter, I guess.

Could you post the config used to trigger this?

BTW, one could send signal to the process itself from gdb using the gdb command signal: 'signal SIGTERM' etc.

comment:2 Changed 2 years ago by Selva Nair

2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204

Is that the end of the server log when ctr-C is attempted? Is it stuck somewhere for not progressing to "Initialization Sequence Completed"?

comment:3 in reply to:  2 Changed 2 years ago by Gert Döring

Cc: MaxF added

Replying to Selva Nair:

2021-10-22 13:17:04 us=197577 Listening for incoming TCP connection on [AF_INET6][undef]:51204

Is that the end of the server log when ctr-C is attempted? Is it stuck somewhere for not progressing to "Initialization Sequence Completed"?

I ctrl-c'ed before the first incoming connection, and it's a "--mode tls-server" setup, not "--server". So that part is fine.

If i increase verb to 9, I get

2021-10-25 17:54:10 us=874270 setsockopt(IPV6_V6ONLY=0)
2021-10-25 17:54:10 us=874292 Listening for incoming TCP connection on [AF_INET6][undef]:51204
^C2021-10-25 17:54:12 us=327728 PID packet_id_free
2021-10-25 17:54:12 us=327830 PID packet_id_free
2021-10-25 17:54:12 us=327855 PID packet_id_free
2021-10-25 17:54:12 us=327867 PID packet_id_free
2021-10-25 17:54:12 us=327901 PID packet_id_free
2021-10-25 17:54:12 us=327918 PID packet_id_free
2021-10-25 17:54:12 us=327932 PID packet_id_free
2021-10-25 17:54:12 us=327944 PID packet_id_free

It might be related to the erroneous free() that MaxF just posted a patch to...

... testing...

And lo and behold :-)

2021-10-25 17:55:34 us=497127 setsockopt(IPV6_V6ONLY=0)
2021-10-25 17:55:34 us=497174 Listening for incoming TCP connection on [AF_INET6][undef]:51204
^C2021-10-25 17:55:35 us=777885 PID packet_id_free
2021-10-25 17:55:35 us=777988 PID packet_id_free
2021-10-25 17:55:35 us=778015 PID packet_id_free
2021-10-25 17:55:35 us=778043 PID packet_id_free
2021-10-25 17:55:35 us=778090 PID packet_id_free
2021-10-25 17:55:35 us=778114 PID packet_id_free
2021-10-25 17:55:35 us=778126 PID packet_id_free
2021-10-25 17:55:35 us=778152 PID packet_id_free
2021-10-25 17:55:35 us=778281 TCP/UDP: Closing socket
2021-10-25 17:55:35 us=778347 Closing TUN/TAP interface
2021-10-25 17:55:35 us=778374 net_addr_v4_del: 10.204.9.1 dev tap99
2021-10-25 17:55:35 us=778511 sitnl_send: checking for received messages
2021-10-25 17:55:35 us=778545 sitnl_send: rtnl: received 36 bytes
2021-10-25 17:55:35 us=778569 net_addr_v6_del: fd00:abcd:204:9::1/64 dev tap99
2021-10-25 17:55:35 us=778669 sitnl_send: checking for received messages
2021-10-25 17:55:35 us=778703 sitnl_send: rtnl: received 36 bytes
2021-10-25 17:55:35 us=787500 PID packet_id_free
2021-10-25 17:55:35 us=787549 SIGINT[hard,init_instance] received, process exiting

So "just a simple double free", then...

Version 0, edited 2 years ago by Gert Döring (next)

comment:4 Changed 2 years ago by Gert Döring

Resolution: fixed
Status: newclosed

Closing this, as the "obvious" way I found how to break it is fixed. So it wasn't "signal handling" but "memory handling" after all...

I might revisit if I find new ways to break this :-)

comment:5 Changed 2 years ago by Selva Nair

Yeah, that dh patch was buggy and triggered only if dh file is in use, I guess.

Note: See TracTickets for help on using tickets.