Opened 9 years ago
Last modified 15 months ago
#543 new Bug / Defect
Packets are sent in wrong order via TCP breaking HMAC verificytion
Reported by: | cyslider | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | Networking | Version: | OpenVPN 2.3.6 (Community Ed) |
Severity: | Not set (select this one, unless your'e a OpenVPN developer) | Keywords: | packets tls_auth hmac pid |
Cc: | Steffan Karger, tct, plaisthos |
Description
When negotiating the connection, the client sends packets in this order:
15, 17, 16, 17, 18
Without tls_auth the server just warns with "ACK 17 is a replay: [18]" but else it goes unnoticed, but with tls_auth it will lead to: "Authenticate/Decrypt? packet error: packet HMAC authentication failed"
Detailed client logs can be found here: http://pastebin.com/6xJqBANk
This happens under high latency SAT Link with tls_auth and TCP
More details, including configs can be found here: http://pastebin.com/7m7vkKnN
This bug exists at least since 2.0.9
Currently what I think happens is, that after getting two ACKS in a row, he jumps over packet 16 and sends 17 early. Then he recalculates the next packet and ends up with 16 and is back in order again. Since the ACK of 17 is comeing late, 17 is being sent twice even.
Partial log:
Apr 11 12:14:24 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=15 DATA len=100 Apr 11 12:14:24 client openvpn[10121]: ACK reliable_can_send active=4 current=0 : [16] 14 15 12 13 Apr 11 12:14:24 client openvpn[10121]: ACK output sequence broken: [16] 14 15 12 13 ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 12 ] ... Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=3 current=1 : [16] 14 15 13 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 13 (size=104 to=4) ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 13 ] Apr 11 12:14:25 client openvpn[10121]: ACK received for pid 13, deleting from send buffer Apr 11 12:14:25 client openvpn[10121]: BIO read tls_read_ciphertext 100 bytes Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 16 Apr 11 12:14:25 client openvpn[10121]: BIO read tls_read_ciphertext 100 bytes Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 17 Apr 11 12:14:25 client openvpn[10121]: ACK output sequence broken: [18] 14 15 16 17 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send_timeout 0 [18] 14 15 16 17 ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=2 : [18] 14 15 16 17 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 16 (size=104 to=3) ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=16 DATA len=100 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=1 : [18] 14 15 16 17 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 17 (size=104 to=4) ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=0 : [18] 14 15 16 17 Apr 11 12:14:25 client openvpn[10121]: ACK output sequence broken: [18] 14 15 16 17 ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 14 ] ... Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=18 DATA len=100 ... Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 15 ] ... Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=19 DATA len=100 ... Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 17 ] Apr 11 12:14:26 client openvpn[10121]: ACK received for pid 17, deleting from send buffer Apr 11 12:14:26 client openvpn[10121]: ACK reliable_can_send active=3 current=0 : [20] 18 19 16 Apr 11 12:14:26 client openvpn[10121]: ACK output sequence broken: [20] 18 19 16 Apr 11 12:14:26 client openvpn[10121]: ACK reliable_send_timeout 2 [20] 18 19 16
Change History (10)
comment:1 Changed 9 years ago by
Component: | Generic / unclassified → Networking |
---|---|
Milestone: | → release 2.3.8 |
comment:2 Changed 9 years ago by
Ok, I think I found the bug. It seems to be a racing condition.
After not receiving ACK for packet 13, he decides to send it again.
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=3 current=1 : [16] 14 15 13 Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 13 (size=104 to=4)
But right after that he gets the ACK, and removes the package from the reliable struct.
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 13 ] Apr 11 12:14:25 client openvpn[10121]: ACK received for pid 13, deleting from send buffer
Then package 16 & 17 get added
Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 16 Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 17
and instead of 13 he now sends 17
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100
comment:3 Changed 9 years ago by
I can confirm that by setting the tls_timeout parameter to 4 seconds, this problem is gone for our SAT Link connection.
I believe now that there was some kind of traffic shaping involved delaying the ACKs by exactly 2 seconds triggering this bug so often for this particular connection.
My easy way out would be to introduce a "send" boolean to the reliable_entry struct and set it to true in the reliable_send method (reliable.c)
Then before sending the package, test if "send" is still true. Else it was swapped. Better would be however to make sure it can not be swapped in the first place.
comment:4 Changed 9 years ago by
Cc: | Steffan Karger added |
---|---|
Milestone: | release 2.3.8 → release 2.3.9 |
Bumping to milestone 2.3.9, copying in syzzer who might or might not have an idea (or time to get curious)
comment:5 Changed 8 years ago by
Milestone: | release 2.3.9 → release 2.3.11 |
---|
comment:6 Changed 8 years ago by
Milestone: | release 2.3.11 → release 2.3.12 |
---|
comment:7 Changed 7 years ago by
Milestone: | release 2.3.12 → release 2.3.15 |
---|
slightly embarrassing, but not going to happen for 2.3.14
comment:8 Changed 4 years ago by
Cc: | tct added |
---|---|
Milestone: | release 2.3.15 → release 2.5 |
comment:9 Changed 3 years ago by
Milestone: | release 2.5 → release 2.5.3 |
---|
comment:10 Changed 15 months ago by
Cc: | plaisthos added |
---|---|
Milestone: | release 2.5.3 |
Removing the milestone (as 2.5.3 has been long shipped).
I think this is fixed, as a side effect of IV_TCPNL, accepting out-of-order control channel packets now in general
commit 55755e6ee56516c96525e6bf313c173653af1a4b
Author: Arne Schwabe <arne@…>
Date: Sat Sep 17 16:15:38 2016 +0200
Enable TCP non-linear packet ID
Implementation with multiple threads needs that to be able run encryption
in parallel. Tested with James' OpenVPN 3 server.
(so we might still *send* them in the wrong order - even though the reliable/ACK handling has been overhauled for 2.6.0 as well - but even if we do, we accept them on reception in whatever order they come in, subject to replay_window handling)
As mentioned on IRC - this is an old old bug, and quite likely needs "high RTT" (SAT link), "TCP" and "tls-auth" to be triggered.
Still, it's a bug and should be fixed. No promises on time frame (*sigh*).