Opened 5 years ago

Last modified 7 months ago

#543 new Bug / Defect

Packets are sent in wrong order via TCP breaking HMAC verificytion

Reported by: cyslider Owned by:
Priority: major Milestone: release 2.5
Component: Networking Version: OpenVPN 2.3.6 (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords: packets tls_auth hmac pid
Cc: Steffan Karger, tincantech

Description

When negotiating the connection, the client sends packets in this order:

15, 17, 16, 17, 18

Without tls_auth the server just warns with "ACK 17 is a replay: [18]" but else it goes unnoticed, but with tls_auth it will lead to: "Authenticate/Decrypt? packet error: packet HMAC authentication failed"

Detailed client logs can be found here: http://pastebin.com/6xJqBANk

This happens under high latency SAT Link with tls_auth and TCP

More details, including configs can be found here: http://pastebin.com/7m7vkKnN

This bug exists at least since 2.0.9

Currently what I think happens is, that after getting two ACKS in a row, he jumps over packet 16 and sends 17 early. Then he recalculates the next packet and ends up with 16 and is back in order again. Since the ACK of 17 is comeing late, 17 is being sent twice even.

Partial log:

Apr 11 12:14:24 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=15 DATA len=100
Apr 11 12:14:24 client openvpn[10121]: ACK reliable_can_send active=4 current=0 : [16] 14 15 12 13
Apr 11 12:14:24 client openvpn[10121]: ACK output sequence broken: [16] 14 15 12 13
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 12 ]
...
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=3 current=1 : [16] 14 15 13
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 13 (size=104 to=4)
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 13 ]
Apr 11 12:14:25 client openvpn[10121]: ACK received for pid 13, deleting from send buffer
Apr 11 12:14:25 client openvpn[10121]: BIO read tls_read_ciphertext 100 bytes
Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 16
Apr 11 12:14:25 client openvpn[10121]: BIO read tls_read_ciphertext 100 bytes
Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 17
Apr 11 12:14:25 client openvpn[10121]: ACK output sequence broken: [18] 14 15 16 17
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send_timeout 0 [18] 14 15 16 17
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=2 : [18] 14 15 16 17
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 16 (size=104 to=3)
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=16 DATA len=100
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=1 : [18] 14 15 16 17
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 17 (size=104 to=4)
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=4 current=0 : [18] 14 15 16 17
Apr 11 12:14:25 client openvpn[10121]: ACK output sequence broken: [18] 14 15 16 17
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 14 ]
...
Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=18 DATA len=100
...
Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 15 ]
...
Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=19 DATA len=100
...
Apr 11 12:14:26 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 17 ]
Apr 11 12:14:26 client openvpn[10121]: ACK received for pid 17, deleting from send buffer
Apr 11 12:14:26 client openvpn[10121]: ACK reliable_can_send active=3 current=0 : [20] 18 19 16
Apr 11 12:14:26 client openvpn[10121]: ACK output sequence broken: [20] 18 19 16
Apr 11 12:14:26 client openvpn[10121]: ACK reliable_send_timeout 2 [20] 18 19 16

Change History (8)

comment:1 Changed 5 years ago by Gert Döring

Component: Generic / unclassifiedNetworking
Milestone: release 2.3.8

As mentioned on IRC - this is an old old bug, and quite likely needs "high RTT" (SAT link), "TCP" and "tls-auth" to be triggered.

Still, it's a bug and should be fixed. No promises on time frame (*sigh*).

comment:2 Changed 5 years ago by cyslider

Ok, I think I found the bug. It seems to be a racing condition.

After not receiving ACK for packet 13, he decides to send it again.

Apr 11 12:14:25 client openvpn[10121]: ACK reliable_can_send active=3 current=1 : [16] 14 15 13
Apr 11 12:14:25 client openvpn[10121]: ACK reliable_send ID 13 (size=104 to=4)

But right after that he gets the ACK, and removes the package from the reliable struct.

Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT READ [22] from [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_ACK_V1 kid=0 [ 13 ]
Apr 11 12:14:25 client openvpn[10121]: ACK received for pid 13, deleting from send buffer

Then package 16 & 17 get added

Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 16
Apr 11 12:14:25 client openvpn[10121]: ACK mark active outgoing ID 17

and instead of 13 he now sends 17

Apr 11 12:14:25 client openvpn[10121]: TCPv4_CLIENT WRITE [114] to [AF_INET]xxx.xxx.xxx.xxx:yyyy: P_CONTROL_V1 kid=0 [ ] pid=17 DATA len=100

comment:3 Changed 5 years ago by cyslider

I can confirm that by setting the tls_timeout parameter to 4 seconds, this problem is gone for our SAT Link connection.

I believe now that there was some kind of traffic shaping involved delaying the ACKs by exactly 2 seconds triggering this bug so often for this particular connection.

My easy way out would be to introduce a "send" boolean to the reliable_entry struct and set it to true in the reliable_send method (reliable.c)

Then before sending the package, test if "send" is still true. Else it was swapped. Better would be however to make sure it can not be swapped in the first place.

comment:4 Changed 5 years ago by Gert Döring

Cc: Steffan Karger added
Milestone: release 2.3.8release 2.3.9

Bumping to milestone 2.3.9, copying in syzzer who might or might not have an idea (or time to get curious)

comment:5 Changed 4 years ago by Gert Döring

Milestone: release 2.3.9release 2.3.11

comment:6 Changed 4 years ago by Samuli Seppänen

Milestone: release 2.3.11release 2.3.12

comment:7 Changed 4 years ago by Gert Döring

Milestone: release 2.3.12release 2.3.15

slightly embarrassing, but not going to happen for 2.3.14

comment:8 Changed 7 months ago by tincantech

Cc: tincantech added
Milestone: release 2.3.15release 2.5
Note: See TracTickets for help on using tickets.