Opened 10 years ago

Closed 6 years ago

Last modified 4 years ago

#49 closed Feature Wish (fixed)

--float does not work with --server

Reported by: Samuli Seppänen Owned by:
Priority: major Milestone: release 2.4
Component: Networking Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords:
Cc:

Description

I have an OpenVPN server in UDP TLS-server mode. Float is activated for every connection. When the IP address of a connected client changes, I have to wait for the ping timeout. Until then, all other packets are lost.

Attachments (3)

tlsfloat.patch (11.1 KB) - added by avalentin 7 years ago.
Enable floating in UDP server mode if auth algo is enabled
tlsfloat.2.patch (11.0 KB) - added by avalentin 7 years ago.
peer-id-v2.patch (17.8 KB) - added by stipa 6 years ago.


Change History (20)

comment:1 Changed 10 years ago by Samuli Seppänen

Reported to SF.net by Markus Wick (http://sourceforge.net/users/degasus/).

comment:2 Changed 10 years ago by Samuli Seppänen

Comment from Markus:

The problem here seems to be the call to link_socket_actual_match() in
tls_pre_decrypt() which prevents packets from the "wrong remotes" from
being seen by any later code, thus preventing --float from working with
TLS mode in general.

comment:3 Changed 8 years ago by Eric Crist

Version: 2.1.0 / 2.1.1 → git master branch

Ticket #206 was added, which is a duplicate of this one, with better debugging information. I'm including that ticket description here:


After the client gets a new IP address, it is not able to communicate until the ping-restart timeout expires when using tls-server/tls-client.
Scenario:

  • A server with a static IP
  • A client with a dynamic IP

Tested with 2.2.2
Log:
[...]
Wed May 2 15:16:22 2012 us=925612 [ford.[domain]] Peer Connection Initiated with 87.78.239.81:50101
Wed May 2 15:16:23 2012 us=18604 Initialization Sequence Completed
[redial, new IP address]
Wed May 2 15:16:37 2012 us=24975 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:38 2012 us=23629 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:39 2012 us=24576 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:40 2012 us=24436 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:41 2012 us=23111 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:42 2012 us=22823 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
It seems that either

  • the check link_socket_actual_match (from, &ks->remote_addr) in ssl.c, function tls_pre_decrypt line 4633 must only be done when not using float

or

  • code for re-negotiating TLS keys when a client changes its IP is missing

Configs
server:
local [serverip]
lport 50001
ping 5
ping-restart 30
dev tun-ford
tun-ipv6
persist-tun
ifconfig 10.10.254.101 192.168.254.1
mlock
passtos
tun-ipv6
comp-lzo
float
tls-server
ca ca.crt
dh dh2048.pem
cert asterix.[domain].crt
key asterix.[domain].key
tls-auth tls-auth.key 0
tls-remote ford.[domain]
client:
remote [serverip] 50001
lport 50101
ping 5
ping-restart 30
dev tun-asterix
tun-ipv6
persist-tun
ifconfig 192.168.254.1 10.10.254.101
up /etc/openvpn/auto_asterix.up
script-security 2
mlock
passtos
comp-lzo
tls-client
ca ca.crt
dh dh2048.pem
cert ford.[domain].crt
key ford.[domain].key
tls-auth tls-auth.key 1
tls-remote asterix.[domain]

comment:4 Changed 8 years ago by siemer

Almost four years old...

@ecrist: so just kicking link_socket_actual_match() out and done?

The proposal sounds like a really easy patch to me!

comment:5 Changed 8 years ago by siemer

I captured the following with --verb 11. After I changed the IP of the client, the server logged this:

Sun May 27 04:09:36 2012 us=605089 event_wait returned 1
Sun May 27 04:09:36 2012 us=605134 I/O WAIT status=0x0001
Sun May 27 04:09:36 2012 us=605177 MULTI: REAP range 144 -> 160
Sun May 27 04:09:36 2012 us=605236 UDPv4 read returned 125
Sun May 27 04:09:36 2012 us=605290 TLS State Error: No TLS state for client [AF_INET]192.168.144.30:37202, opcode=6
Sun May 27 04:09:36 2012 us=605345 GET INST BY REAL: 192.168.144.30:37202 [failed]
Sun May 27 04:09:36 2012 us=605393 PO_CTL rwflags=0x0001 ev=4 arg=0x080c764c
Sun May 27 04:09:36 2012 us=605436 PO_CTL rwflags=0x0001 ev=5 arg=0x080c75a8
Sun May 27 04:09:36 2012 us=605487 I/O WAIT TR|Tw|SR|Sw [10/0]
Sun May 27 04:09:37 2012 us=606968 PO_WAIT[0,0] fd=4 rev=0x00000001 rwflags=0x0001 arg=0x080c764c
Sun May 27 04:09:37 2012 us=607113 event_wait returned 1
Sun May 27 04:09:37 2012 us=607164 I/O WAIT status=0x0001
Sun May 27 04:09:37 2012 us=607211 MULTI: REAP range 160 -> 176
Sun May 27 04:09:37 2012 us=607277 UDPv4 read returned 125
Sun May 27 04:09:37 2012 us=607339 TLS State Error: No TLS state for client [AF_INET]192.168.144.30:37202, opcode=6
Sun May 27 04:09:37 2012 us=607399 GET INST BY REAL: 192.168.144.30:37202 [failed]
Sun May 27 04:09:37 2012 us=607452 PO_CTL rwflags=0x0001 ev=4 arg=0x080c764c
Sun May 27 04:09:37 2012 us=607497 PO_CTL rwflags=0x0001 ev=5 arg=0x080c75a8
Sun May 27 04:09:37 2012 us=607554 I/O WAIT TR|Tw|SR|Sw [9/207763]
Sun May 27 04:09:38 2012 us=604851 PO_WAIT[0,0] fd=4 rev=0x00000001 rwflags=0x0001 arg=0x080c764c

My OpenVPN is older (version 2.1.3). There, the TLS subsystem doesn't complain about keys being out of sync, but reports "TLS State Error: No TLS state for client" instead.
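One plausible reading of the "GET INST BY REAL ... [failed]" lines above is that the server finds a client instance by the packet's source address, so a floated client no longer maps to its session. The following is a minimal sketch of that idea only. The struct and function names are hypothetical stand-ins and are not OpenVPN's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not OpenVPN's real structures. */
struct peer_addr {
    uint32_t ip;    /* IPv4 address, host byte order */
    uint16_t port;  /* UDP source port */
};

struct client_instance {
    struct peer_addr real;  /* address the session was negotiated from */
    int authenticated;      /* 1 once the TLS handshake has completed */
};

/* Find the instance whose stored address equals the packet's source
 * address. Returns NULL when no instance matches. */
static struct client_instance *
lookup_by_real(struct client_instance *table, size_t n, struct peer_addr from)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].real.ip == from.ip && table[i].real.port == from.port)
            return &table[i];
    }
    return NULL;
}
```

In this model, a packet from the original address finds its instance, while the same client sending from a new address gets NULL, because nothing in the lookup key ties the packet to the existing session.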

comment:6 in reply to:  5 Changed 8 years ago by marceldb

I've been looking into the same issue a bit and I think I have some more debugging:

The 'No TLS state for client' message comes from an extra check that was probably added later in tls_pre_decrypt_lite. When I remove that check, I get the 'TLS keys are out of sync' error again. Just above that error, there is a compiled-out debugging block that shows the actual values going into the test in tls_pre_decrypt mentioned in ticket #206:

Thu Nov 15 21:06:05 2012 us=663890 213.119.171.47:30777 UDPv4 READ [125] from [AF_INET]213.119.171.47:30777: P_DATA_V1 kid=0 DATA len=124
Thu Nov 15 21:06:05 2012 us=663979 213.119.171.47:30777 TLS_PRE_DECRYPT: [0] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664057 213.119.171.47:30777 TLS_PRE_DECRYPT: [1] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664144 213.119.171.47:30777 TLS_PRE_DECRYPT: [2] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664296 213.119.171.47:30777 TLS Error: local/remote TLS keys are out of sync: [AF_INET]213.119.171.47:30777 [0]

This seems to indicate that it's not just link_socket_actual_match (match) that's failing, but DECRYPT_KEY_ENABLED (dken) and ks->authenticated (auth) are also preventing the packet from getting accepted.

(This is the code that ticket #206 references?)

  if (DECRYPT_KEY_ENABLED (multi, ks)
      && key_id == ks->key_id
      && ks->authenticated
#ifdef ENABLE_DEF_AUTH
      && !ks->auth_deferred
#endif
      && link_socket_actual_match (from, &ks->remote_addr))
    {
      /* return appropriate data channel decrypt key in opt */

The issue is now that I don't have enough knowledge about crypto programming to dig much deeper and especially not to make any useful changes, but I hope this helps a bit.

My application is a 3G mobile connected laptop with an occasionally very flaky link that I tunnel through a static machine so that the tunnel server provides a static endpoint for TCP sessions, even if the 3G connection is dropped and comes back up with a different IP. Being able to use --float would make it quicker to resume sessions in that case and not have the delay that comes with sending the USR1 signal to openvpn; in the train I often only have very short timeframes of connectivity that I'd like to use as much as possible.

comment:7 Changed 7 years ago by avalentin

Hi!

I also saw this problem and thought about a solution. I found the HMAC in auth mode, which is enabled by default and gives a way to check a floated peer against the list of existing peers. If the HMAC fits the peer's crypto context, the old peer is updated with the new address.
Please see the following patch as a draft!
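The idea described above can be sketched as a linear scan over the known peers. This is an illustration under stated assumptions, not the contents of tlsfloat.patch: the names are hypothetical, and toy_mac is a placeholder keyed checksum that stands in for the real HMAC and has no cryptographic value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct peer {
    uint32_t ip;
    uint16_t port;
    uint32_t key;   /* hypothetical per-session auth key */
};

/* Placeholder keyed checksum (FNV-1a style). NOT an HMAC and not secure;
 * it only stands in for the per-peer HMAC check described above. */
static uint32_t toy_mac(uint32_t key, const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u ^ key;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Try every known peer until one key authenticates the packet, then
 * record the new source address for that peer.
 * Returns the peer index, or -1 if no key matches.
 * *mac_ops counts the MAC computations spent on this one packet. */
static int float_by_mac_scan(struct peer *peers, size_t n,
                             const uint8_t *payload, size_t len, uint32_t tag,
                             uint32_t new_ip, uint16_t new_port,
                             size_t *mac_ops)
{
    assert(payload != NULL && mac_ops != NULL);
    *mac_ops = 0;
    for (size_t i = 0; i < n; i++) {
        (*mac_ops)++;
        if (toy_mac(peers[i].key, payload, len) == tag) {
            peers[i].ip = new_ip;
            peers[i].port = new_port;
            return (int)i;
        }
    }
    return -1;
}
```

Note that a packet matching no peer costs one MAC computation per known peer before it is rejected, so the work per unknown packet grows with the number of connected clients.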


Changed 7 years ago by avalentin

Attachment: tlsfloat.patch added

Enable floating in UDP server mode if auth algo is enabled

comment:8 Changed 7 years ago by marceldb

Thanks for the patch! I've been trying it for a few days now and it seems to be working (connectivity is much better.)

I did have to disable the ping checks though, otherwise the tunnel will still be renegotiated during a connection outage. Unfortunately, disabling the ping checks results in a crash of the server, apparently when the connection is torn down because of too many failed TLS renegotiations:

Fri Oct 25 18:06:12 2013 xxx.xxx.xxx.xxx:22494 Authenticate/Decrypt packet error: packet HMAC authentication failed

Repeated quite a few times, this is probably some leftover traffic as there were TLS handshake failures just above it too. (18:06 is around the time when I drop the connection from the client.)

One hour later it appears to start TLS renegotiation, which fails because the client is no longer there:

Fri Oct 25 19:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Fri Oct 25 19:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS handshake failed
<... repeated many times ...>
Fri Oct 25 20:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Fri Oct 25 20:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS handshake failed

... and after an hour of trying, the server gives up completely:

Fri Oct 25 20:07:13 2013 Assertion failed at multi.c:546
Fri Oct 25 20:07:13 2013 Exiting due to fatal error

This is the failing code:

  if (m->earliest_wakeup == mi)
    m->earliest_wakeup = NULL;

  if (!shutdown)
    {
      if (mi->did_real_hash)
        {
          ASSERT (hash_remove (m->hash, &mi->real));
        }
      if (mi->did_iter)
        {
          ASSERT (hash_remove (m->iter, &mi->real));     /* 546 */
        }

This is probably just because I completely disabled the ping checks (instead of setting them to a much longer interval and tearing down the connection after 30 minutes or so), but I thought I'd share it anyway.

Changed 7 years ago by avalentin

Attachment: tlsfloat.2.patch added

comment:9 Changed 7 years ago by avalentin

Hi,

I uploaded a fixed version. I also noticed these problems. You have to patch master to get it working. Please retry.
Here is an example of a fitting server config:

tls-timeout 10
reneg-sec 0
keepalive 0 0
ping-restart 0
ping 300

Same reason, server should keep the instance as long as possible. But my client has:

ping 60
ping-restart 3600
reneg-sec 7200

André


comment:10 Changed 7 years ago by marceldb

That patch keeps the server up for much longer, thanks! It hasn't crashed yet, so far.
(Sorry for the late response, didn't have many chances to try it in usual conditions over the holidays.)


comment:11 Changed 7 years ago by Gert Döring

Milestone: release 2.4
Type: Bug / Defect → Feature Wish

This has been discussed among the developers, and while the patch from avalentin is great as a quick fix (thanks), it carries a certain risk for servers with many concurrent clients: the server would have to walk the (long) client list for each "unknown" packet, computing an HMAC for each entry. That makes it a DoS vector, and this led to James vetoing this approach.

2.4 is very likely to have a new approach to this, using a new packet format which can have a session ID in the data packet, making the match client<->packet possible without crypto operations.

Changed to "feature wish" to better reflect that this is not "existing functionality not working" but "new functionality".

comment:12 Changed 6 years ago by teco

Is somebody working on an enhancement? With the increasing use of wireless access, NAT, etc., floating is a useful feature. It would be nice if it worked with TLS connections.

comment:13 in reply to:  12 Changed 6 years ago by avalentin

Yes, there's some work going on. In the meantime you may use my modifications:
https://github.com/avalentin/openvpn/tree/tlsfloat

comment:14 Changed 6 years ago by teco

Thanks avalentin.
I'm looking forward to having it in a new release. After that, I have to wait for the downstream OpenVPN-NL derivative.
Meanwhile, I'll use your modifications on servers only.

Changed 6 years ago by stipa

Attachment: peer-id-v2.patch added

comment:15 Changed 6 years ago by stipa

Here is the peer-id patch that was discussed at the IRC meeting.

http://article.gmane.org/gmane.network.openvpn.devel/9214

Added new packet format P_DATA_V2, which includes peer-id. If server
supports, client sends all data packets in the new format. When data
packet arrives, server identifies peer by peer-id. If peer's ip/port has
changed, server assumes that client has floated, verifies HMAC and
updates ip/port in internal structs.
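The flow described above can be sketched as follows. This is not code from peer-id-v2.patch. The header layout is an assumption taken from the OpenVPN wire protocol rather than from this ticket (5-bit opcode and 3-bit key ID in the first byte, P_DATA_V2 as opcode 9, then a 24-bit peer-id in network byte order), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the OpenVPN wire protocol, not stated in this ticket. */
#define P_OPCODE_SHIFT 3
#define P_KEY_ID_MASK  0x07
#define P_DATA_V2      9

struct data_v2_header {
    uint8_t  key_id;
    uint32_t peer_id;
};

/* Parse the 4-byte P_DATA_V2 header. Returns 0 on success, -1 if the
 * buffer is too short or carries a different opcode. */
static int parse_data_v2(const uint8_t *buf, size_t len,
                         struct data_v2_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 4 || (buf[0] >> P_OPCODE_SHIFT) != P_DATA_V2)
        return -1;
    out->key_id = buf[0] & P_KEY_ID_MASK;
    out->peer_id = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    return 0;
}

/* Hypothetical peer table entry, indexed directly by peer-id. */
struct peer_slot {
    int in_use;
    uint32_t ip;
    uint16_t port;
};

/* Look the peer up by peer-id. If the source address differs and the
 * caller has already verified the packet's HMAC, adopt the new address.
 * Returns 1 if the peer floated, 0 if unchanged, -1 if rejected. */
static int handle_float(struct peer_slot *peers, size_t n, uint32_t peer_id,
                        uint32_t from_ip, uint16_t from_port, int hmac_ok)
{
    assert(peers != NULL);
    if (peer_id >= n || !peers[peer_id].in_use)
        return -1;
    if (peers[peer_id].ip == from_ip && peers[peer_id].port == from_port)
        return 0;
    if (!hmac_ok)
        return -1;
    peers[peer_id].ip = from_ip;
    peers[peer_id].port = from_port;
    return 1;
}
```

The design point is that the peer-id selects exactly one candidate session, so at most one HMAC verification is needed per packet regardless of how many clients are connected.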


comment:16 Changed 6 years ago by Gert Döring

Resolution: fixed
Status: new → closed

Full peer-id support (client and server) has been merged to git master:

commit 65eedc353349d2967fc03c54da807727e416e1b0
Author: Lev Stipakov <lstipakov@…>
Date: Sun Nov 23 17:17:11 2014 +0200

Peer-id patch v7


Added new packet format P_DATA_V2, which includes peer-id. If server
supports, client sends all data packets in the new format. When data
packet arrives, server identifies peer by peer-id. If peer's ip/port has
changed, server assumes that client has floated, verifies HMAC and
updates ip/port in internal structs.

and the client-side of this has been added to release/2.3, included in 2.3.6 already:

commit 0e1fd33247460bdfa65d306e8bcdd3cbafed8b73
Author: Gert Doering <gert@…>
Date: Sun Nov 23 20:17:30 2014 +0100

Add client-only support for peer-id.


This is a reduced version of the peer-id patch from Lev Stipakov
implementing only the client side bits - send IV_PROTO=2, accept
"peer-id <n>" as pushed option, support P_DATA_V2 packets.

So, using a 2.3.6 or newer client and git master server, you can have the benefits of "tls-float" without the associated risks - the client is identified by its peer-id, so if the client address changes but peer-id + HMAC verify, the client session will float to the new client IP address.

Thanks for your patience.

comment:17 Changed 4 years ago by teco

Now I have tested for over a year. I switched clients to 2.3.9 recently (OpenVPN-NL).

Remarks:
1) troubleshooting with changing connection info is hard, the peer is numbered but during session setup this peer ID is not provided. e.g.:
Jun 3 18:17:38 server ovpn-openvpn-nl-tun0[678]: Untrusted peer 27 wants to float to 10.2.2.2:64780

I have no further information on peer 27. Previously, the common-name and IP address_port information was used. Maybe add the old address_port to this message?
-- is solved already, I use older version...

2) I see some problems when floating, something with key negotiation.
Jun 3 17:51:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 WARNING: normally if you use --mssfix and/or --fragment, you should also set --tun-mtu 1500 (currently it is 1350)
Jun 3 17:52:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Jun 3 17:52:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 TLS Error: TLS handshake failed

The first message is OK for me, I have to deal with tunnels and fragmentation issues a lot.
The other two are strange. I cannot relate port 35780 to earlier messages. Maybe floating happened at same time as float. Unlikely, but it can happen. I have reneg-sec 86400 so this is unlikely also.

I'll make new tickets after more testing.

It would be nice if this floating feature for hub-and-spoke setups were released. Is it possible to make the server part available in 2.3? Is a 2.4 release planned?
