#49 closed Feature Wish (fixed)
--float does not work with --server
Reported by: | Samuli Seppänen | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | release 2.4 |
Component: | Networking | Version: | OpenVPN git master branch (Community Ed) |
Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | |
Cc: |
Description
I have an OpenVPN server in UDP TLS-server mode. Float is activated for every connection. When I change the IP of a connected client, I have to wait for the ping timeout. All packets sent in the meantime are lost.
Attachments (3)
Change History (20)
comment:1 Changed 14 years ago by
comment:2 Changed 14 years ago by
Comment from Markus:
The problem here seems to be the call to link_socket_actual_match() in tls_pre_decrypt() which prevents packets from the "wrong remotes" from being seen by any later code, thus preventing --float from working with TLS mode in general.
comment:3 Changed 12 years ago by
Version: | 2.1.0 / 2.1.1 → git master branch |
---|
Ticket #206 was added, which is a duplicate of this one, with better debugging information. I'm including that ticket description here:
After the client gets a new IP, it is not able to communicate until the ping-restart timeout expires when using tls-server/tls-client.
Scenario:
- A server with a static IP
- A client with a dynamic IP
Tested with 2.2.2
Log:
[...]
Wed May 2 15:16:22 2012 us=925612 [ford.[domain]] Peer Connection Initiated with 87.78.239.81:50101
Wed May 2 15:16:23 2012 us=18604 Initialization Sequence Completed
[redial, new IP address]
Wed May 2 15:16:37 2012 us=24975 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:38 2012 us=23629 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:39 2012 us=24576 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:40 2012 us=24436 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:41 2012 us=23111 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
Wed May 2 15:16:42 2012 us=22823 TLS Error: local/remote TLS keys are out of sync: 87.78.237.54:50101 [0]
It seems that either
- the check link_socket_actual_match (from, &ks->remote_addr) in ssl.c, function tls_pre_decrypt line 4633 must only be done when not using float
or
- code for re-negotiating tls keys when a client changes its IP is missing
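For illustration, the first option could look roughly like the sketch below. The types and names (`peer_addr`, `key_state_sketch`, `addr_match`, `key_usable`) are hypothetical simplifications, not OpenVPN's real structures; the only point is that the source-address comparison becomes conditional on float. This is a sketch of the proposal, not the change that was eventually merged for this ticket.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for OpenVPN's internal types. */
struct peer_addr {
    uint32_t ip;    /* IPv4 address, host byte order */
    uint16_t port;  /* UDP port, host byte order */
};

struct key_state_sketch {
    bool decrypt_enabled;          /* stands in for DECRYPT_KEY_ENABLED() */
    bool authenticated;            /* stands in for ks->authenticated */
    int key_id;
    struct peer_addr remote_addr;  /* address the key was negotiated with */
};

static bool addr_match(const struct peer_addr *a, const struct peer_addr *b)
{
    return a->ip == b->ip && a->port == b->port;
}

/* Returns true if a data packet arriving from `from` may use this key.
 * When float_enabled is set, the source-address comparison is skipped;
 * all other conditions still have to hold. */
bool key_usable(const struct key_state_sketch *ks, int key_id,
                const struct peer_addr *from, bool float_enabled)
{
    return ks->decrypt_enabled
           && ks->key_id == key_id
           && ks->authenticated
           && (float_enabled || addr_match(from, &ks->remote_addr));
}
```

With a key negotiated for 87.78.239.81:50101, a packet from 87.78.237.54:50101 is rejected when `float_enabled` is false and accepted when it is true, provided the other conditions hold.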
Configs
server:
local [serverip]
lport 50001
ping 5
ping-restart 30
dev tun-ford
tun-ipv6
persist-tun
ifconfig 10.10.254.101 192.168.254.1
mlock
passtos
tun-ipv6
comp-lzo
float
tls-server
ca ca.crt
dh dh2048.pem
cert asterix.[domain].crt
key asterix.[domain].key
tls-auth tls-auth.key 0
tls-remote ford.[domain]
client:
remote [serverip] 50001
lport 50101
ping 5
ping-restart 30
dev tun-asterix
tun-ipv6
persist-tun
ifconfig 192.168.254.1 10.10.254.101
up /etc/openvpn/auto_asterix.up
script-security 2
mlock
passtos
comp-lzo
tls-client
ca ca.crt
dh dh2048.pem
cert ford.[domain].crt
key ford.[domain].key
tls-auth tls-auth.key 1
tls-remote asterix.[domain]
comment:4 Changed 12 years ago by
Almost four years old...
@ecrist: so just kicking link_socket_actual_match() out and done?
The proposal sounds like a really easy patch to me!
comment:5 follow-up: 6 Changed 12 years ago by
I captured the following with --verb 11. After I changed the IP of the client, the server logged this:
Sun May 27 04:09:36 2012 us=605089 event_wait returned 1
Sun May 27 04:09:36 2012 us=605134 I/O WAIT status=0x0001
Sun May 27 04:09:36 2012 us=605177 MULTI: REAP range 144 -> 160
Sun May 27 04:09:36 2012 us=605236 UDPv4 read returned 125
Sun May 27 04:09:36 2012 us=605290 TLS State Error: No TLS state for client [AF_INET]192.168.144.30:37202, opcode=6
Sun May 27 04:09:36 2012 us=605345 GET INST BY REAL: 192.168.144.30:37202 [failed]
Sun May 27 04:09:36 2012 us=605393 PO_CTL rwflags=0x0001 ev=4 arg=0x080c764c
Sun May 27 04:09:36 2012 us=605436 PO_CTL rwflags=0x0001 ev=5 arg=0x080c75a8
Sun May 27 04:09:36 2012 us=605487 I/O WAIT TR|Tw|SR|Sw [10/0]
Sun May 27 04:09:37 2012 us=606968 PO_WAIT[0,0] fd=4 rev=0x00000001 rwflags=0x0001 arg=0x080c764c
Sun May 27 04:09:37 2012 us=607113 event_wait returned 1
Sun May 27 04:09:37 2012 us=607164 I/O WAIT status=0x0001
Sun May 27 04:09:37 2012 us=607211 MULTI: REAP range 160 -> 176
Sun May 27 04:09:37 2012 us=607277 UDPv4 read returned 125
Sun May 27 04:09:37 2012 us=607339 TLS State Error: No TLS state for client [AF_INET]192.168.144.30:37202, opcode=6
Sun May 27 04:09:37 2012 us=607399 GET INST BY REAL: 192.168.144.30:37202 [failed]
Sun May 27 04:09:37 2012 us=607452 PO_CTL rwflags=0x0001 ev=4 arg=0x080c764c
Sun May 27 04:09:37 2012 us=607497 PO_CTL rwflags=0x0001 ev=5 arg=0x080c75a8
Sun May 27 04:09:37 2012 us=607554 I/O WAIT TR|Tw|SR|Sw [9/207763]
Sun May 27 04:09:38 2012 us=604851 PO_WAIT[0,0] fd=4 rev=0x00000001 rwflags=0x0001 arg=0x080c764c
My OpenVPN is older (version 2.1.3), and here the TLS subsystem doesn't complain about keys being out of sync; instead it reports "TLS State Error: No TLS state for client".
comment:6 Changed 12 years ago by
I've been looking into the same issue a bit and I think I have some more debugging:
The 'No TLS state for client' is an extra check that was probably added later in tls_pre_decrypt_lite. When I kick out that check, I get the 'TLS keys are out of sync' error again. Just above that error, there is a defined-out debugging block that gives more information about what the actual values are to the test in tls_pre_decrypt mentioned in ticket #206:
Thu Nov 15 21:06:05 2012 us=663890 213.119.171.47:30777 UDPv4 READ [125] from [AF_INET]213.119.171.47:30777: P_DATA_V1 kid=0 DATA len=124
Thu Nov 15 21:06:05 2012 us=663979 213.119.171.47:30777 TLS_PRE_DECRYPT: [0] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664057 213.119.171.47:30777 TLS_PRE_DECRYPT: [1] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664144 213.119.171.47:30777 TLS_PRE_DECRYPT: [2] dken=0 rkid=0 lkid=0 auth=0 def=0 match=0
Thu Nov 15 21:06:05 2012 us=664296 213.119.171.47:30777 TLS Error: local/remote TLS keys are out of sync: [AF_INET]213.119.171.47:30777 [0]
This seems to indicate that it's not just link_socket_actual_match (match) that's failing, but DECRYPT_KEY_ENABLED (dken) and ks->authenticated (auth) are also preventing the packet from getting accepted.
(This is the code that ticket #206 references?)
if (DECRYPT_KEY_ENABLED (multi, ks)
&& key_id == ks->key_id
&& ks->authenticated
#ifdef ENABLE_DEF_AUTH
&& !ks->auth_deferred
#endif
&& link_socket_actual_match (from, &ks->remote_addr))
{
/* return appropriate data channel decrypt key in opt */
The issue is now that I don't have enough knowledge about crypto programming to dig much deeper and especially not to make any useful changes, but I hope this helps a bit.
My application is a 3G mobile connected laptop with an occasionally very flaky link that I tunnel through a static machine so that the tunnel server provides a static endpoint for TCP sessions, even if the 3G connection is dropped and comes back up with a different IP. Being able to use --float would make it quicker to resume sessions in that case and not have the delay that comes with sending the USR1 signal to openvpn; in the train I often only have very short timeframes of connectivity that I'd like to use as much as possible.
comment:7 Changed 11 years ago by
Hi!
I also saw this problem and thought about a solution. I found the HMAC in auth mode, which is enabled by default and gives a way to check a floated peer against the list of existing peers. If the HMAC fits the peer's crypto context, the old peer is updated with the new address.
Please see the following patch as a draft!
Changed 11 years ago by
Attachment: | tlsfloat.patch added |
---|
Enable floating in UDP server mode if auth algo is enabled
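As a rough illustration of the idea described in comment:7 (this is not the contents of tlsfloat.patch), the sketch below checks a packet from an unknown address against every known peer's key and rebinds the first peer whose authentication tag matches. Everything here is an assumption made for the example: `peer_sketch`, `toy_mac` and `rebind_floated_peer` are hypothetical names, and `toy_mac` is a toy keyed checksum standing in for a real HMAC. The `mac_ops` counter only makes visible how many tag computations one unknown packet costs.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 16

/* Hypothetical per-client record; not OpenVPN's real structure. */
struct peer_sketch {
    uint8_t  mac_key[KEY_LEN];  /* per-session authentication key */
    uint32_t ip;                /* last known address, host byte order */
    uint16_t port;
};

/* Toy keyed checksum (FNV-1a over key, then data).
 * A stand-in for illustration only, NOT a real HMAC. */
static uint32_t toy_mac(const uint8_t *key, const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < KEY_LEN; i++) { h ^= key[i];  h *= 16777619u; }
    for (size_t i = 0; i < len; i++)     { h ^= data[i]; h *= 16777619u; }
    return h;
}

/* Walks the peer list and compares the packet's tag against each peer's key.
 * On a match the peer's address is updated and its index returned;
 * otherwise -1 is returned. *mac_ops is incremented once per tag computed. */
int rebind_floated_peer(struct peer_sketch *peers, size_t n_peers,
                        const uint8_t *payload, size_t len, uint32_t tag,
                        uint32_t from_ip, uint16_t from_port,
                        size_t *mac_ops)
{
    for (size_t i = 0; i < n_peers; i++) {
        (*mac_ops)++;
        if (toy_mac(peers[i].mac_key, payload, len) == tag) {
            peers[i].ip = from_ip;
            peers[i].port = from_port;
            return (int)i;
        }
    }
    return -1;
}
```

A packet that matches no peer still costs one tag computation per connected client, so the work per unknown packet grows linearly with the size of the client list.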
comment:8 Changed 11 years ago by
Thanks for the patch! I've been trying it for a few days now and it seems to be working (connectivity is much better.)
I did have to disable the ping checks though, otherwise the tunnel will still be renegotiated during a connection outage. Unfortunately, disabling the ping checks results in a crash of the server, apparently when the connection is torn down because of too many failed TLS renegotiations:
Fri Oct 25 18:06:12 2013 xxx.xxx.xxx.xxx:22494 Authenticate/Decrypt packet error: packet HMAC authentication failed
Repeated quite a few times, this is probably some leftover traffic as there were TLS handshake failures just above it too. (18:06 is around the time when I drop the connection from the client.)
One hour later it appears to start TLS renegotiation, which fails because the client is no longer there:
Fri Oct 25 19:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Fri Oct 25 19:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS handshake failed
<... repeated many times ...>
Fri Oct 25 20:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Fri Oct 25 20:07:13 2013 xxx.xxx.xxx.xxx:22494 TLS Error: TLS handshake failed
... and after an hour of trying, the server gives up completely:
Fri Oct 25 20:07:13 2013 Assertion failed at multi.c:546
Fri Oct 25 20:07:13 2013 Exiting due to fatal error
This is the failing code:
if (m->earliest_wakeup == mi)
    m->earliest_wakeup = NULL;
if (!shutdown) {
    if (mi->did_real_hash) {
        ASSERT (hash_remove (m->hash, &mi->real));
    }
    if (mi->did_iter) {
        ASSERT (hash_remove (m->iter, &mi->real)); /* 546 */
    }
This is probably just because I completely disabled the ping checks (instead of setting them to a much longer interval and tearing down the connection after 30 minutes or so), but I thought I'd share it anyway.
Changed 11 years ago by
Attachment: | tlsfloat.2.patch added |
---|
comment:9 Changed 11 years ago by
Hi,
I uploaded a fixed version. I noticed these problems too. You have to patch master to get it working. Please retry.
Here is an example of a suitable config:
tls-timeout 10
reneg-sec 0
keepalive 0 0
ping-restart 0
ping 300
For the same reason, the server should keep the instance as long as possible. My client, however, has:
ping 60
ping-restart 3600
reneg-sec 7200
André
comment:10 Changed 11 years ago by
That patch keeps the server up for much longer, thanks! It hasn't crashed so far.
(Sorry for the late response, didn't have many chances to try it in usual conditions over the holidays.)
comment:11 Changed 11 years ago by
Milestone: | → release 2.4 |
---|---|
Type: | Bug / Defect → Feature Wish |
This has been discussed among the developers, and while the patch from avalentin is great as a quick fix (thanks), it carries a certain risk for servers with many concurrent clients: the server would have to walk the (long) client list for each "unknown" packet, computing an HMAC for each entry. That makes it a DoS vector, which led James to veto this approach.
2.4 is very likely to have a new approach to this, using a new packet format which can have a session ID in the data packet, making the match client<->packet possible without crypto operations.
Changed to "feature wish" to better reflect that this is not "existing functionality not working" but "new functionality".
comment:12 follow-up: 13 Changed 10 years ago by
Is somebody working on an enhancement? With the growing use of wireless access, NAT, etc., floating is a useful feature. It would be nice if it worked with TLS connections.
comment:13 Changed 10 years ago by
Yes, there's some work going on. In the meantime you may use my modifications:
https://github.com/avalentin/openvpn/tree/tlsfloat
comment:14 Changed 10 years ago by
Thanks avalentin.
I'm looking forward to having it in a new release. After that, I have to wait for the downstream OpenVPN-NL derivative.
Meanwhile, I'll use your modifications on servers only.
Changed 10 years ago by
Attachment: | peer-id-v2.patch added |
---|
comment:15 Changed 10 years ago by
Here is the peer-id patch that was discussed at the IRC meeting.
http://article.gmane.org/gmane.network.openvpn.devel/9214
Added new packet format P_DATA_V2, which includes peer-id. If server
supports, client sends all data packets in the new format. When data
packet arrives, server identifies peer by peer-id. If peer's ip/port has
changed, server assumes that client has floated, verifies HMAC and
updates ip/port in internal structs.
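To make the packet-format change concrete, here is a minimal parsing sketch. It assumes the layout as I understand it from the OpenVPN protocol: the first byte carries the opcode in its upper 5 bits and the key ID in its lower 3 bits, P_DATA_V1 is opcode 6 (matching the "opcode=6" in the logs above), P_DATA_V2 is opcode 9, and a P_DATA_V2 packet carries a 24-bit peer-id in the following 3 bytes. Treat those constants as assumptions to verify against the source; `data_hdr` and `parse_data_header` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed protocol constants; verify against the OpenVPN source. */
#define P_OPCODE_SHIFT 3
#define P_KEY_ID_MASK  0x07
#define P_DATA_V1      6
#define P_DATA_V2      9

/* Hypothetical result type for this sketch. */
struct data_hdr {
    int      opcode;
    int      key_id;
    bool     has_peer_id;
    uint32_t peer_id;     /* meaningful only when has_peer_id is true */
};

/* Parses the leading bytes of a data-channel packet.
 * Returns false for truncated input or a non-data opcode. */
bool parse_data_header(const uint8_t *buf, size_t len, struct data_hdr *out)
{
    if (len < 1)
        return false;
    out->opcode = buf[0] >> P_OPCODE_SHIFT;
    out->key_id = buf[0] & P_KEY_ID_MASK;
    out->has_peer_id = false;
    out->peer_id = 0;
    if (out->opcode == P_DATA_V1)
        return true;
    if (out->opcode == P_DATA_V2) {
        if (len < 4)
            return false;
        /* 24-bit peer-id in network byte order */
        out->peer_id = ((uint32_t)buf[1] << 16)
                     | ((uint32_t)buf[2] << 8)
                     | (uint32_t)buf[3];
        out->has_peer_id = true;
        return true;
    }
    return false;
}
```

Because the peer-id is read straight from the header, the server can find the client record without any cryptographic work, and only then verify the HMAC for that single client.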
comment:16 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Full peer-id support (client and server) has been merged to git master:
commit 65eedc353349d2967fc03c54da807727e416e1b0
Author: Lev Stipakov <lstipakov@…>
Date: Sun Nov 23 17:17:11 2014 +0200
Peer-id patch v7
Added new packet format P_DATA_V2, which includes peer-id. If server
supports, client sends all data packets in the new format. When data
packet arrives, server identifies peer by peer-id. If peer's ip/port has
changed, server assumes that client has floated, verifies HMAC and
updates ip/port in internal structs.
and the client-side of this has been added to release/2.3, included in 2.3.6 already:
commit 0e1fd33247460bdfa65d306e8bcdd3cbafed8b73
Author: Gert Doering <gert@…>
Date: Sun Nov 23 20:17:30 2014 +0100
Add client-only support for peer-id.
This is a reduced version of the peer-id patch from Lev Stipakov
implementing only the client side bits - send IV_PROTO=2, accept
"peer-id <n>" as pushed option, support P_DATA_V2 packets.
So, using a 2.3.6 or newer client and git master server, you can have the benefits of "tls-float" without the associated risks - the client is identified by its peer-id, so if the client address changes but peer-id + HMAC verify, the client session will float to the new client IP address.
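The decision described here can be sketched as follows. This is a simplified model under stated assumptions, not the merged code: `client_slot`, `pkt_verdict` and `on_data_packet` are hypothetical names, the table is indexed directly by peer-id, and the outcome of HMAC verification is passed in as a boolean rather than computed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum pkt_verdict { PKT_DROP, PKT_ACCEPT, PKT_FLOATED };

/* Hypothetical client record, indexed by peer-id in this sketch. */
struct client_slot {
    bool     in_use;
    uint32_t ip;     /* current address, host byte order */
    uint16_t port;
};

/* Looks the client up by peer-id (no crypto needed for the lookup).
 * If the source address differs, the session floats to the new address
 * only when the packet's HMAC verified; otherwise the packet is dropped. */
enum pkt_verdict on_data_packet(struct client_slot *table, size_t n_slots,
                                uint32_t peer_id, uint32_t from_ip,
                                uint16_t from_port, bool hmac_ok)
{
    if (peer_id >= n_slots || !table[peer_id].in_use)
        return PKT_DROP;                 /* unknown peer-id */

    struct client_slot *c = &table[peer_id];
    if (!hmac_ok)
        return PKT_DROP;                 /* includes untrusted float attempts */
    if (c->ip == from_ip && c->port == from_port)
        return PKT_ACCEPT;               /* same address as before */

    c->ip = from_ip;                     /* client has floated */
    c->port = from_port;
    return PKT_FLOATED;
}
```

Compared with scanning every client, a packet with a bad or unknown peer-id is rejected after a single table lookup, and at most one HMAC check is needed per packet.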
Thanks for your patience.
comment:17 Changed 8 years ago by
I have now tested this for over a year. I recently switched the clients to 2.3.9 (OpenVPN-NL).
Remarks:
1) Troubleshooting is hard when connection info changes: the peer is numbered, but this peer ID is not shown during session setup. For example:
Jun 3 18:17:38 server ovpn-openvpn-nl-tun0[678]: Untrusted peer 27 wants to float to 10.2.2.2:64780
I have no further information about peer 27. Previously, the common-name and IP address_port information was used. Maybe add the old address_port to this message?
-- This is already solved; I am using an older version...
2) I see some problems when floating, something with key negotiation.
Jun 3 17:51:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 WARNING: normally if you use --mssfix and/or --fragment, you should also set --tun-mtu 1500 (currently it is 1350)
Jun 3 17:52:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Jun 3 17:52:32 server ovpn-openvpn-nl-tun0[678]: 10.128.0.33:35780 TLS Error: TLS handshake failed
The first message is OK for me, I have to deal with tunnels and fragmentation issues a lot.
The other two are strange. I cannot relate port 35780 to earlier messages. Maybe a renegotiation happened at the same time as the float. That is unlikely, but it can happen. I have reneg-sec 86400, so this is unlikely too.
I'll make new tickets after more testing.
It would be nice if this floating feature for hub-and-spoke setups were released. Is it possible to make the server part available in 2.3? Or is it planned for 2.4?
Reported to SF.net by Markus Wick (http://sourceforge.net/users/degasus/).