Opened 6 years ago

Last modified 16 months ago

#1061 new Feature Wish

Client cannot reconnect because of pushed routes

Reported by: ar4chn0
Owned by:
Priority: major
Milestone: release 2.7
Component: Networking
Version: OpenVPN git master branch (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords:
Cc:

Description

Hey, I have checked the currently reported bugs and I am pretty sure this one wasn't among them. If I have missed it, I apologize; feel free to close this ticket and point me to the appropriate issue.

The issue is described in the following steps which can also be used to recreate it:

  1. Connect to VPN server
  2. Remove eth cable from the PC
  3. Wait till you get Inactivity timeout error
  4. Put the cable back in and wait for the client to reconnect (it cannot)
  5. Check routing table:
    0.0.0.0/1 via 10.7.7.1 dev tun0 
    default via 192.168.2.1 dev enp1s0 proto dhcp metric 20100 
    10.7.7.0/24 dev tun0 proto kernel scope link src 10.7.7.2 
    128.0.0.0/1 via 10.7.7.1 dev tun0 
    169.254.0.0/16 dev enp1s0 scope link metric 1000 
    192.168.2.0/24 dev enp1s0 proto kernel scope link src 192.168.2.51 metric 100 
    
  6. Remove these routes:
    0.0.0.0/1 via 10.7.7.1 dev tun0 
    10.7.7.0/24 dev tun0 proto kernel scope link src 10.7.7.2 
    128.0.0.0/1 via 10.7.7.1 dev tun0 
    
  7. Successfully reconnect

I guess you can already see where the issue is. The routes aren't removed, and the client tries to initiate the connection via the tunnel, which no longer exists. The same happens with both TCP and UDP.
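
The manual cleanup in step 6 could be scripted roughly as follows. This is only a sketch: the function name stale_route_cmds is made up for illustration, it reads `ip route show` output on standard input, and it prints the `ip route del` commands rather than running them, so nothing changes until the output has been reviewed and executed as root.

```shell
# Hypothetical helper (not part of OpenVPN): print an "ip route del"
# command for every route in the input that uses the given device.
# stdin: output of "ip route show"; $1: tunnel device name, e.g. tun0
stale_route_cmds() {
    awk -v dev="$1" '
    {
        for (i = 1; i < NF; i++) {
            if ($i == "dev" && $(i + 1) == dev) {
                $1 = $1    # rebuild the line with single spaces, no trailing blank
                print "ip route del", $0
                break
            }
        }
    }'
}
```

With the routing table from step 5, `ip route show | stale_route_cmds tun0` would list the three deletions from step 6; piping that output into `sh` as root applies them.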

Log:

Thu May  3 14:21:25 2018 Initialization Sequence Completed
Thu May  3 14:26:58 2018 [185.245.86.157] Inactivity timeout (--ping-restart), restarting
Thu May  3 14:26:58 2018 SIGUSR1[soft,ping-restart] received, process restarting
Thu May  3 14:26:58 2018 Restart pause, 5 second(s)
Thu May  3 14:27:03 2018 TCP/UDP: Preserving recently used remote address: [AF_INET]185.245.86.157:443
Thu May  3 14:27:03 2018 Socket Buffers: R=[87380->425984] S=[16384->425984]
Thu May  3 14:27:03 2018 Attempting to establish TCP connection with [AF_INET]185.245.86.157:443 [nonblock]
Thu May  3 14:29:03 2018 TCP: connect to [AF_INET]185.245.86.157:443 failed: Connection timed out
Thu May  3 14:29:03 2018 SIGUSR1[connection failed(soft),init_instance] received, process restarting
Thu May  3 14:29:03 2018 Restart pause, 5 second(s)

Relevant client configs:

client
dev tun
proto tcp
resolv-retry infinite
remote-random
nobind
persist-key
persist-tun
reneg-sec 0
remote-cert-tls server
pull
fast-io

Client version: OpenVPN 2.4.4

Relevant server configs:

push "redirect-gateway def1"
topology subnet

Server version: OpenVPN 2.4.5


If you need further information from me - please let me know.

Change History (8)

comment:1 Changed 6 years ago by tct

CC -- investigating this I discovered something else but I would like to see how this is resolved first.

@ar4chn0 -- I could not replicate this problem myself; reconnecting worked for me, but I am still looking into it. What OS does your client run? I presume Windows?

Edit: No, obviously the client is Linux: dev enp1s0

comment:2 Changed 6 years ago by Gert Döring

The problem is that the OS seems to remove the route to the VPN server that we have installed (to be able to reach the server without going through the tunnel). Not much we can do inside OpenVPN here (except "monitor routing information changes", which is very different on each platform and possibly not worth the hassle).

If you remove persist-tun from your config, this should be sufficient to fully tear down the tunnel on reconnect so routes get fixed again.

comment:3 Changed 6 years ago by ar4chn0

This was tested on Debian Stretch.

Removing persist-tun does fix the issue indeed. Thanks guys, I guess this issue can be closed.

comment:4 Changed 6 years ago by Antonio Quartulli

I actually had a similar problem too.

In my case it happened when I was losing the uplink connection and the uplink interface was getting reconfigured (thus losing the route to the VPN server).

Maybe one way to fix this in a "generic manner" is to always check that there is a route to the VPN server passing through the main GW "if" redirect-gateway is specified?
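
One way to picture such a check from outside OpenVPN, as a sketch only: ask the kernel which device it would use to reach the server (`ip route get <server>`), and treat the bypass route as lost if the answer is the tunnel device. The function names below are hypothetical, and the parsing assumes the usual `... dev <name> ...` layout of iproute2 output.

```shell
# Hypothetical helpers (not OpenVPN code).
# stdin: output of "ip route get <server-ip>"
# Prints the device the kernel would use for that destination.
route_get_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# $1: tunnel device name; stdin as above.
# Succeeds when traffic to the server would enter the tunnel,
# i.e. the host route via the main gateway is missing.
server_routed_into_tunnel() {
    [ "$(route_get_dev)" = "$1" ]
}
```

A watchdog could then run something like `ip route get 185.245.86.157 | server_routed_into_tunnel tun0 && echo "bypass route missing"` before each reconnect attempt.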

comment:5 Changed 4 years ago by Gert Döring

There are a few possible ways to tackle it:

  • listen to a netlink/route socket and be informed if our /32 or /128 host route goes away (due to interface flap, DHCPD or NM resetting all routes), and/or the default gateway changes, and re-install what is needed
  • do tricks with "ip rule" and fwmark to get "user traffic" into the VPN tunnel without having to change the "system routes" (= we do not care if the LAN interface changes, host routes go away, ...)
  • try binding to the interface (SO_BINDTODEVICE?) and make openvpn packets independent from the routing table. This has caveats with rp_filter eating reply packets (because if the route points to tun0, rp_filter=1 will drop such packets coming from eth0)
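
To make the second option more concrete, here is a rough sketch of the kind of policy-routing setup it hints at. It is modelled on what wg-quick does for WireGuard, not on anything OpenVPN implements today. The table number, the mark value, and the function name are assumptions for illustration, and OpenVPN would have to mark its own packets (for example with --mark) so that they skip the tunnel table. The function only prints the commands.

```shell
# Hypothetical helper: print policy-routing commands that steer unmarked
# ("user") traffic into the tunnel while leaving the main table untouched.
# $1: tunnel device, $2: routing table number, $3: fwmark set on VPN packets
policy_routing_cmds() {
    # Default route via the tunnel, kept in a separate table
    printf 'ip route add default dev %s table %s\n' "$1" "$2"
    # Everything that does NOT carry the VPN mark consults that table
    printf 'ip rule add not fwmark %s table %s\n' "$3" "$2"
    # Specific (non-default) routes in the main table still win
    printf 'ip rule add table main suppress_prefixlength 0\n'
}
```

For instance, `policy_routing_cmds tun0 100 0x1` prints three commands to review. In such a setup the system routes are left alone, so a flap on the LAN interface should not strand the VPN's own packets.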

comment:6 Changed 4 years ago by Gert Döring

Milestone: release 2.6
Type: Bug / Defect → Feature Wish
Version: OpenVPN git master branch (Community Ed)

We definitely must fix this for 2.6, but it's more of a "feature wish" type of ticket than a "bug report". What's missing is the functionality to notice "uh, the network interface has been reconfigured, we need to reconsider our assumptions about routing and networks".

comment:7 Changed 16 months ago by oyudin

I approve, it's been a major flaw in OpenVPN for a long time.
It happens on any interface reconnection. Even if you just have a weak Wi-Fi signal that reconnects once in a while, you permanently lose your Internet connection until you restart the OpenVPN client manually.

Remove persist-tun

Removing persist-tun may be unacceptable:

  1. Your connections may break.
  2. Your real IP leaks while reconnecting

A workaround

I've managed to come up with a workaround. It doesn't solve the problem globally, but it works on Linux (tested on the latest Arch).

user root
group root
persist-tun
up-restart
script-security 2
setenv VPNID your-instance-id
route-up "/bin/bash -c 'echo $trusted_ip > /tmp/ovpn_${VPNID}_ip && echo $route_net_gateway > /tmp/ovpn_${VPNID}_gw;'"
down "/bin/bash -c 'ip r a $(cat /tmp/ovpn_${VPNID}_ip) via $(cat /tmp/ovpn_${VPNID}_gw);:'"
#down "/bin/bash -c 'ip r a $(cat /tmp/ovpn_${VPNID}_ip) via $(cat /tmp/ovpn_${VPNID}_gw); ip r s | grep -E \"default|0\\.0\\.0\\.0\\/0\" | cut -d\" \" -f 3 | xargs -I{} ip r a $(cat /tmp/ovpn_${VPNID}_ip) via {};:'"

An explanation

  1. user root, group root - required to run the ip route add command. You may use an unprivileged user, but you will have to add sudo and create a proper /etc/sudoers.d/<...> file
  2. persist-tun - see above
  3. up-restart - triggers the down script to run on reconnect. The up script won't run because of persist-tun
  4. script-security 2 - allows user-defined scripts
  5. setenv - may be necessary if you use multiple connections. Just a precaution
  6. route-up - dumps the values of $trusted_ip and $route_net_gateway at the initial successful connect. You could skip it and use down "/bin/bash -c 'ip r a $trusted_ip via $route_net_gateway;:'" but these variables are cleared on every reconnect, so that doesn't solve the problem
  7. down - runs on reconnect and restores the lost route using the values saved by route-up. You may use the second (commented-out) variant to dynamically find the new network gateway and create a static route through it. For me, it allows switching from my home router to smartphone Wi-Fi tethering without having to restart the OpenVPN client
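
The route-up and first down one-liners are easier to follow when written out as functions. The sketch below mirrors them, with two deliberate differences that are not in the original workaround: the state directory is passed in as a parameter, and the printed command uses `ip route replace` instead of `ip r a`, so that an already-present route is not an error. The function names are hypothetical, and the restore function prints the command instead of running it.

```shell
# Hypothetical helpers mirroring the route-up / down one-liners.

# Save server address and gateway (what route-up does).
# $1: state directory, $2: VPN instance id, $3: server IP, $4: gateway
save_route_state() {
    printf '%s\n' "$3" > "$1/ovpn_${2}_ip"
    printf '%s\n' "$4" > "$1/ovpn_${2}_gw"
}

# Print the command that restores the host route (what down does).
# $1: state directory, $2: VPN instance id
# Fails if nothing was saved for this instance.
restore_route_cmd() {
    ip_file="$1/ovpn_${2}_ip"
    gw_file="$1/ovpn_${2}_gw"
    [ -s "$ip_file" ] && [ -s "$gw_file" ] || return 1
    printf 'ip route replace %s via %s\n' "$(cat "$ip_file")" "$(cat "$gw_file")"
}
```

In a route-up script this might be called as `save_route_state /tmp "$VPNID" "$trusted_ip" "$route_net_gateway"`, and in the down script as `restore_route_cmd /tmp "$VPNID" | sh`.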

Feature Wish

It would be nice if OpenVPN could do it on its own. All it needs to do is:

  1. Store the $trusted_ip and $route_net_gateway variables internally. By the way, $trusted_ip may change on reconnect if you have multiple remote <...> directives
  2. On reconnect, try to re-create the static route as it does on the initial connect
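
Expressed outside OpenVPN, step 2 might look like the following sketch, which rediscovers the current default gateway instead of relying on a stored one (similar in spirit to the commented-out down variant above). The function names are made up for illustration, the input is `ip route show` output on standard input, and the result is a printed command, not an executed one.

```shell
# Hypothetical helpers (not OpenVPN code).
# stdin: output of "ip route show". Prints the current default gateway.
default_gateway() {
    awk '$1 == "default" || $1 == "0.0.0.0/0" {
        for (i = 2; i < NF; i++)
            if ($i == "via") { print $(i + 1); exit }
    }'
}

# $1: VPN server IP; stdin as above.
# Prints the command that (re)creates the host route to the server
# through whatever default gateway is active right now.
bypass_route_cmd() {
    gw=$(default_gateway)
    [ -n "$gw" ] || return 1    # no default route: nothing to do yet
    printf 'ip route replace %s/32 via %s\n' "$1" "$gw"
}
```

Because the def1 routes are 0.0.0.0/1 and 128.0.0.0/1, they do not match the `default` test, so the gateway found is the one on the physical interface.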

comment:8 Changed 16 months ago by Gert Döring

Milestone: release 2.6 → release 2.7

The plan was to have a massive overhaul of the way OpenVPN interacts with the system routing layer for 2.6, which did not happen; we went for DCO (kernel-level data channel offload) instead.

We still want this, so I am bumping the milestone to 2.7.
