Opened 3 years ago

Last modified 3 years ago

#778 reopened Bug / Defect

Routes changed by --redirect-gateway not re-instated on exit if interactive service is in use

Reported by: selvanair Owned by:
Priority: blocker Milestone: release 2.4
Component: Generic / unclassified Version: OpenVPN 2.4_alpha2 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords:
Cc:

Description

Windows 7, 2.4_beta2, with --redirect-gateway in config, started using GUI with interactive service:

The IPv4 default route gets redirected to the tunnel as expected, but on exit the default route is gone and the machine loses connectivity. Strangely, a SIGHUP restart does not fail, as would be expected if the default route disappeared. I never noticed this before, maybe because I always use --redirect-gateway def1.

Logs show all the right bits are there (log snippet copied below), but the routing table after openvpn exits shows:

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    306
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    306
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    306
      192.168.0.0    255.255.255.0         On-link     192.168.0.110    281
    192.168.0.110  255.255.255.255         On-link     192.168.0.110    281
    192.168.0.255  255.255.255.255         On-link     192.168.0.110    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    306
        224.0.0.0        240.0.0.0         On-link     192.168.0.110    281
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    306
  255.255.255.255  255.255.255.255         On-link     192.168.0.110    281

Logs

Mon Nov 28 22:11:25 2016 us=80441 TEST ROUTES: 1/1 succeeded len=0 ret=1 a=0 u/d=up
Mon Nov 28 22:11:25 2016 us=80441 C:\Windows\system32\route.exe ADD 192.155.xx.xxx MASK 255.255.255.255 192.168.0.1
Mon Nov 28 22:11:25 2016 us=87441 Route addition via service succeeded
Mon Nov 28 22:11:25 2016 us=87441 C:\Windows\system32\route.exe DELETE 0.0.0.0 MASK 0.0.0.0 192.168.0.1
Mon Nov 28 22:11:25 2016 us=94442 Route deletion via service succeeded
Mon Nov 28 22:11:25 2016 us=95442 C:\Windows\system32\route.exe ADD 0.0.0.0 MASK 0.0.0.0 10.9.0.1
Mon Nov 28 22:11:25 2016 us=103442 Route addition via service succeeded
Mon Nov 28 22:11:25 2016 us=103442 Initialization Sequence Completed
..
..
Mon Nov 28 22:13:06 2016 us=517243 SIGTERM received, sending exit notification to peer
Mon Nov 28 22:13:09 2016 us=227398 TCP/UDP: Closing socket
Mon Nov 28 22:13:09 2016 us=228398 C:\Windows\system32\route.exe DELETE 192.155.xx.xxx MASK 255.255.255.255 192.168.0.1
Mon Nov 28 22:13:09 2016 us=236398 Route deletion via service succeeded
Mon Nov 28 22:13:09 2016 us=236398 C:\Windows\system32\route.exe DELETE 0.0.0.0 MASK 0.0.0.0 10.9.0.1
Mon Nov 28 22:13:09 2016 us=243399 Route deletion via service succeeded
Mon Nov 28 22:13:09 2016 us=243399 C:\Windows\system32\route.exe ADD 0.0.0.0 MASK 0.0.0.0 192.168.0.1
Mon Nov 28 22:13:09 2016 us=252399 Route addition via service succeeded
Mon Nov 28 22:13:09 2016 us=252399 Closing TUN/TAP interface

Change History (10)

comment:1 Changed 3 years ago by Gert Döring

Milestone: release 2.4
Priority: major → blocker

I thought I had already commented on this, but maybe clicked wrongly.

Anyway.

Looking at HandleRouteMessage(), this all looks reasonable - we send the route towards the indexed interface

  if (msg->iface.index != -1)
    {
      fwd_row->InterfaceIndex = msg->iface.index;
    }

and the caller in openvpn's route.c does

do_route_ipv4_service (const bool add, const struct route_ipv4 *r, const struct tuntap *tt)
{
  DWORD if_index = windows_route_find_if_index (r, tt);
... 
   .iface = { .index = if_index, .name = "" },
}

and windows_route_find_if_index() *should* do the right thing (it walks interfaces and looks for a match).

Maybe CreateIpForwardEntry2() is interfering with DHCP (on the public interface) here? Just guessing...

But this is a serious issue that we need to fix before the 2.4.0 release - it will break people's configuration.

If we can't find the real reason, maybe resort to "on windows, if the iservice is used, there is only def1 and no remove/add real gateway"?

comment:2 Changed 3 years ago by selvanair

Haven't had time to test this, but just woke up with this guess :) The undo list in the service keeps a list of all routes added, and they are removed when the thread servicing the openvpn process exits.

comment:3 Changed 3 years ago by Gert Döring

This sounds fairly likely indeed :-)

This is actually interesting to think through, and a bit mind-bending... two "undo" mechanisms fighting. The original route deletion doesn't end up on the undo list, and trying to undo that from openvpn gets un-done by the service...

*scratch head*

(We can't just let openvpn install routes that have a "do not undo this!" flag...)
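
The interaction between the two undo mechanisms can be sketched as a toy model. This is not the interactive service's code: `toy_service`, `svc_route()`, `svc_cleanup()` and `default_route_survives()` are hypothetical names, routes are plain strings, and the model assumes the service records every route it adds, forgets every route it deletes, and removes whatever is left on the list when the worker thread exits.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ROUTES 16

/* Toy route set: each entry is a "destination/prefix via gateway" string. */
struct route_set {
    const char *entries[MAX_ROUTES];
    int count;
};

static int rs_find(const struct route_set *rs, const char *route)
{
    for (int i = 0; i < rs->count; i++)
        if (strcmp(rs->entries[i], route) == 0)
            return i;
    return -1;
}

static void rs_add(struct route_set *rs, const char *route)
{
    if (rs_find(rs, route) < 0 && rs->count < MAX_ROUTES)
        rs->entries[rs->count++] = route;
}

static void rs_remove(struct route_set *rs, const char *route)
{
    int i = rs_find(rs, route);
    if (i >= 0)
        rs->entries[i] = rs->entries[--rs->count];
}

/* Hypothetical model: the system routing table plus the service's undo list. */
struct toy_service {
    struct route_set table;
    struct route_set undo;
};

/* A route request from openvpn: adds are recorded for undo, deletes are forgotten. */
static void svc_route(struct toy_service *s, bool add, const char *route)
{
    if (add) {
        rs_add(&s->table, route);
        rs_add(&s->undo, route);
    } else {
        rs_remove(&s->table, route);
        rs_remove(&s->undo, route);
    }
}

/* Worker thread exit: remove every route still on the undo list. */
static void svc_cleanup(struct toy_service *s)
{
    while (s->undo.count > 0) {
        rs_remove(&s->table, s->undo.entries[s->undo.count - 1]);
        s->undo.count--;
    }
}

/* Replays the sequence from the log; returns whether the original default survives. */
bool default_route_survives(bool use_def1)
{
    const char *orig = "0.0.0.0/0 via 192.168.0.1";
    const char *host = "192.155.xx.xxx/32 via 192.168.0.1";
    const char *vpn  = "0.0.0.0/0 via 10.9.0.1";
    const char *lo   = "0.0.0.0/1 via 10.9.0.1";
    const char *hi   = "128.0.0.0/1 via 10.9.0.1";
    struct toy_service s = {0};

    rs_add(&s.table, orig);            /* pre-existing route, never on the undo list */
    svc_route(&s, true, host);
    if (use_def1) {
        svc_route(&s, true, lo);
        svc_route(&s, true, hi);
        svc_route(&s, false, lo);      /* openvpn exit */
        svc_route(&s, false, hi);
    } else {
        svc_route(&s, false, orig);    /* deletion is not recorded anywhere */
        svc_route(&s, true, vpn);
        svc_route(&s, false, vpn);     /* openvpn exit */
        svc_route(&s, true, orig);     /* re-instated, but now on the undo list */
    }
    svc_route(&s, false, host);
    svc_cleanup(&s);
    return rs_find(&s.table, orig) >= 0;
}
```

Under these assumptions, `default_route_survives(false)` returns false: the re-added original default is the only entry left on the undo list, so cleanup removes it. With `use_def1` set, the original default is never touched and survives.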

comment:4 Changed 3 years ago by selvanair

> If we can't find the real reason, maybe resort to "on windows, if the iservice is used, there is only def1 and no remove/add real gateway"?

Even if we find the real reason (which I'm more or less convinced is due to undo -- still not tested), why not make --redirect-gateway always imply def1? Isn't it better to always leave the default route as is and add the 0.0.0.0/1 and 128.0.0.0/1 routes?
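
Why the two /1 routes are enough can be shown with a small longest-prefix-match lookup. This is a minimal sketch, not OpenVPN or Windows code: `toy_route`, `lookup_gateway()`, `IP4()` and `def1_table` are hypothetical names, and the gateways are the ones from the log in the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct toy_route {
    uint32_t dest;      /* network address, host byte order */
    int prefix_len;     /* 0..32 */
    uint32_t gateway;
};

static uint32_t prefix_mask(int len)
{
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 separately. */
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Returns the gateway of the most specific matching route, or 0 if none matches. */
uint32_t lookup_gateway(const struct toy_route *table, size_t n, uint32_t addr)
{
    int best_len = -1;
    uint32_t best_gw = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].prefix_len);
        if ((addr & mask) == (table[i].dest & mask) && table[i].prefix_len > best_len) {
            best_len = table[i].prefix_len;
            best_gw = table[i].gateway;
        }
    }
    return best_gw;
}

/* Original default route first, then the two def1 halves pointing at the tunnel. */
static const struct toy_route def1_table[] = {
    { IP4(0, 0, 0, 0),   0, IP4(192, 168, 0, 1) },
    { IP4(0, 0, 0, 0),   1, IP4(10, 9, 0, 1) },
    { IP4(128, 0, 0, 0), 1, IP4(10, 9, 0, 1) },
};
```

Looking up any address against all three entries yields 10.9.0.1, because a /1 match is more specific than the /0 default. Looking up against only the first entry (the state after the /1 routes are removed) yields 192.168.0.1 again, with nothing to re-instate.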

comment:5 Changed 3 years ago by Gert Döring

We discussed this briefly on IRC, and plaisthos' comment was "Android always does def1"... so I think this is the cleanest way forward: OpenVPN will never *remove* routes that someone else has installed, but will only add new stuff - which gets cleaned up by the service (in case openvpn crashes).

As a side note: the IPv6 code only ever does the IPv6-equivalent of "def1", because it is just too complicated to remove an IPv6 default route across all platforms and ensure it's not re-added a few seconds later by whatever part processes RA/DHCPv6 - so, there is a precedent...

If we can do this cleanly, I'd tend to "def1 is used implicitly *if* using the service pipe" - so "non-service behaviour" isn't changed, and "service behaviour" is new anyway, so we do not have to care about "it used to be <xxx>" at all.

... somewhere in the options postprocess mutate jungle... maybe with an informational msg() saying what was implied and why.

comment:6 Changed 3 years ago by selvanair

That is a simple fix then:

In options_postprocess_mutate or options_postprocess_verify_ce add RG_DEF1 if redirecting and msg_channel is defined.

Also do the same in option parsing itself to handle pushed --redirect-gateway.

Hope that covers all cases.
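
A rough sketch of the check described above, for illustration only. It is not the patch that was sent to the list: `toy_options` and `imply_def1_for_service()` are hypothetical, the flag values are illustrative rather than copied from OpenVPN's headers, and `msg_channel` is modelled as a plain pointer that is non-NULL when the interactive service is in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the ticket; the numeric values here are illustrative. */
#define RG_ENABLE (1u << 0)
#define RG_DEF1   (1u << 2)

/* Hypothetical stand-in for the relevant part of the options structure. */
struct toy_options {
    unsigned int rg_flags;     /* redirect-gateway flags */
    const void *msg_channel;   /* non-NULL when talking to the interactive service */
};

/*
 * If the gateway is being redirected through the service, force the def1
 * method. Returns true when def1 was implied by this call, so the caller
 * can log an informational message about what was changed and why.
 */
bool imply_def1_for_service(struct toy_options *o)
{
    if ((o->rg_flags & RG_ENABLE) && o->msg_channel != NULL
        && !(o->rg_flags & RG_DEF1)) {
        o->rg_flags |= RG_DEF1;
        return true;
    }
    return false;
}
```

The same helper could be called both after option postprocessing and after applying pushed options, which is the second case mentioned above.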

comment:7 Changed 3 years ago by selvanair

FWIW, confirmed that the re-instated original default route is deleted by the service as a part of cleanup (undo) when the worker thread exits.

Patch as discussed above sent to the list.

comment:8 Changed 3 years ago by Gert Döring

Resolution: fixed
Status: new → closed

commit 788e5e4a08e0df7206d17e9cbc135764d6fc385f
Author: Selva Nair <selva.nair@…>
Date: Tue Nov 29 19:39:32 2016 -0500

Force 'def1' method when --redirect-gateway is done through service

thanks.

(I admit I have only test-compiled it, and stared at the code, but I trust you that this has been tested so I'll just close the ticket)

comment:9 Changed 3 years ago by ghbmail

Resolution: fixed
Status: closed → reopened

I would like this issue reopened. The implemented fix is unacceptable, as it breaks a bunch of Windows functionality. Specifically, the Windows Network Connectivity Status Indicator (NCSI) service fails if there is no default route attached to the interface, so as far as Windows is concerned the VPN link has no connectivity, and anything that depends on NCSI breaks. For instance, anything Office 365 related will show account services as "No internet connection" and prevent switching accounts or accessing SharePoint Online. Forcing "def1" on redirect-gateway means there is no way to configure a proper default route on the VPN interface to work around this.

comment:10 Changed 3 years ago by selvanair

Hmm.. I wonder why Office 365 would care whether there is a default route (or any route, for that matter) through the tunnel as long as MS servers are reachable. Can anyone else with Office 365 or SharePoint try to reproduce this?

Could be a bug in Windows NCSI. What OS version is this, and is it up to date with all patches?
