Opened 20 months ago

Last modified 18 months ago

#1086 new Bug / Defect

Routing is broken: ip addr commands do not take effect

Reported by: kuklinistvan Owned by:
Priority: critical Milestone: release 2.4.5
Component: Generic / unclassified Version: OpenVPN 2.4.5 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords:
Cc: eworm, David Sommerseth, Antonio

Description

Manually executing the ip addr commands found in the log after the connection has been established sometimes fixes the problems caused by the missing routing entries. Judging by the "missing Nexthop" errors, these commands do run, but for some reason they have no effect.
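A quick way to see whether the address and routes from the log actually landed on the tunnel device is sketched below. The interface name tun0, the IP_BIN variable and the check_tunnel_state function are assumptions for illustration; only the standard iproute2 `ip addr show` and `ip route show` subcommands are used.

```shell
#!/bin/sh
# Illustrative check, not part of OpenVPN. IP_BIN is a hypothetical override.
IP_BIN="${IP_BIN:-ip}"

check_tunnel_state() {
    dev="${1:-tun0}"   # assumed device name; use the one from your log
    # Does the device carry an IPv4 address?
    if "$IP_BIN" -4 -o addr show dev "$dev" 2>/dev/null | grep -q 'inet '; then
        echo "$dev: IPv4 address present"
    else
        echo "$dev: no IPv4 address"
    fi
    # Are there any IPv4 routes pointing at the device?
    if [ -n "$("$IP_BIN" -4 route show dev "$dev" 2>/dev/null)" ]; then
        echo "$dev: routes present"
    else
        echo "$dev: no routes"
    fi
}
```

Running `check_tunnel_state tun0` right after the connection comes up, and again a few seconds later, would show whether the configuration was never applied or was applied and then removed.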

This bug is well described here, and affects multiple people:
https://forums.openvpn.net/viewtopic.php?t=25771

Change History (12)

comment:1 Changed 20 months ago by Gert Döring

Cc: eworm added

"multiple people" is a slight exaggeration, it seems to affect two - while it works nicely for everyone else on linux, including our automated test environment which runs on a number of different linux distributions and tests all these cases.

Are you using openvpn from your distro, or unmodified from source? Arch Linux tends to add "enhancements", which we cannot debug upstream.

comment:2 Changed 20 months ago by kuklinistvan

It appears here as well.
https://github.com/kylemanna/docker-openvpn/issues/370
And sorry for my English, I meant "more than one independently" by multiple.

I've built the package from AUR:
https://aur.archlinux.org/packages/openvpn-pkcs11/

As you can see, it contains no patches or "enhancements"; but to be sure, I've just compiled the client from upstream source (https://swupdate.openvpn.org/community/releases/openvpn-2.4.6.tar.gz) and the same error happens. (This is the 2.4.6 source, not 2.4.5 as in the AUR, but the situation is the same.)

I've compiled it with ./configure, make, sudo make install

[kuklinistvan@pisti-ws openvpn-2.4.6]$ openvpn --version
OpenVPN 2.4.6 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Aug 19 2018

Even if it is a fault on my side it is strange that it just silently fails.

comment:3 Changed 20 months ago by Gert Döring

Cc: David Sommerseth Antonio added

Thanks for the link over to the CoreOS issue. This is very mysterious indeed - these are all synchronous calls, read: when they return, the device has to be ready.

Of course we can add a sleep() call after tun device creation, but this would be a funny workaround.

In other words, I consider this an OS bug in "bleeding edge Linux" (Arch and this docker stuff). Maybe David knows more about this, and maybe it's already fixed in an even more recent version... @dazo: any ideas?

comment:4 Changed 20 months ago by tincantech

cc

comment:5 Changed 20 months ago by eworm

Just for the record... Arch Linux tends to *NOT* add "enhancements" but uses plain upstream source where possible. Neither openvpn nor iproute (which contains the ip command) has any enhancements.

I have not seen this myself...

comment:6 Changed 20 months ago by Gert Döring

@eworm: Apologies if I misunderstood. I thought you did apply patches that have been on the list but not merged yet, like the systemd/capability enhancement stuff?

(But anyway, this is outside openvpn, so either it is something funny in ip [which I find unlikely] or in udev/systemd/kernel land...)

comment:7 in reply to:  6 ; Changed 20 months ago by eworm

Replying to Gert Döring:

@eworm: Apologies if I misunderstood. I thought you did apply patches that have been on the list but not merged yet, like the systemd/capability enhancement stuff?

I built test packages for myself, but these kinds of changes are not something to push to official repositories. Even the netlink changes from master have to wait. ;)

The last series of patches I applied was for openssl, simply because we had a recent openssl 1.1.0 in our repositories before openvpn supported it.

(But anyway, this is outside openvpn, so either it is something funny in ip [which I find unlikely] or in udev/systemd/kernel land...)

Sounds like kernel is involved here. But wondering why I did not see it myself...

The first report from forum does not give any detail what distribution or kernel is used, no?

comment:8 in reply to:  7 Changed 20 months ago by Gert Döring

Replying to eworm:

Sounds like kernel is involved here. But wondering why I did not see it myself...

This is a good question indeed.

The first report from forum does not give any detail what distribution or kernel is used, no?

No, what we have so far is not really much detail to work with.

One report "it happens on Arch", one report "it happened on CoreOS after I upgraded" (in the other forum post, which contained the workaround with "add a sleep in the 'ip' wrapper script"), one report without specifics.
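The workaround mentioned above (a delay in an 'ip' wrapper script) might look roughly like the sketch below. It is not the script from that post: the one-second default, the IP_BIN and IP_WRAPPER_DELAY variables, the ip_delayed function name and the /usr/bin/ip path are all assumptions made for illustration. OpenVPN's --iproute option can point at such a wrapper instead of the real ip binary, on builds that configure the tunnel through iproute2.

```shell
#!/bin/sh
# Hypothetical wrapper around ip(8): wait briefly, then run the real command.
# IP_BIN and IP_WRAPPER_DELAY are illustrative names, not OpenVPN settings.
IP_BIN="${IP_BIN:-/usr/bin/ip}"
IP_WRAPPER_DELAY="${IP_WRAPPER_DELAY:-1}"

ip_delayed() {
    # Give the freshly created tun device time to settle (assumed cause).
    sleep "$IP_WRAPPER_DELAY"
    # Pass all arguments through unchanged and return ip's exit status.
    "$IP_BIN" "$@"
}

# Only act when called with arguments, as OpenVPN would call it.
if [ "$#" -gt 0 ]; then
    ip_delayed "$@"
fi
```

Saved under a path of your choosing and made executable, it could be passed to OpenVPN as `--iproute <path-to-wrapper>`. A delay like this only papers over the timing problem and does not explain it.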

@kuklinistvan: which kernel version are you running ("uname -a")?

comment:9 Changed 20 months ago by kuklinistvan

This is interesting. I was unable to reproduce the bug in a fresh Arch virtual machine - however, the instance on my pc is also up to date and was installed this month.

I've just done another upgrade and tried using openvpn again; it still doesn't work and now my kernel version is

kuklinistvan@pisti-ws ~ % uname -a
Linux pisti-ws 4.14.63-1-lts #1 SMP Wed Aug 15 19:47:55 CEST 2018 x86_64 GNU/Linux


comment:10 Changed 19 months ago by kuklinistvan

Update:
I've recently had some trouble with systemd-networkd and I've replaced it with NetworkManager. It seems that the issue has resolved itself on my machine; routing works now.

comment:11 Changed 18 months ago by Gert Döring

So can I summarize this as "under some circumstances, systemd-networkd and openvpn do not work well together"?

@eworm: do you use systemd-networkd?

(I'm tempted to close, since this is really out of our sphere of influence - but it is good to have it documented here in case someone else runs into this.)

comment:12 Changed 18 months ago by eworm

I do, but with a proper configuration. ;)

Some sources propose configuring systemd-networkd to match all interfaces (and start a DHCP client or whatever). Its first step is then to flush the interface, which could explain why the reporter does not see the expected addresses and routes.
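To check whether a local setup has such a catch-all match, something like the following sketch could be used. The function name find_catchall_networks is made up for illustration, and the check is deliberately narrow: it only detects a literal `Name=*` line and only looks in the directory it is given (defaulting to /etc/systemd/network), so broader globs or files in other systemd-networkd search paths would be missed.

```shell
#!/bin/sh
# Illustrative helper: list .network files containing a literal "Name=*" line,
# i.e. a [Match] that would also apply to tun devices created by OpenVPN.
find_catchall_networks() {
    dir="${1:-/etc/systemd/network}"
    for f in "$dir"/*.network; do
        # Skip the unexpanded glob when the directory has no .network files.
        [ -f "$f" ] || continue
        if grep -Eq '^[[:space:]]*Name[[:space:]]*=[[:space:]]*\*[[:space:]]*$' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Any file it prints is a candidate for narrowing the match to the physical interface name, so that systemd-networkd leaves the tunnel device alone.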
