Opened 21 months ago

Last modified 19 months ago

#1070 new Bug / Defect

IPv6 packet received by L2 client breaks MAC mapping?

Reported by: Chris_DB Owned by:
Priority: major Milestone:
Component: Networking Version: OpenVPN 2.4.4 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords: ipv6 tap l2
Cc:

Description

Topology:
[ client ] -- L2/TAP tunnel -- [ server ] -- L2 link -- [ host ]

Issue flow:

  1. Initially traffic works as intended
  2. After a while server sends an IPv6 multicast to client (packet No. 26773)
  3. After this packet, traffic destined for server is no longer sent through the tunnel. Any packet written to the TAP interface on the client side with the server's MAC as its destination is spewed back out of the TAP interface by OpenVPN instead of being sent over the tunnel. Packets with any other destination MAC are still forwarded correctly, so client can still communicate with host, but not with server.

The packet capture was taken on client's TAP interface.
In the packet capture, client is 172.16.10.5, server is 172.16.10.1 and host is 172.16.10.20.
If you look closely in the packet capture, you'll notice that after the IPv6 packet No. 26773 (the only IPv6 packet), all ICMP replies (or any traffic) from client to server appear twice, once when going from TAP interface to OpenVPN and once when it is immediately spewed back out by OpenVPN.

Basically OpenVPN will throw packets back on the TAP interface if the destination MAC matches the source of a previously received IPv6 packet.
Put differently: once an IPv6 packet is received, traffic addressed to the source MAC of that packet is no longer sent over the tunnel and is instead spewed back out on the TAP interface.
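
The "appears twice" symptom described above can be checked mechanically. The following is a minimal sketch, assuming the capture is in classic pcap format (not pcapng) with microsecond timestamps; the function names `read_pcap` and `find_echoes`, the 10 ms window, and the synthetic demo frames are illustrative assumptions, not anything taken from the attachments.

```python
import struct


def read_pcap(data):
    """Return a list of (timestamp_seconds, frame_bytes) from classic pcap bytes."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    frames = []
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16  # size of the per-packet record header
        frames.append((ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]))
        offset += incl_len
    return frames


def find_echoes(frames, window=0.01):
    """Return (i, j) pairs where frame j is byte-identical to frame i
    and was captured at most `window` seconds after it."""
    echoes = []
    last_index = {}  # frame bytes -> index of the most recent occurrence
    for j, (ts, frame) in enumerate(frames):
        i = last_index.get(frame)
        if i is not None and ts - frames[i][0] <= window:
            echoes.append((i, j))
        last_index[frame] = j
    return echoes


def _record(ts_usec, frame):
    """Build one little-endian pcap record (demo helper, hypothetical data)."""
    return struct.pack("<IIII", 0, ts_usec, len(frame), len(frame)) + frame


# Synthetic capture: frame_a is seen twice within 50 microseconds, frame_b once.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
frame_a = b"\x02" * 6 + b"\x04" * 6 + b"\x08\x00" + b"ping"
frame_b = b"\x04" * 6 + b"\x02" * 6 + b"\x08\x00" + b"pong"
demo = header + _record(100, frame_a) + _record(150, frame_a) + _record(900000, frame_b)
echoes = find_echoes(read_pcap(demo))
print(echoes)
```

To run it against a real capture, pass the file contents instead of `demo`, for example `read_pcap(open("ipv6_crash.pcap", "rb").read())`. Byte-identical frames a few microseconds apart are a reasonable proxy for "written to TAP, then immediately echoed back", but they do not show which side did the echoing.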

root@ubuntu:~# openvpn --version
OpenVPN 2.4.4 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 10 2018
library versions: OpenSSL 1.1.0g  2 Nov 2017, LZO 2.08
Originally developed by James Yonan
Copyright (C) 2002-2017 OpenVPN Technologies, Inc. <sales@openvpn.net>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto=yes enable_crypto_ofb_cfb=yes enable_debug=yes enable_def_auth=yes enable_dependency_tracking=no enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=yes enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_maintainer_mode=no enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_pedantic=no enable_pf=yes enable_pkcs11=yes enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_silent_rules=no enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=yes enable_werror=no enable_win32_dll=yes enable_x509_alt_username=yes with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
root@ubuntu:~#

Attachments (2)

ipv6.pcap (43.0 KB) - added by Chris_DB 21 months ago.
Packet capture
ipv6_crash.pcap (43.0 KB) - added by Chris_DB 21 months ago.
Packet capture on client's TAP interface


Change History (15)

Changed 21 months ago by Chris_DB

Attachment: ipv6.pcap added

Packet capture

Changed 21 months ago by Chris_DB

Attachment: ipv6_crash.pcap added

Packet capture on client's TAP interface

comment:1 Changed 21 months ago by Chris_DB

Sorry for the double attachment. I cannot delete the duplicate, and I cannot modify the bug description either. The IPv6 packet number is not 26773 but 99: the original packet capture was too large to attach, so I had to trim it down.

comment:2 Changed 21 months ago by Antonio

Hi there, IPv6 is widely used in OpenVPN (also in TAP mode) and we have never received any report similar to yours, therefore I think there must be something else in conjunction with that IPv6 packet that is triggering the misbehaviour.

Still, I quickly tried to reproduce the issue here by pinging ff02::2 from the server, but nothing wrong happened. However in my case there were no bridges involved.
Do you have any bridge configured on the client side enslaving the TAP interface?

Can you confirm that on the server you stop seeing any traffic from this client (except that directed to other VPN clients)?

Would you mind sharing server and client config, so that we can have a better understanding of the setup?

Thanks!

comment:3 Changed 21 months ago by Gert Döring

I have a suspicion - looking at the OpenVPN *Server* log file might give some insights when it tells about MAC addresses learned on client instances.

My suspicion is like this: this "multicast packet" is sent *back* from the client towards the server (for whatever reason, it should not do that) and the server then learns "oh, the source MAC of that machine has moved behind this client instance" (learning bridge), so all packets for the server MAC get forwarded out to this client from then on.

There is a bug (some other trac ticket, too lazy to search right now) that the openvpn server will not properly re-learn "MAC is on the tap interface, not on a client instance" - otherwise this would be "fixed" with the next packet from the server.

To confirm -> server log.

If that's indeed true, then there are two bugs - the client should not be sending back the multicast packet in the first place, and the server should handle this more gracefully.
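
The learning-bridge behaviour suspected above can be illustrated with a toy model. This is only a sketch of generic MAC learning, not OpenVPN's actual code; the class name, port names and MAC addresses are all hypothetical.

```python
class LearningBridge:
    """Toy MAC-learning switch (illustrative only, not OpenVPN's implementation)."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port the MAC was last seen on

    def receive(self, ingress_port, src_mac, dst_mac, all_ports):
        """Learn the source MAC, then return the list of egress ports."""
        self.mac_table[src_mac] = ingress_port
        egress = self.mac_table.get(dst_mac)
        if egress is None:
            # Unknown unicast or multicast: flood to all ports except the ingress one.
            return [p for p in all_ports if p != ingress_port]
        return [egress]


SERVER_MAC = "02:00:00:00:00:01"  # hypothetical addresses
CLIENT_MAC = "02:00:00:00:00:05"
MCAST_MAC = "33:33:00:00:00:01"   # IPv6 all-nodes multicast MAC
PORTS = ["tap", "client-instance"]

bridge = LearningBridge()
# Normal operation: the server's own frames enter from the local tap side.
bridge.receive("tap", SERVER_MAC, MCAST_MAC, PORTS)
before = bridge.receive("client-instance", CLIENT_MAC, SERVER_MAC, PORTS)
# The multicast frame comes back from the client, still carrying the server's source MAC.
bridge.receive("client-instance", SERVER_MAC, MCAST_MAC, PORTS)
after = bridge.receive("client-instance", CLIENT_MAC, SERVER_MAC, PORTS)
print(before, after)
```

In this model a single looped-back frame is enough to repoint the server's MAC at the client instance, after which frames for the server are handed straight back to the client. That matches the reported symptom, but only the server log can confirm whether this is what actually happens.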

comment:4 Changed 21 months ago by Gert Döring

Not sure I understand the topology right, though... packets 93-95 show a ssh session between 172.16.10.1 and 10.254.10.122 - where is that?

The capture only shows the multicast packet going one way only, but that does not signify anything in particular, unfortunately (tcpdump on tap is funny sometimes). So, looking at client and server logs with --verb 4 when this happens would be good.

comment:5 Changed 21 months ago by Antonio

@Chris_DB any update on this? Would you be able to provide the requested information?

comment:6 in reply to:  2 ; Changed 21 months ago by Chris_DB

Sorry for the delay. I did not receive any notifications until this last comment.

Replying to Antonio:

Do you have any bridge configured on the client side enslaving the TAP interface?

Can you confirm that on the server you stop seeing any traffic from this client (except that directed to other VPN clients)?

Would you mind sharing server and client config, so that we can have a better understanding of the setup?

Thanks!

@Antonio
Yes, the TAP interfaces at both client and server are enslaved by a bridge.
Yes, no traffic was received from client on server after IPv6 packet.
I will attempt to reproduce the issue with a minimal setup and get back to you.

Topology looks something like this.

/                /- tap0   \ -- L2/TAP tunnel -- /   tap1 -\   server    \
| client - br0 -<-- tap1   | -- L2/TAP tunnel -- |   tap2 -->- br0 - tap0 | -- L2/TAP tunnel -- [ tap1 - br0 (no STP) - host]
\                \- tap2   / -- L2/TAP tunnel -- \   tap3 -/             /

Replying to Gert Döring:

Not sure I understand the topology right, though... packets 93-95 show a ssh session between 172.16.10.1 and 10.254.10.122 - where is that?

10.254.10.122 is behind client (172.16.10.5).

Replying to Gert Döring:

The capture only shows the multicast packet going one way only, but that does not signify anything in particular, unfortunately (tcpdump on tap is funny sometimes).

I'd say the packet capture is accurate. The IPv6 multicast (RA) is coming from server. There is no reply expected from client.
Packet capture was done on tap0 in the diagram above.

Server config:

dev tap1
proto udp
port 1001
ncp-ciphers AES-128-GCM
mode server
tls-server
ccd-exclusive
client-config-dir "/etc/openvpn/client-config-dir"
keepalive 10 30
client-to-client
log "/etc/openvpn/VPN Server HA1.log"
verb 3

<ca>
...
</ca>
<cert>
...
</cert>
<key>
...
</key>
<dh>
...
</dh>

Client config:

dev tap0
proto udp
client
remote-cert-tls server
local 192.168.10.6
remote <remote> 1001
connect-retry 60
log "/etc/openvpn/VPN HQ Gateway HA1.log"
verb 3

<ca>
...
</ca>
<cert>
...
</cert>
<key>
...
</key>

comment:7 Changed 21 months ago by Chris_DB

@Gert Döring
Replying to Gert Döring:

The capture only shows the multicast packet going one way only, but that does not signify anything in particular, unfortunately (tcpdump on tap is funny sometimes). So, looking at client and server logs with --verb 4 when this happens would be good.

Oh, I see what you are saying. There are 2 packet captures attached, ipv6.pcap and ipv6_crash.pcap. Ignore ipv6.pcap (I messed up and cannot remove it) and use ipv6_crash.pcap. There you will see packets both ways. Even the packets that are relayed back on tap by openvpn, after crash.

comment:8 in reply to:  6 ; Changed 21 months ago by Gert Döring

Replying to Chris_DB:

Topology looks something like this.

/                /- tap0   \ -- L2/TAP tunnel -- /   tap1 -\   server    \
| client - br0 -<-- tap1   | -- L2/TAP tunnel -- |   tap2 -->- br0 - tap0 | -- L2/TAP tunnel -- [ tap1 - br0 (no STP) - host]
\                \- tap2   / -- L2/TAP tunnel -- \   tap3 -/             /

This config is a bit on the complex side...

Is STP doing the right thing on the client side? That is, disable tap1 + tap2? Otherwise br0 would be receiving the multicast packets on one of the tap interfaces, and forward it out on the other taps (that's what multicast does "out on all active interfaces except the ingress interface"), confusing the server side (because the source MAC is suddenly seen not on the LAN but on a client socket).

Onwards in my next reply...

comment:9 in reply to:  7 Changed 21 months ago by Gert Döring

Replying to Chris_DB:

@Gert Döring
Replying to Gert Döring:

The capture only shows the multicast packet going one way only, but that does not signify anything in particular, unfortunately (tcpdump on tap is funny sometimes). So, looking at client and server logs with --verb 4 when this happens would be good.

Oh, I see what you are saying. There are 2 packet captures attached, ipv6.pcap and ipv6_crash.pcap. Ignore ipv6.pcap (I messed up and cannot remove it) and use ipv6_crash.pcap. There you will see packets both ways. Even the packets that are relayed back on tap by openvpn, after crash.

You might be "seeing" what I'm saying, but you're missing the essential bit: we need to see a log file from the server when this happens, where the server logs iroute learning - something like this:

Jun 18 18:19:35 gentoo tap-udp-p2mp[28004]: freebsd-74-amd64/194.97.140.3 MULTI: Learn: 00:bd:c1:e3:99:00 -> freebsd-74-amd64/194.97.140.3

Also, the statement "the tcpdump is accurate" is dangerous. This is multicast and a tun/tap interface: strange and wonderful things have been observed, and I would not be surprised in the least to see a packet come in and be relayed back out again (by the kernel) without telling tcpdump. But it could very well be "come in via tap0, leave via tap1+tap2". So: server logs.

As a somewhat unrelated side note: if this is a point-to-point setup client<->server and not "multiple clients connecting to server:tap1", you could change this into a true point-to-point setup (get rid of "mode server") and you won't have MAC learning inside OpenVPN anymore, getting that part out of the picture. But of course that will also disable --client-config-dir and support for pushing configs to the client - you'll need to have everything static on the client in that mode.
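
Once server logs are available, the "MULTI: Learn:" lines can be scanned for a MAC that moves between instances. A minimal sketch, assuming the log lines follow the format of the single example quoted above; the function name `mac_moves` and the second sample line are invented for illustration.

```python
import re

# Matches the "MULTI: Learn: <mac> -> <instance>" part of a server log line.
LEARN_RE = re.compile(
    r"MULTI: Learn: (?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}) -> (?P<instance>\S+)"
)


def mac_moves(log_lines):
    """Yield (mac, old_instance, new_instance) whenever a MAC is learned
    on a different client instance than the one it was last seen on."""
    last_seen = {}
    for line in log_lines:
        match = LEARN_RE.search(line)
        if not match:
            continue
        mac = match.group("mac").lower()
        instance = match.group("instance")
        old = last_seen.get(mac)
        if old is not None and old != instance:
            yield mac, old, instance
        last_seen[mac] = instance


sample = [
    # Line quoted in this ticket.
    "Jun 18 18:19:35 gentoo tap-udp-p2mp[28004]: freebsd-74-amd64/194.97.140.3 "
    "MULTI: Learn: 00:bd:c1:e3:99:00 -> freebsd-74-amd64/194.97.140.3",
    # Hypothetical second line: the same MAC learned behind another client.
    "Jun 18 18:20:01 gentoo tap-udp-p2mp[28004]: other-client/192.0.2.7 "
    "MULTI: Learn: 00:bd:c1:e3:99:00 -> other-client/192.0.2.7",
]
moves = list(mac_moves(sample))
print(moves)
```

For the scenario in this ticket, any Learn line that mentions the MAC of the server's own bridge would already be suspicious, even without a prior entry, because that MAC should only ever be seen on the local tap side.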


comment:10 in reply to:  8 Changed 21 months ago by Chris_DB

Replying to Gert Döring:

Is STP doing the right thing on the client side? That is, disable tap1 + tap2? Otherwise br0 would be receiving the multicast packets on one of the tap interfaces, and forward it out on the other taps (that's what multicast does "out on all active interfaces except the ingress interface"), confusing the server side (because the source MAC is suddenly seen not on the LAN but on a client socket).

Onwards in my next reply...

Yes STP works well, tap1 and tap2 are in blocking mode.
In fact, since I disabled IPv6 in server's kernel, no more crashes have happened.

Replying to Gert Döring:

So: server logs.

I'll attempt to reproduce the problem this weekend and come back with minimal setup and logs.
I couldn't notice anything relevant in the verb 3 logs during crash, but I'll try higher.

comment:11 Changed 20 months ago by Chris_DB

I could not reproduce it anymore.
I have re-enabled IPv6 in my infrastructure and added this monitoring command on all tap interfaces, along with verb 4 logs: tshark -i tap0 -Y "ipv6 || arp || tcp.analysis.retransmission". If I manage to get more info and can consistently reproduce the issue, I'll come back.

comment:12 Changed 19 months ago by Antonio

Hey Chris, any chance that you managed to reproduce this issue?

comment:13 in reply to:  12 Changed 19 months ago by Chris_DB

Since then it happened once, but I couldn't manage to collect any data.
