Opened 9 years ago

Last modified 4 years ago

#660 assigned Bug / Defect

Suggested source Code changes to avoid fragmentation by the network stack when using UDP transport

Reported by: john7000 Owned by: Gert Döring
Priority: major Milestone: release 2.3.14
Component: Networking Version: OpenVPN 2.3.10 (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords: over-achiever
Cc:

Description

For some time I have been unable to use OpenVPN over an IPv6 network with UDP as the carrier. The problem arises because:

a/ Fragmented IPv6 packets generated by the OpenVPN server IPv6 stack are blocked at the client-side firewall. (IPv6 fragment reassembly is required prior to deep packet inspection and the memory resource demand can lead to an undesirable DDoS attack surface).

b/ the Internet feeds as implemented by the local ISPs restrict the MTU to 1492 due to the 8 byte PPPoE header (for both native IPv6 and IPv4).

c/ OpenVPN creates payload which is too large for the IPv6 encapsulation because it fails to calculate the maximum header space.

The problems above are most severe for UDP. The kluge using the hardcoded default mssfix=1450 doesn't work for UDP/IPv6 with MTU of 1492.

When using a datagram protocol (UDP) the payload generating program (in this case OpenVPN) expects that the datagrams will be delivered with their implicit boundaries intact. There may be duplicates, missing datagrams, or datagrams out of order but they will still be delivered as single entities with neither concatenation nor fragmentation.

TCP on the other hand is a stream protocol. It does not attempt to guarantee that datagrams are delivered as implicit entities; they may be concatenated into one delivery or delivered in parts. But the bytes will be in order with no duplicates and nothing missing. It is very easy for the network stack of the sending device to perform fragmentation of TCP streams as necessary. For IPv4 (but not IPv6) intervening devices may also perform fragmentation and aggregation of TCP data. However there is still an efficiency cost (extra traffic overhead, reassembly for deep packet inspection, extra CPU load on communication equipment) of performing TCP fragmentation and the best solution is to tell the initial transmitting process the MTU of the entire link so that fragmentation is unnecessary.

So why not just use TCP as the carrier? Carrying TCP payload over a TCP transport behaves badly when delays or packet loss occur, because both layers perform backoff and retransmission and they can interact, leading to TCP session failures. A UDP carrier is also better for payloads such as RTP/UDP, where the latency from retransmitting lost packets is less desirable than simply skipping the data.

I decided to examine the OpenVPN source code to find out where this was failing when using UDP/IPv6 on a link with 1492 MTU and found a number of coding errors. The majority of analysis was performed on OpenVPN 2.3.8 and then applied to and checked against the latest version released (2.3.10). The code tested successfully with IPv6 and IPv4 using the clients

  • Mavericks/Tunnelblick/OpenVPN 2.3.8,
  • Windows 7/MI GUI/OpenVPN 2.3.2,
  • Android 4.1.2/OpenVPN 0.6.44,
  • iOS 8.2 (IPv4 only).

I am not in a position to set up the formal development process, so instead I have documented the code changes I made and the theory behind them. All analysis and changes were performed using just grep, tcpdump and vi on CentOS 6.7. I did not set up any IDE to assist with the process. This document is not very well structured but I thought it better to present it in case any of the developers wish to make the code changes suggested.

The source code changes are listed at the bottom of this information.

For the purposes of this analysis other relevant conditions are

a/ the transport packets may be either IPv6 or IPv4 but the tests are all with IPv4 payload so that it is easily recognisable and separately routed from the IPv6 transport packets.

b/ The OpenVPN clients and OpenVPN server were on separate ISP feeds - ADSL using PPPoE and Australian NBN fibre (GPON) using PPPoA - with Cisco routers at both ends.

c/ The OpenVPN server had a single ethernet interface ("on a stick") and the test web servers were on the same subnet.

d/ The local LAN router at the server end (also Cisco) redirected any traffic it received addressed to the OpenVPN clients to the OpenVPN server for tunnelling.

e/ ICMP unreachable and ICMPv6 packet-too-big messages were not blocked at any point.

f/ Large test packets were generated by downloading JPG and PNG images which could not be compressed further by the OpenVPN compression process.

g/ OpenVPN's own fragmentation is not enabled

h/ OpenVPN compression is enabled to add the 1 byte header when compression fails.

i/ OpenVPN encryption is enabled

j/ TUN (tunnel) mode is being used operating at layer 3 - not TAP which is a bridging mode operating at layer 2.

k/ Connections used a tls-auth key, a certificate on both client and server and required user authentication thus requiring the additional handshake traffic during session setup.

Before detailing the observations and proposed bug fixes there are some points to make about the naming used in the code and a summary of the steps needed to calculate the packet sizes.

Naming

The OpenVPN function and variable naming fails in places to give a clear indication of the purpose of the items named. The main ones relating to this analysis follow.

Bytes versus Octets

The various block sizes transmitted on the network are stored in unsigned 8-bit bytes (uint8_t) so this document uses the term 'bytes' rather than 'octets'.

Tunnel and TUN.


OpenVPN essentially has two traffic interfaces

  • a raw payload side which communicates with the web server etc. and is commonly referred to as 'TUN' or 'tun' in the source code because it uses the 'tun0' or similar interface.
  • a tunnelled side where the payload is encrypted, compressed etc. and then transported using public IP addresses so it is routable across the Internet. This is logically a 'tunnel' and called the 'link' or 'socket' in the source code.

Unfortunately the loose naming convention makes the code unclear in places as to whether the tun side or tunnel side is being referenced.

struct frame.

'struct frame' is a control structure within OpenVPN that provides a template for the various processes which build the encapsulated payload. Two modes are supported being 'TUN' and 'TAP'. It is confusing that this structure is called 'frame' because in 'TUN' mode the payload encapsulation is of a layer 3 'packet' and not a layer 2 'frame'.

Ideally 'struct frame' should be an object accessed solely by methods or functions but there are direct operations on the attributes/members scattered throughout other modules. One of the code changes proposed requires an extra field (socket_proto) to be added to 'struct frame', populated and later read back. Keeping in the prevalent style of the current code these changes were applied directly to the socket_proto member. Inline macros or functions/methods could be used instead.

Frame, Packet and Datagram size calculations

The narrowest part of the path for payload IP packets which are to be transported through the encrypted tunnel without fragmentation is the tunnel itself. The tunnel's transport packet must fit into the smallest MTU of the layer 2 framing available along the path.

If we take a normal Ethernet packet, on most modern networks and switches the default MTU is 1500. The Ethernet frame is normally 14 bytes longer than this, making the on-wire frame 1514 bytes, though on an 802.1Q VLAN trunk it will be 1518 bytes and on metro Ethernet (Q-in-Q) 1522 bytes. In data centres with 1 Gbps or faster media, Ethernet jumbo frames of around 9000 bytes may be used, but many managed ISP connections do not support anything above an MTU of 1500 for customer packets.

Where PPPoE services, and in some cases PPPoA too, are provided by ISPs on ADSL or GPON (fibre), the PPP overhead needs 8 bytes, thus reducing the available MTU to 1492.

Transporting IPv6 over native IPv4 ('6in4', '6to4', '6rd', 'teredo', 'isatap', 'gre') or IPv4 payload over IPv6 will further reduce the MTU. E.g. carrying IPv6 through a '6in4' IPv6-over-IPv4 tunnel (IP protocol 41) to Hurricane Electric needs a further 20 bytes of header, so the IPv6 MTU is 1480 (1472 with PPPoE as well).
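The MTU reductions above are plain subtraction, one encapsulation layer at a time. A minimal sketch of that arithmetic follows; the constant and function names are made up for illustration and do not come from the OpenVPN source.

```c
#include <assert.h>

/* Illustrative values taken from the text above; names are hypothetical. */
enum {
    DEMO_ETHERNET_MTU    = 1500, /* default Ethernet MTU */
    DEMO_PPPOE_OVERHEAD  = 8,    /* PPPoE/PPP header */
    DEMO_SIXIN4_OVERHEAD = 20    /* outer IPv4 header of a 6in4 tunnel */
};

/* Subtract one layer of encapsulation overhead from an MTU. */
static inline int demo_mtu_after_encap(int mtu, int overhead)
{
    assert(mtu > overhead); /* the encapsulation must leave room for payload */
    return mtu - overhead;
}
```

Applying the function once with the PPPoE overhead gives 1492, once with the 6in4 overhead gives 1480, and applying both in turn gives 1472.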

For the purposes of this document I will use 1492 as the Internet MTU with Native IPv4 and IPv6 to match my native test environment.

OpenVPN calls this MTU the 'link-mtu'. For testing I set this parameter to 1492 - it can't be any bigger.

Normally MTU should be learned and adjusted automatically along the IPv4 or IPv6 path through ICMP unreachable or ICMPv6 packet-too-big messages.

In IPv4, fragmentation can be performed by any router along the path unless the DF (Don't Fragment) bit is set in the header.

IPv6 does not permit routers to perform fragmentation, so it is essential that MTU adjustments occur at the initial transmitting stack, driven automatically by routers sending ICMPv6 Packet Too Big messages. OpenVPN does not appear to directly support these control messages yet but the network stack on the host computer does.

MTU can be reduced further for a particular pair of endpoint addresses. In this case the MTU for the IP address pair is cached in the network stack of the two terminating devices.

Because OpenVPN may not yet understand ICMPv6 Packet Too Big messages, I assume that the link-mtu configuration parameter is set small enough to not require further dynamic adjustment.

IPv6 has an explicit lower limit for MTU being 1280 bytes (RFC 2460). Any path which has a smaller MTU cannot support IPv6 packets.

With the link MTU in my environment known to be limited to 1492 (as a consequence of the PPPoE header) we can work out the largest raw payload packet that can be transported without fragmentation.

Allow for network IP header

The layer 3 network header will be either IPv4 or IPv6. IPv4 headers are 20 bytes long and under normal circumstances this is constant.

IPv6 headers are nominally 40 bytes long but IPv6 has the concept of extension headers which may add extra length. Fortunately this issue is unlikely to impact OpenVPN transport under normal conditions.

The most likely extension header is the one used for fragmentation when this is performed by the transmitting network stack. If this header is needed, the layer 3 fragmentation process performed by the stack will take this extra length into account. The IPv6 fragmentation header adds overhead to both of the resulting fragment packets so that the fragments can be recognised, matched and reassembled at the receiving stack and in intervening packet filters and content inspection devices.

Another type of extension header is used for the IPSec encryption built natively into the IPv6 specifications. However with OpenVPN performing its own encryption it is highly unlikely that this header will also be used.

It is possible that the extension headers related to 'IP Mobility' may need to be accommodated at some stage but for the moment I will ignore this possibility.

So on this basis it is reasonably safe to assume that IPv6 header will be just 40 bytes.

OpenVPN obtains the values of 20 for IPv4 and 40 for IPv6 in at least two independent ways.

In proto.h structures are defined for the two header types 'struct openvpn_iphdr' and 'struct openvpn_ipv6hdr'. 'sizeof()' is then used to obtain the two values.

Separately in socket.h a set of IPv[4|6]_[TCP|UDP]_HEADER_SIZE symbolic constant definitions combine hardcoded numbers using 20 and 40 as the IP header part of the values.
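Because the same two numbers come from two independent places, a compile-time check can confirm that they agree. The sketch below uses stand-in structs and constants with made-up names (they are not the real 'struct openvpn_iphdr' / 'struct openvpn_ipv6hdr' definitions, only layouts with the same total size) to show the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the header structs described above.
   Field names are illustrative; only the total sizes matter here. */
struct demo_ipv4_hdr {
    uint8_t  version_len;
    uint8_t  tos;
    uint16_t total_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t saddr;
    uint32_t daddr;
};

struct demo_ipv6_hdr {
    uint32_t version_class_flow;
    uint16_t payload_len;
    uint8_t  next_header;
    uint8_t  hop_limit;
    uint8_t  saddr[16];
    uint8_t  daddr[16];
};

/* Hypothetical hardcoded constants, as a second source of the same values. */
#define DEMO_IPV4_HEADER_SIZE 20
#define DEMO_IPV6_HEADER_SIZE 40

/* Fail the build if the two sources of truth ever disagree. */
static_assert(sizeof(struct demo_ipv4_hdr) == DEMO_IPV4_HEADER_SIZE,
              "IPv4 header struct and constant disagree");
static_assert(sizeof(struct demo_ipv6_hdr) == DEMO_IPV6_HEADER_SIZE,
              "IPv6 header struct and constant disagree");
```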

Given that in this example we are using IPv6 transport and the MTU is 1492, the maximum space available for the UDP datagram or segment is 1492 - 40 = 1452 bytes. Anything bigger will cause the IPv6 stack to fragment the content and generate two packets - one of 1492 bytes and the other of around 57 bytes.

Allow for transport UDP/TCP header

The next header working inwards through the encapsulation is the UDP or TCP header.

The UDP header is a constant 8 bytes.

The TCP header is nominally 20 bytes but, like the IPv6 header, it can be extended, in this case with TCP options. Most current common network stacks support TCP Window Scaling by default. This is negotiated during the TCP handshake and provides a mechanism to assist with congestion behaviour. The options add a further 12 bytes to the TCP header for the majority of data segments (strictly, those 12 bytes are the TCP Timestamps option plus padding, which is normally negotiated alongside Window Scaling). Note that the options add 24 bytes to SYN segments and 20 bytes to SYN/ACK segments, but as these segments are otherwise relatively short there is no need to allow explicitly for this.

OpenVPN obtains the values of 8 for UDP and 20 for TCP in at least two independent ways.

In proto.h structures are defined for the two header types 'struct openvpn_udphdr' and 'struct openvpn_tcphdr'. 'sizeof()' is then used to obtain the two values 8 and 20. There is no allowance for the common TCP Window Scaling extension.

In socket.h a set of IPv[4|6]_[TCP|UDP]_HEADER_SIZE symbolic constant definitions combine hardcoded numbers using 8 and 20 as part of the values. Again the TCP Window Scaling extension is not included. In the suggested program changes I have changed the 20 to 32 to allow for Window Scaling when using this source of the values.

NB A macro in proto.h also defines a conversion of MTU to MSS for TCP when MSS capping is used on the raw payload. The macro subtracts sizeof(struct openvpn_iphdr) and sizeof(struct openvpn_tcphdr) from the MTU passed to it, making MSS 40 bytes less than MTU. MSS is defined as the maximum size of the total TCP payload including any TCP extension headers, so this value is correct, but only for TCP over IPv4.

There is no equivalent macro for TCP over IPv6 so the program incorrectly uses the IPv4 header size instead. A second macro should be created but I have not included this in the suggested program changes. MSS is not the best way to avoid fragmentation - MTU should be used instead as it works for all transported protocols, not just TCP.
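For illustration only, a pair of per-family conversions might look like the sketch below. The macro names and the literal header lengths are assumptions made for this example; they are not the existing proto.h macro, and this is not part of the tested changes.

```c
#include <assert.h>

/* Hypothetical fixed header lengths (no IP or TCP options/extensions). */
#define DEMO_IPV4_HDR_LEN 20
#define DEMO_IPV6_HDR_LEN 40
#define DEMO_TCP_HDR_LEN  20

/* MSS is the MTU minus the fixed IP and TCP headers for that address family. */
#define DEMO_MTU_TO_MSS_V4(mtu) ((mtu) - DEMO_IPV4_HDR_LEN - DEMO_TCP_HDR_LEN)
#define DEMO_MTU_TO_MSS_V6(mtu) ((mtu) - DEMO_IPV6_HDR_LEN - DEMO_TCP_HDR_LEN)

/* With the 1492 byte MTU used in this report. */
static_assert(DEMO_MTU_TO_MSS_V4(1492) == 1452, "IPv4 MSS is MTU - 40");
static_assert(DEMO_MTU_TO_MSS_V6(1492) == 1432, "IPv6 MSS is MTU - 60");
```

The 20 byte difference between the two results is exactly the error introduced when the IPv4 conversion is applied to a TCP-over-IPv6 payload.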

Given that in this example we are using UDP over IPv6 transport and the available UDP datagram space is 1452 the maximum space available for the encrypted message is 1452 - 8 = 1444 bytes.

Other headers

A series of calls fetch the relevant header space required for the current configuration, resulting in my case in a total length of 58 bytes.

  • compression indicator = 1
  • The pad cipher block = 16 (same as iv_size)
  • The cipher kt mode = 16
  • hmac header = 20
  • ssl flag = 1
  • various alignment allowances

(No explicit OpenVPN fragmentation is being performed.)

Compression is done first. If the compression algorithm makes the data longer, the compressed result is discarded and the uncompressed data is used instead. Compression is not effective for pictures etc. which are already highly compressed. It is necessary to signal whether compression has been applied, so a one byte header is added. The worst case is thus that 1 byte more is needed for the link payload.
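The worst case described above can be sketched as follows. This only models the length bookkeeping; the names are hypothetical and it is not OpenVPN's actual compression code.

```c
#include <assert.h>
#include <stddef.h>

/* One byte signalling whether the body that follows is compressed. */
#define DEMO_COMPRESS_FLAG_LEN 1

/* Length after the compression step: the shorter of the compressed and
   raw forms, plus the one byte indicator. */
static inline size_t demo_framed_len(size_t raw_len, size_t compressed_len)
{
    assert(raw_len > 0); /* an empty packet is not a meaningful input here */
    size_t body = (compressed_len < raw_len) ? compressed_len : raw_len;
    return DEMO_COMPRESS_FLAG_LEN + body;
}
```

For an incompressible 1386 byte packet whose compressed form would be longer, the result is 1387 bytes: the raw data plus the indicator byte.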

The exact number of bytes needed for all the headers will vary with the configuration, but in the test setup a total of 58 bytes were needed. On this basis we now know that the maximum space available for the original payload IP packet is 1444 - 58 = 1386 bytes.
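Putting the three subtractions together, the budget can be written as a single calculation. The struct and function below are a sketch with made-up names, using the figures measured in my test setup; the 58 byte OpenVPN overhead in particular is specific to that configuration.

```c
#include <assert.h>

/* Hypothetical container for the per-layer overheads discussed above. */
struct demo_overheads {
    int ip_header;        /* 20 for IPv4, 40 for IPv6 (no extension headers) */
    int transport_header; /* 8 for UDP */
    int openvpn_headers;  /* compression flag, IV, HMAC etc.; 58 in the test setup */
};

/* Largest payload IP packet that fits in one transport packet without
   fragmentation by the network stack. */
static inline int demo_max_payload(int link_mtu, const struct demo_overheads *o)
{
    int total = o->ip_header + o->transport_header + o->openvpn_headers;
    assert(link_mtu > total); /* otherwise nothing at all would fit */
    return link_mtu - total;
}
```

With a link MTU of 1492 and overheads of 40, 8 and 58 bytes this returns 1386; swapping in the 20 byte IPv4 header gives 1406.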

Note that TLS authentication packets used between the OpenVPN program on the server and its peer on the client also require a maximum size calculation using the 'struct frame' template. This calculation may have a different header size depending on the algorithms being used. The authentication process needs to handle large packets to share certificate related information, so it must also calculate a maximum size by subtracting the header demand from the available link MTU.

Restricting the payload IP packet size.

We now know that for the configuration of UDP over IPv6 transport, 1492 MTU Internet connection, SSL, LZO, HMAC and various flags that any initial payload greater than 1386 bytes long is going to result in fragmentation at the link stack.

The simplest and most universal way to get this message to the transmitter of the raw payload is to set the MTU on the tunnel (TUN) interface to this value. This is relatively simple as the value can be configured when the TUNx interface is initially created. Everything else will work automatically as long as ICMPv6 Packet Too Big messages (and ICMP unreachables for IPv4) can get to, and will be accepted by, the transmitting devices. Whilst there is some reluctance from some network administrators, it is poor practice to block ICMP messages of these types. Modern network stacks should be able to resist ICMP attacks. Firewalls have ICMP rate limiting too, to kill off excessive ICMP traffic.

Note OpenVPN places the specified MTU for the link into the 'struct frame' structure as link_mtu. However it uses link_mtu_dynamic as the setting to be used to calculate the MTU on the Tunnel interface. link_mtu_dynamic is initially the same as link_mtu BUT may be made smaller due to other factors such as packet too big messages specific to the IP addresses.

Using MSS as a workaround for poor MTU implementations

MTU handling on some operating systems has been poor and the issue exacerbated by network security administrators who block ICMP unreachable/packet-too-big messages which are an essential control process for MTU feedback.

However, for those implementations where MTU feedback via ICMP proves impossible or unacceptable, MSS (Maximum Segment Size) provides a partial solution applicable to TCP ONLY. MSS settings provide no assistance for other protocols such as large UDP packets (DNS/UDP carrying DNSSEC certificates or records used for DKIM), AH and ESP protocols, as none of these other layer 4 protocols has any concept of MSS.

NOTE by default OpenVPN sets the value of mssfix to 1450. This becomes an upper limit in the payload MTU calculations. In the current code mssfix = 1450 is a workaround for UDP/IPv4 transport only. The setting is too large to support UDP/IPv6.

With the suggested code changes that follow the default hard-coded mssfix setting causes an unnecessarily small packet to be created. The fix is to override the hardcoded workaround by setting mssfix value the same as link-mtu and then allowing the calculations to do their job.

Suggested code changes.

The following code changes have been tested on the equipment I have available to me using two native IPv6 Internet connections with both restricted to 1492 byte MTU.

IPv6 fragmentation headers are blocked at the client end but ICMP and ICMPv6 packet too big is permitted to support PMTUD.
The LANs at both ends run 1500 byte MTU.

The OpenVPN server is Centos 6.7 with OpenVPN 2.3.10.

The clients were Mavericks with Tunnelblick and OpenVPN 2.3.8, Windows 7 with OpenVPN 2.3.2 and Android 4.2 with the latest available OpenVPN. iOS 8.2 on an iPad does not appear to work correctly with IPv6 (a different problem) but does work correctly with IPv4.

  1. Fix the HEADER_SIZE values in socket.h to allow for Window Scaling with TCP (32 bytes instead of 20)

Change

    /*
     * Overhead added to packets by various protocols.
     */
    #define IPv4_UDP_HEADER_SIZE 28
    #define IPv4_TCP_HEADER_SIZE 40
    #define IPv6_UDP_HEADER_SIZE 48
    #define IPv6_TCP_HEADER_SIZE 60

to

    /*
     * Overhead added to packets by various protocols.
     */
    #define IPv4_UDP_HEADER_SIZE 28
    #define IPv4_TCP_HEADER_SIZE 52
    #define IPv6_UDP_HEADER_SIZE 48
    #define IPv6_TCP_HEADER_SIZE 72

  2. Fix the proto_overhead[] array in socket.c to match enum proto_num. At some stage enum proto_num in socket.h has been modified but the array in socket.c has not.

Change

    const int proto_overhead[] = { /* indexed by PROTO_x */
      0,
      IPv4_UDP_HEADER_SIZE, /* IPv4 */
      IPv4_TCP_HEADER_SIZE,
      IPv4_TCP_HEADER_SIZE,
      IPv6_UDP_HEADER_SIZE, /* IPv6 */
      IPv6_TCP_HEADER_SIZE,
      IPv6_TCP_HEADER_SIZE,
      IPv6_TCP_HEADER_SIZE,
    };

to

    const int proto_overhead[] = { /* indexed by PROTO_x */
      0,
      IPv4_UDP_HEADER_SIZE, /* IPv4 */
      IPv4_TCP_HEADER_SIZE,
      IPv4_TCP_HEADER_SIZE,
      IPv4_TCP_HEADER_SIZE, /* added: this entry was missing */
      IPv6_UDP_HEADER_SIZE, /* IPv6 */
      IPv6_TCP_HEADER_SIZE,
      IPv6_TCP_HEADER_SIZE,
      IPv6_TCP_HEADER_SIZE,
    };
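A compile-time check is one way to stop the array and the enum drifting apart again. The sketch below is self-contained and uses made-up DEMO_ names; the enum ordering shown is my assumption based on the corrected array above (one UDP entry followed by three TCP entries per address family), not a copy of socket.h.

```c
#include <assert.h>

/* Assumed ordering, mirroring the corrected array: hypothetical names. */
enum demo_proto_num {
    DEMO_PROTO_NONE,
    DEMO_PROTO_UDPv4,
    DEMO_PROTO_TCPv4_SERVER,
    DEMO_PROTO_TCPv4_CLIENT,
    DEMO_PROTO_TCPv4,
    DEMO_PROTO_UDPv6,
    DEMO_PROTO_TCPv6_SERVER,
    DEMO_PROTO_TCPv6_CLIENT,
    DEMO_PROTO_TCPv6,
    DEMO_PROTO_N /* number of entries; must stay last */
};

/* Overheads using the proposed header sizes (28/52 for IPv4, 48/72 for IPv6). */
static const int demo_proto_overhead[] = {
    0,
    28, 52, 52, 52, /* IPv4: UDP, then three TCP variants */
    48, 72, 72, 72  /* IPv6: UDP, then three TCP variants */
};

/* Fail the build if an enum value is added without a matching array entry. */
static_assert(sizeof(demo_proto_overhead) / sizeof(demo_proto_overhead[0]) == DEMO_PROTO_N,
              "demo_proto_overhead[] must have one entry per enum value");

/* Bounds-checked lookup of the transport overhead for a protocol. */
static inline int demo_datagram_overhead(int proto)
{
    assert(proto >= 0 && proto < DEMO_PROTO_N);
    return demo_proto_overhead[proto];
}
```

With the original eight-entry array the static_assert would have failed, which is the kind of early warning that would have caught the mismatch.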

  3. Modify the 'struct frame' definition in mtu.h so that it can carry the protocol. The reason for this is so that the information is available to the control frame setup when needed. An alternative is to pass the value through quite a few function calls as a parameter. 'struct frame' is used to carry the packet template so it is a fairly natural place to keep track of the layer 3 protocol which does not change after initialisation.

Add this additional entry at the bottom of the definition of struct frame in mtu.h

    struct frame {
      ...
      int socket_proto; /* The value is set in initialisation and used for datagram_overhead() calculations */
    };

  4. Modify do_init_frame() in init.c by adding these two lines at the top of the function.

The first line ensures that the transport headers (IPv4/IPv6 and TCP/UDP) are included in the calculation instead of relying on the mssfix default value kluge.

The second line copies the protocol into the frame structure so we can get at it elsewhere.

    static void
    do_init_frame (struct context *c)
    {
      /* include the transport headers (IPv4/IPv6 and TCP/UDP) in the frame calculation */
      frame_add_to_extra_frame (&c->c2.frame, datagram_overhead (c->options.ce.proto));

      /* remember the protocol so the control channel frame setup can use it */
      c->c2.frame.socket_proto = c->options.ce.proto;

    #ifdef ENABLE_LZO
    ...

  5. Modify the initialisation of the control frame which is copied from the data frame.

Modify the function tls_init_control_channel_frame_parameters() in ssl.c.
The first added line copies socket_proto into the control frame and caches it in a local variable for the second line.

The second added line uses the protocol to obtain the additional header length.

      /* inherit link MTU and extra_link from data channel */
      frame->link_mtu = data_channel_frame->link_mtu;
      frame->extra_link = data_channel_frame->extra_link;

      /* added: carry the protocol over from the data channel frame */
      int socket_proto = frame->socket_proto = data_channel_frame->socket_proto;

      ...

      frame_add_to_extra_frame (frame, SID_SIZE + sizeof (packet_id_type));

      /* added: allow for the IP and TCP/UDP transport headers */
      frame_add_to_extra_frame (frame, datagram_overhead (socket_proto));

      ...

  6. Modify the configuration files by adding the following commands. 1492 is used for the test as this is the reduced MTU caused by PPPoE encapsulation for ADSL and GPON connections.

The first line tells OpenVPN about the MTU limitation early enough for it to work properly.
The second line is optional. It overrides the default value of 1450 for mssfix, which is not necessary when the calculations are working. If mssfix is left at 1450, link-mtu-dynamic will be smaller than necessary and cause MSS messages to be sent to the internal systems to reduce their MSS to smaller than needed.

    link-mtu 1492
    mssfix 1492

Other potential issues that should be examined

  • These changes appear to work when only one end of the link has been modified. In tests the clients were running unmodified earlier versions of OpenVPN. The changes do not appear to lead to any compatibility issues.
  • The startup calculations and MTU adjustment should check that the tunnel interface (e.g. tun0) is not set to a MTU that is less than 1280 which would contravene RFC 2460 for IPv6 minimum MTU.
  • The effects of enabling fragmentation within OpenVPN have not been checked with the modified code. Ideally enabling fragmentation should remove the need to shrink the MTU on the tunnel interface and instead calculate when to fragment and how many fragments - 2 or possibly 3 - then create roughly equal sized packets so that they are treated similarly along the path (order they are delivered, path taken, latency variation) by queueing algorithms on intervening switches and routers.
  • Need to determine if OpenVPN understands ICMPv6 packet too big messages so that it can adjust the link MTU if necessary.
  • Need to determine the behaviour where multiple OpenVPN clients are behind a common NAT address and only some of these require a reduced MTU for the transport. These clients will need to reduce the MTU on the tunnel interface below that required for the other clients on the same tunnel interface.
  • Need to check that MSS continues to operate for those end devices (e.g. web servers) which are configured to ignore MTU settings from ICMP messages. This, of course, only works for TCP payload.
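As an illustration of the minimum MTU check suggested in the list above, a startup test could be as simple as the following sketch. The names are hypothetical and this is not part of the tested changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum link MTU that IPv6 requires (RFC 2460). */
#define DEMO_IPV6_MIN_MTU 1280

/* True if a calculated tunnel interface MTU can still carry IPv6 payload. */
static inline bool demo_tun_mtu_ok_for_ipv6(int tun_mtu)
{
    assert(tun_mtu > 0); /* a non-positive MTU indicates a calculation error */
    return tun_mtu >= DEMO_IPV6_MIN_MTU;
}
```

The 1386 byte tunnel MTU calculated earlier passes this check; a value such as 1200 would not, and the program could warn or refuse to configure IPv6 on the tunnel interface in that case.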

Change History (8)

comment:1 Changed 9 years ago by Gert Döring

Milestone: release 2.3.11
Owner: set to Gert Döring
Status: new → assigned

comment:2 Changed 9 years ago by Eric Crist

Keywords: over-achiever added

comment:3 Changed 8 years ago by Samuli Seppänen

Milestone: release 2.3.11 → release 2.3.12

comment:4 Changed 8 years ago by Gert Döring

Milestone: release 2.3.12 → release 2.3.14

not (completely) forgotten, but other issues had priority. Will come back.

comment:5 Changed 8 years ago by vom

Any updates for this ? I'd love to enable IPv6 on my server - but when using the hotspot on my phone - fragments get firewalled/filtered.

comment:6 Changed 6 years ago by vom

*Bump*

Any new info or timeline on this ?

comment:7 Changed 5 years ago by vom

Bumping again. Is the IPv6 fragmentation issue addressed somewhere else or in another manner? If so I'd love to see the details on that (and maybe post them here and close this ticket).

Thanks.

comment:8 Changed 4 years ago by ColoHost

Came to "me too" this one as well. Honestly surprised it has not received more public attention, but I guess that can be chalked up to a lack of IPv6 adoption. In my case, the issue it causes is that on a dual-stack host and client config, Windows-based users can't even authenticate to Microsoft 365. The pop-up dialog box that would normally display the MS 365 login prompt produces a can't connect error. I tracked this back to the attempts being made over IPv6, and a portion of the packets disappearing from being larger than what could be encapsulated. Disabling v6 resolves it, but with v6 enabled, setting tun-mtu and mssfix client-side had no effect on it. Setting the advanced Windows settings on the TAP adapter to have a relevant lower MTU resolved the issue on the IPv6 side.
