= Optimizing performance on gigabit networks =

It is easy to saturate a 100 Mbps network using an OpenVPN tunnel: the throughput of the tunnel will be very close to the throughput of a regular network interface. On gigabit networks and faster this is not so easy to achieve. This page explains how to increase the throughput of a VPN tunnel to near-linespeed for a 1 Gbps network. Some initial investigations using a 10 Gbps network are also explained.

== Network setup ==

For this setup several machines were used, all connected to gigabit switches:
 * two servers running CentOS 5.5 64bit, with an Intel Xeon E5440 CPU running @ 2.83GHz; the L2 cache size is 6 MB.
 * a server running CentOS 5.5 64bit, with an Intel Xeon X5660 CPU running @ 2.80GHz; the L3 cache size is 12 MB. This CPU has support for the AES-NI instructions.
 * a laptop running Fedora 14 64bit, with an Intel i5-560M CPU running @ 2.66GHz; the L3 cache size is 3 MB. This CPU also has support for the AES-NI instructions.

Before starting, the "raw" network speed was measured using 'iperf'. As expected, '''iperf''' reported consistent numbers around '''940 Mbps''', which is (almost) optimal for a gigabit LAN. The MTU size on all switches in the gigabit LAN was set to 1500.

== Understanding the flow of packets ==

It is important to understand how packets flow from the 'iperf' client via the OpenVPN tunnel to the 'iperf' server. The following diagram helps to clarify the flow:

[[Image(OpenVPN-packetflow.png, 800px)]]

When an 'iperf' packet is sent to the VPN server IP address, the packet enters the kernel's 'tun0' device. The packet is then forwarded to the userspace OpenVPN process, where the headers are stripped. The packet is encrypted and signed using OpenSSL calls. The encryption and signing algorithms can be configured using the '--cipher' and '--auth' options. The resulting packet is then fragmented into pieces according to the '--fragment' and '--mssfix' options.
Afterwards, the encrypted packet is sent out over the regular network to the OpenVPN server. On the server, the process is reversed: first the packet is reassembled, then it is decrypted, and finally it is sent out of the 'tun0' interface.

== Standard setup ==

The default OpenVPN for CentOS 5 currently is 2.1.4; the system OpenSSL version is 0.9.7e. Using a very plain shared secret key setup for both server (listener)
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.11 192.168.222.10
}}}
and client
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.10 192.168.222.11 --remote server
}}}
an '''iperf''' result of '''156 Mbps''' is obtained. By switching to the cipher '''aes-256-cbc''' the performance drops even further to '''126 Mbps'''. These results were obtained on the two E5440 based servers.

== Tweaked setup ==

The first set of tweaks was:
 * increase the MTU size of the tun adapter ('--tun-mtu') to 6000 bytes. This resembles jumbo frames on a regular Ethernet LAN. Note that the MTU size on the underlying network switches was '''not''' altered.
 * disable OpenVPN's internal fragmentation algorithm using '--fragment 0'.
 * disable OpenVPN's 'TCP Maximum Segment Size' limiter using '--mssfix 0'.

The server is then started with
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.11 192.168.222.10 --tun-mtu 6000 --fragment 0 --mssfix 0
}}}
and the client with
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.10 192.168.222.11 --remote server --tun-mtu 6000 --fragment 0 --mssfix 0
}}}
Now an '''iperf''' result of '''307 Mbps''' is obtained.
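The tweaked server and client command lines differ only in the '--ifconfig' order and the '--remote' option, so trying several '--tun-mtu' values is mostly a matter of substituting one number. The following is a minimal sketch that only prints the command pair for a few MTU sizes; the function names 'server_cmd' and 'client_cmd' and the 'COMMON' variable are made up for illustration, and all option values are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the tweaked OpenVPN command
# lines for a given tun MTU. Option values are copied from the page.

COMMON="--dev tun --proto udp --port 11000 --secret secret.key"

server_cmd() {
    # $1 = tun MTU in bytes
    echo "openvpn $COMMON --ifconfig 192.168.222.11 192.168.222.10 --tun-mtu $1 --fragment 0 --mssfix 0"
}

client_cmd() {
    # $1 = tun MTU in bytes; 'server' is the remote host name used on this page
    echo "openvpn $COMMON --ifconfig 192.168.222.10 192.168.222.11 --remote server --tun-mtu $1 --fragment 0 --mssfix 0"
}

# Print the command pair for a few candidate MTU sizes.
for mtu in 6000 9000 12000; do
    server_cmd "$mtu"
    client_cmd "$mtu"
done
```

Each printed pair can then be started by hand on the two machines before an 'iperf' run; the same '--tun-mtu' value has to be used on both ends.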
By playing with the '--tun-mtu' size we obtain (all speeds in Mbps)
|| MTU || Blowfish || AES256 ||
|| 1500|| 158 || 126 ||
|| 6000|| 307 || 220 ||
|| 9000|| 370 || 249 ||
||12000|| 416 || 252 ||
||24000|| 466 || 259 ||
||36000|| 470 || 244 ||
||48000|| '''510''' || 247 ||
||60000|| 488 || 221 ||

(Please note that for all measurements a standard deviation of ~5% applies)

For the default Blowfish cipher the optimal value of the '--tun-mtu' parameter for a link between these two servers seems to be '''48000''' bytes. With this '--tun-mtu' setting the VPN throughput increases from roughly 160 Mbps to '''510''' Mbps. Similarly, for the AES-256 cipher the optimal value is '''24000''' bytes.

=== Explanation ===

By increasing the MTU size of the tun adapter '''and''' by disabling OpenVPN's internal fragmentation routines the throughput can be increased quite dramatically. The reason is that the OpenSSL encryption and decryption routines perform better when they are fed larger packets. The second advantage of not fragmenting packets internally is that fragmentation is left to the operating system and to the kernel network device drivers. For a LAN-based setup this can work, but when handling various types of remote users (road warriors, cable modem users, etc) this is not always a possibility.

== Using OpenSSL 1.0.0 with AES-NI patch ==

The second tweak made was to relink OpenVPN 2.1.4 using the OpenSSL 1.0.0a libraries with the Intel AES-NI patch applied. This patch is included by default in Fedora 12 and higher. The advantage of this patch is that even for CPUs that do not support the native AES-NI instructions the performance of the AES ciphers goes up by a factor of 2.
Testing was done in the same way as for the previous tweak, using for the server
{{{
./openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.11 192.168.222.10 --tun-mtu 6000 --fragment 0 --mssfix 0 --cipher aes-256-cbc
}}}
and for the client
{{{
./openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.10 192.168.222.11 --remote server --tun-mtu 6000 --fragment 0 --mssfix 0 --cipher aes-256-cbc
}}}
Now an '''iperf''' result of '''407 Mbps''' is obtained.

By playing with the '--tun-mtu' size we obtain (all speeds in Mbps)
|| MTU || Blowfish || AES256 ||
|| 6000|| 310 || 407 ||
|| 9000|| 385 || 410 ||
||12000|| 417 || 478 ||
||24000|| 470 || 540 ||
||36000|| 510 || 561 ||
||48000|| 500 || '''585''' ||
||60000|| 500 || 582 ||

(Please note that for all measurements a standard deviation of ~5% applies)

For the default Blowfish cipher the optimal value of the '--tun-mtu' parameter for a link between these two servers now seems to be '''36000''' bytes, although the difference for higher MTU sizes is minimal. Also note that the Blowfish numbers are nearly identical to those obtained using the system OpenSSL 0.9.7e library.

When using the AES-256 cipher there is a huge performance gain. The optimal MTU value now is '''48000''' bytes, and overall performance increased by a factor of 2 for nearly all MTU sizes.

=== Explanation ===

The AES-NI patch for the OpenSSL 1.0 libraries not only increases performance for AES-NI capable CPUs but also for older CPUs. This can also be seen when running
{{{
openssl speed -evp aes-256-cbc
}}}

== Using OpenSSL 1.0.0 with AES-NI capable hardware ==

The real performance gain is expected from the AES-NI capable hardware, such as the Intel Xeon X5660 and i5-560M CPUs.
Using the exact same setup as above, but now with the 'aesni' engine enabled using
{{{
engine aesni
}}}
that is, by using
{{{
./openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.11 192.168.222.10 --tun-mtu 9000 --fragment 0 --mssfix 0 --cipher aes-256-cbc --engine aesni
}}}
for the server and
{{{
./openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.10 192.168.222.11 --remote server --tun-mtu 9000 --fragment 0 --mssfix 0 --cipher aes-256-cbc --engine aesni
}}}
for the client we obtain the following results (all speeds in Mbps):
|| || AES128 || AES256 ||
||X5660 -> i5-560M || 885 || 878 ||
||i5-560M -> X5660 || 748 || 543 ||

(Please note that for all measurements a standard deviation of ~5% applies)

Some initial conclusions:
 * the AES-NI capable server CPU clearly shows a huge performance gain: it is nearly capable of saturating the gigabit network.
 * it was not possible to run this test with a more capable client CPU. The i5-560M is not capable of keeping up with the X5660, as can be seen from the performance difference between encryption and decryption. In the first row of the table above the X5660 encrypted all traffic and the i5-560M decrypted it. The second row shows the results for the reverse direction.
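Before running a test like the one above it can be useful to confirm that a machine actually reports AES-NI support. The following is a minimal sketch, assuming an x86 Linux system where '/proc/cpuinfo' lists the CPU features on a 'flags' line; the function name 'has_aesni' is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the CPU feature flags include 'aes'.
# $1 = cpuinfo file to inspect (assumption: defaults to /proc/cpuinfo,
#      which is only meaningful on x86 Linux).
has_aesni() {
    # Take the first 'flags' line and look for 'aes' as a whole word.
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw aes
}

if has_aesni; then
    echo "AES-NI reported by the CPU"
else
    echo "AES-NI not reported"
fi
```

A CPU without the 'aes' flag can still benefit from the patched OpenSSL library as described earlier, but not from the '--engine aesni' option.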
== For reference: using no encryption ==

For reference, the above test was also run with encryption and signing disabled, using for the server
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.11 192.168.222.10 --tun-mtu 9000 --fragment 0 --mssfix 0 --cipher none --auth none
}}}
and for the client
{{{
openvpn --dev tun --proto udp --port 11000 --secret secret.key --ifconfig 192.168.222.10 192.168.222.11 --remote server --tun-mtu 9000 --fragment 0 --mssfix 0 --cipher none --auth none
}}}
Now an '''iperf''' result of '''930 Mbps''' is obtained, which shows that on a gigabit network performance is limited not so much by the division between kernel-space and user-space processes as by the encryption and decryption routines found in the OpenSSL libraries.

== 10 Gigabit networks ==

The "plaintext" test (i.e. encryption and signing disabled) was repeated on two machines connected to a 10 Gbps switch. Again, all results were obtained using 'iperf'.
|| || speed ||
||"raw" || 8.8 Gbps ||
||via tun || 1.3 Gbps ||

This shows the limits of the kernel-space/user-space division that is present in Linux. The performance of OpenVPN over this network is not yet understood, as the raw 'openssl speed' results of the computers in this network are much lower than those of the servers listed above. Unfortunately, it was not possible to use two AES-NI capable PCs on this network.