Opened 2 years ago

Last modified 4 weeks ago

#603 assigned Bug / Defect

Tunnel latency issues on Windows 7

Reported by: JohnDoe123
Owned by: samuli
Priority: critical
Milestone: release 2.3.14
Component: Networking
Version: OpenVPN 2.3.8 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords: tap-windows tap6
Cc: samuli

Description

OpenVPN users on Windows 7 are suffering from high, seemingly "random" tunnel latency.

The problem can be reproduced very easily by following these steps (make sure that all client traffic is routed through the tunnel):

  1. Do a fresh install of Windows 7.
  2. Install the Firefox Web browser.
  3. Establish a connection to an OpenVPN server (protocol does _not_ matter, both UDP and TCP are affected).
  4. Run a ping to any reliable server; I used 8.8.8.8 (Google Public DNS).
  5. Launch the Firefox web browser.

Ping latency will spike multiple times for no apparent reason. This happens not only when launching the web browser but also on other events. It effectively renders the tunnel unusable: downloads slow to a crawl until they finally time out, etc.
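To put numbers on the spikes, the ping output can be summarized with a small script. The following is only a minimal sketch: it assumes the English-language output of the Windows ping command, the sample lines are invented for illustration, and the function name and the "5x the median" spike threshold are arbitrary choices, not something taken from this ticket.

```python
import re
import statistics

# Matches the round-trip time in a Windows `ping` reply line, e.g.
# "Reply from 8.8.8.8: bytes=32 time=17ms TTL=57" (or "time<1ms").
REPLY_RE = re.compile(r"time[=<](\d+)ms")

# Invented sample output, for illustration only.
SAMPLE = [
    "Reply from 8.8.8.8: bytes=32 time=17ms TTL=57",
    "Reply from 8.8.8.8: bytes=32 time=18ms TTL=57",
    "Reply from 8.8.8.8: bytes=32 time=1450ms TTL=57",
    "Request timed out.",
    "Reply from 8.8.8.8: bytes=32 time=17ms TTL=57",
]


def summarize_ping(lines, spike_factor=5.0):
    """Count lost pings and list replies far above the median latency."""
    rtts = []
    lost = 0
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            rtts.append(int(match.group(1)))
        elif "timed out" in line.lower():
            lost += 1
    # Use the median as baseline so that a few spikes do not distort it.
    baseline = statistics.median(rtts) if rtts else 0.0
    spikes = [rtt for rtt in rtts if baseline and rtt > spike_factor * baseline]
    return {
        "baseline_ms": baseline,
        "spikes_ms": spikes,
        "lost": lost,
        "sent": len(rtts) + lost,
    }


if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Running the same script against ping output captured with and without the tunnel makes the comparison less subjective than watching the console.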

I tried to reproduce this bug on several operating systems and Linux distributions using the same config and binaries (Windows 2000, Windows XP, Debian 7, Arch) and didn't notice anything unusual, so it seems that only Windows 7 is affected.

I can provide a packet dump if necessary.

Change History (30)

comment:1 follow-up: Changed 2 years ago by ValdikSS

> downloads will slow down to a crawl until they finally time out

Seems like an MTU issue. What network connection type do you have?

comment:2 in reply to: ↑ 1 Changed 2 years ago by JohnDoe123

Replying to ValdikSS:

> > downloads will slow down to a crawl until they finally time out
>
> Seems like an MTU issue. What network connection type do you have?

Ethernet, card is an Intel I218-V, MTU set to 1500.

Here's my client config:

proto udp
tun-mtu 1500
fragment 1300
mssfix
cipher AES-256-CBC

remote <host> <port>

auth SHA512
auth-user-pass
client
comp-lzo
dev tun
hand-window 120
inactive 604800
mute-replay-warnings
nobind
ns-cert-type server
persist-key
persist-remote-ip
persist-tun
ping 5
ping-restart 120
redirect-gateway def1
remote-random
reneg-sec 3600
resolv-retry 60
route-delay 2
route-method exe
script-security 2
tls-cipher DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-CAMELLIA256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-CAMELLIA128-SHA:DHE-RSA-AES128-SHA:CAMELLIA256-SHA:AES256-SHA:CAMELLIA128-SHA:AES128-SHA
tls-timeout 5
verb 4

ca <ca>
cert <cert>
key <keyfile>
tls-auth <psk> 1

I don't believe it's an MTU issue. This happens universally on multiple lines, behind many modems / routers (ADSL, broadband, fiber) and on multiple computers.

Last edited 2 years ago by JohnDoe123

comment:3 follow-up: Changed 2 years ago by ValdikSS

Try removing tun-mtu and fragment, and set mssfix to 1300.
Also set the buffer sizes to zero on both sides.
#461

comment:4 in reply to: ↑ 3 Changed 2 years ago by JohnDoe123

That does not fix it, behavior is still the same.

comment:5 follow-up: Changed 2 years ago by ValdikSS

Are you sure this is not your ISP? Can you see what's happening on the network interface outside of the tunnel with Wireshark?

comment:6 in reply to: ↑ 5 Changed 2 years ago by JohnDoe123

Replying to ValdikSS:

> Are you sure this is not your ISP? Can you see what's happening on the network interface outside of the tunnel with Wireshark?

As stated before, this is not an ISP issue. This happens on the home line where I live, in my university network, at my workplace, and so on. Once again, only Windows 7 is affected.

I will attach a packet dump soon.

comment:7 follow-up: Changed 2 years ago by ValdikSS

Do you have a client-connect script or radiusplugin on the server side?

comment:8 in reply to: ↑ 7 Changed 2 years ago by JohnDoe123

> Do you have a client-connect script or radiusplugin on the server side?

That is not the case, I'm afraid.

Packet dump will take me some time as I won't be home anytime soon.

comment:9 Changed 2 years ago by fthiesse

I can confirm the same issue. Load causes high latency spikes, even when the "load" carries little payload.

I then connected a second client (Linux instead of Windows), which was unaffected by the spikes the Windows client was seeing under load.

I then downgraded to an older version (2.3.0), which fixed the issue. No more spikes when the tunnel is being used.

comment:10 Changed 2 years ago by ValdikSS

PCAP anyone?

comment:11 follow-up: Changed 2 years ago by pplars

Hi, I can confirm this bug exists.

It's simple to reproduce:

  • ping 8.8.8.8
  • open any websites with many resources (e.g. large news sites), in multiple tabs for 'better' results
  • see ping packets dropped or delayed by up to 3 seconds
  • see website loading partially stuck or being delayed (due to packet loss)


Having one single fast download also generates some packet loss, but noticeably less than many small requests.
I tested Windows 7 and 8; only 7 is affected.

This is also not a bug in OpenVPN directly; it's a bug in TAP Windows 9.21.1.
Installing TAP Windows 9.9.2 resolves this issue.

What about the PCAP? What exactly do you want? All packets? Inside or outside of the tunnel?
For inside: Wireshark does not seem to recognize the TAP device. What to do?

Regards
Lars Mueller

Last edited 2 years ago by pplars

comment:12 Changed 2 years ago by cron2

  • Keywords tap-windows tap6 added

cc'ing samuli - tap6 issue?

comment:13 in reply to: ↑ 11 Changed 2 years ago by selvanair

Replying to pplars:

> This is also not a bug in OpenVPN directly; it's a bug in TAP Windows 9.21.1.
> Installing TAP Windows 9.9.2 resolves this issue.

I cannot reproduce this on Windows 7 with TAP 9.21. I've seen large latency under load only when the link to the LAN is over Wi-Fi, which looks normal.

I suppose you are sending all traffic through the tunnel. What happens if the bulk of the traffic goes outside the tunnel, with only the ping going through it? Does the latency still spike?

comment:14 Changed 2 years ago by pplars

Hi,
No, this does not happen if I only send the ping requests through the tunnel.
This reliably happens with TAP 9.21.1 on all Windows 7 systems I tested.
Ping 8.8.8.8, open for example spiegel.de, open the first 10-15 links in tabs -> see packet loss.
And it's not only my personal computers.
I work for a VPN provider; we had a bunch of customers reporting this after 9.21.1 was introduced.
These reports stopped once I started deploying TAP 9.9.2 on Windows 7 systems.

Regards
Lars

comment:15 Changed 2 years ago by ValdikSS

Can confirm this issue, but it's reproducible even with 9.9.2 tap adapter version for me.

comment:16 Changed 2 years ago by pplars

Are you sure?
We only have these reports from customers who use 9.21.1, and I could only reproduce this with 9.21.1, never with 9.9.2.

Double-check the driver version that is in use; I had the issue that 9.21.1 was still in use by Windows after downgrading to 9.9.2.
Check in Device Manager -> TAP adapter properties -> Driver tab -> Details. Use the "delete driver" button there to get rid of the driver version; I think this also clears the Windows device driver cache or something like that. tapinstall.exe does not do this as far as I can tell.

comment:17 Changed 2 years ago by ValdikSS

Yes, I'm sure. I'm testing it on Windows XP right now with OpenVPN 2.3.9 and TAP Adapter NDIS5 9.9.2. Ping rises by 50-150 ms while testing download speed and by 20-30 ms while testing upload over TCP. It is much better over UDP (+30 ms while downloading and +10-15 ms while uploading). I see almost no latency change on Linux (+1-3 ms at the beginning of the speed test).

comment:18 Changed 2 years ago by selvanair

With more stress testing I can now reproduce the latency issue too. There is no complete stalling like pplars reported, but the ping latency sporadically shoots up to 1 or 2 seconds (from a nice 17 ms otherwise) and packets are dropped under heavy traffic with multiple connections. I checked only with 9.21.1. The latency spikes that ValdikSS sees with 9.9.2 appear minor compared to this, so the tap6/Windows 7 combination probably has more issues. I've seen a number of other Windows 7 misbehaviours as well (needs register-dns, buggy when ICS is enabled, etc.).

Packet sniffing on the server side shows that the pings are received and replied to with no significant delay, so it's surely a client-side problem, as pplars stated. To know whether the delay is between OpenVPN and TAP or between TAP and the kernel, one has to correlate Wireshark logs with OpenVPN logs at high verbosity, which is hard work. If anyone has a TAP driver built with debug enabled, that would help; it's OK if it is not signed with an MS-blessed signature.

Last edited 2 years ago by selvanair

comment:19 Changed 2 years ago by cron2

  • Cc samuli added

@samuli: can you help with a debug build of tap-windows6 (no signature needed)?

comment:20 Changed 20 months ago by samuli

  • Milestone changed from release 2.3.8 to release 2.3.12

comment:21 Changed 17 months ago by lkraav

Following

comment:22 Changed 14 months ago by cron2

  • Milestone changed from release 2.3.12 to release 2.3.14
  • Owner set to samuli
  • Status changed from new to assigned

comment:23 Changed 11 months ago by icare

Still having random high tunnel latency spikes with TAP 9.21.2 on Windows 7 64-bit.

If I replace it with 9.9.2_3, the spikes are not as severe as before and the connection seems more stable.

Using OpenVPN 2.4.0 client.

comment:24 Changed 9 months ago by kevinjmorse

We just built two brand new Windows 7 64-bit machines and confirm this issue is present.

We are running Access Server 2.1.3 and OpenVPN Connect 2.1.1.102

Upgrading to Windows 10 fixed the issue.

comment:25 in reply to: ↑ description Changed 6 months ago by 0481142930

Replying to JohnDoe123

I have the same problem as well.
Win 7
Intel 82579V Ethernet

Lag spikes non-stop.
It started happening after an update to the OpenVPN Windows TAP adapter a long time ago, and has only gotten worse since then.
Totally unusable as of today: the connection lag-spikes and dies every third second.
The VPN connection stays alive but all traffic in it lags out.

Switching from the newest to an older Windows TAP driver version (9.9.1 or 9.9.2_3) seems to help slightly.
Instead of losing 99-100% of all traffic every third second, it drops slightly less.
But it's still not really usable.
People who report that the old driver is a solution probably just live with huge packet loss and get used to it.
Things do not time out 100% like with the newest drivers, but you hit the wall all the time. With the old drivers there are still problems: voice communication lags, stutters and drops out every 3-5 seconds; torrents lag out and yo-yo up and down in speed; websites work for the most part, but parts of them load slowly and sometimes not at all.

I tested my Win 7 system on both an ADSL and a fiber connection. I reinstalled everything I could think of (drivers, clients and so on) many times, formatted the whole computer and tested it on a pretty much vanilla Win 7 with all updates.
I tried different clients (Viscosity, OVPN) that use the TAP adapter, and all have this problem.

I tried different VPN servers, UDP, TCP, different countries and different service providers as well: same problem.

If you want to replicate this, try the program "bitcoin-0.14.1-win64" (the core Bitcoin client) and let it connect to the network and download the blockchain.
It literally kills all traffic in the tunnel after I start it; nothing gets through after a few seconds and it does not seem to recover. It's the worst program I have used so far, even worse than torrent clients.

(This happens on both newest and oldest tap drivers)

Last edited 6 months ago by 0481142930

comment:26 Changed 5 months ago by larsete

I had the very same problem described here, and in my case installing the 9.9.2 version of the driver fixed it almost completely, but only when using the OpenVPN client. If I use the native ProtonVPN client, it still lags a bit in some cases, but it is at least much better than with the 9.21 TAP driver version.

Last edited 5 months ago by larsete

comment:27 Changed 3 months ago by 0481142930

I just saw something that I think explains how the latency issues happen, with short cutoffs in the connection.

The "TAP-Windows Adapter V9" is broken somehow.
I just noticed that it changes its LAN IP to the VPN server about every other second or so, and it just keeps doing it like crazy.

[MAC of TAP-ADAPTER here]changed IP Address 10.128.0.1 to 10.128.140.0 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.0 to 10.128.140.1 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.1 to 10.128.140.2 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.2 to 10.128.140.3 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.4 to 10.128.140.6 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.6 to 10.128.140.15 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.15 to 10.128.140.7 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.7 to 10.128.140.8 on 2017.09.14 10:39

And so on. It just keeps going like this; it never seems to stop switching IP address.

Sometimes it seems to switch IP 10 times every second, sometimes about once every second and sometimes less frequently.

In the session I tested, which was about 8 minutes and 30 seconds long, it changed IP over 750 times.
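For anyone who wants to check a similar log, the entries above can be tallied with a short script. This is a minimal sketch that assumes the log keeps exactly the line format quoted in this comment; the function names are made up for illustration. It extracts each change, reports where an entry's old address does not match the previous entry's new address (which may simply mean a line is missing from the excerpt), and works out the average rate for the session described.

```python
import re

# Matches the part of a log line after the MAC placeholder, e.g.
# "changed IP Address 10.128.0.1 to 10.128.140.0 on 2017.09.14 10:39".
CHANGE_RE = re.compile(
    r"changed IP Address (\d+\.\d+\.\d+\.\d+) to (\d+\.\d+\.\d+\.\d+) on "
    r"(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2})"
)

# The eight entries quoted in the comment above.
SAMPLE = [
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.0.1 to 10.128.140.0 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.0 to 10.128.140.1 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.1 to 10.128.140.2 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.2 to 10.128.140.3 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.4 to 10.128.140.6 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.6 to 10.128.140.15 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.15 to 10.128.140.7 on 2017.09.14 10:39",
    "[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.7 to 10.128.140.8 on 2017.09.14 10:39",
]


def parse_changes(lines):
    """Return (old_ip, new_ip, timestamp) tuples for matching lines."""
    changes = []
    for line in lines:
        match = CHANGE_RE.search(line)
        if match:
            changes.append(match.groups())
    return changes


def find_gaps(changes):
    """Indices of entries whose old IP differs from the previous new IP."""
    return [
        i for i in range(1, len(changes))
        if changes[i - 1][1] != changes[i][0]
    ]


def changes_per_second(change_count, duration_seconds):
    """Average number of address changes per second."""
    return change_count / duration_seconds


if __name__ == "__main__":
    changes = parse_changes(SAMPLE)
    print(len(changes), "changes, gaps at", find_gaps(changes))
    # 750 changes in 8 min 30 s, as reported for the tested session.
    print(round(changes_per_second(750, 8 * 60 + 30), 2), "changes/s")
```

For the reported session this works out to roughly 1.5 address changes per second on average.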

Last edited 3 months ago by 0481142930

comment:28 Changed 4 weeks ago by 0481142930

I changed my Ethernet card from an Intel to a Realtek.
It did not solve the problem. Lag spikes and total clogging of the internet happen just the same.

This has been a problem for many years now. Will it ever be solved?

comment:29 Changed 4 weeks ago by 0481142930

It seems the problem can sometimes be replicated by connecting to IP addresses to transfer data.
Every time I connect to some IPs to download the Bitcoin blockchain, the connection dies.
I tested about 50 different torrents; starting some of them kills the network while it is connecting to the peer IPs.
Even if the limit is set to a low maximum of 15 connections, the network dies.

It seems the TAP adapter somehow loses its connection to the VPN server and starts to change IP, and it keeps doing that for a short or a very, very long time.
The network is therefore down in the meantime.
And as soon as it gets back online and tries to access the torrent, or whatever triggered it to go bananas, it starts doing it again.

But it does not fully kill the connection to the server: the VPN program shows the connection as online, but using the VPN tunnel is impossible.

comment:30 Changed 4 weeks ago by 0481142930

I have done a full packet capture covering over 200 seconds. It runs from before I start the VPN software through startup, connection, the start of some random torrent traffic and then the death of the VPN traffic in the tunnel, plus about 40 extra seconds while the VPN tunnel is "dead" but the VPN server is still connected.

I used a network monitor to write down the almost exact time when traffic went from 15 Mbit/s to 0 Mbit/s instantly and then stayed at 0 Mbit/s.

I see there are thousands of these in Wireshark:

157639 154.958686 VPN-serverIP MyIP IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=a985) [Reassembled in #157640]

157640 154.958687 VPN-serverIP MyIP OpenVPN 60 MessageType: P_DATA_V1

Packet line 1:
Number=157639
Time=154.958686
From server IP to my IP
Protocol: IPv4
Length (bytes)= 1514
Info= Fragmented IP protocol (proto=UDP 17, off=0, ID=a985) [Reassembled in #157640]

Packet line 2:
Number=157640
Time=154.958687
From server IP to my IP
Protocol: OpenVPN
Length (bytes)= 60
Info= MessageType: P_DATA_V1
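A 1514-byte frame followed by a 60-byte one is the pattern ordinary IPv4 fragmentation produces when a UDP datagram is slightly too large for a 1500-byte link MTU. The arithmetic can be sketched as below. This is an illustration under stated assumptions only (Ethernet II framing with a 14-byte header, a 20-byte IPv4 header without options, a 60-byte minimum captured frame, a link MTU of 1500); the function name is made up, and the sketch says nothing about why the tunnel stalls.

```python
ETH_HEADER = 14     # Ethernet II header, no VLAN tag (assumption)
ETH_MIN_FRAME = 60  # minimum frame length as captured, without FCS
IP_HEADER = 20      # IPv4 header without options (assumption)
UDP_HEADER = 8


def fragment_frames(udp_payload_len, mtu=1500):
    """Return the captured frame length of each IPv4 fragment."""
    remaining = UDP_HEADER + udp_payload_len  # bytes IPv4 has to carry
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    frames = []
    while remaining > 0:
        chunk = min(remaining, per_fragment)
        # Short frames are padded up to the Ethernet minimum.
        frames.append(max(ETH_HEADER + IP_HEADER + chunk, ETH_MIN_FRAME))
        remaining -= chunk
    return frames


if __name__ == "__main__":
    for size in (1472, 1473, 1498, 1499):
        print(size, fragment_frames(size))
```

Under these assumptions a UDP payload of up to 1472 bytes fits into a single 1514-byte frame, while payloads of 1473 to 1498 bytes give exactly the 1514 + 60 pair shown above.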

Sometimes the OpenVPN packet that follows has length 60, and then another one comes with a length of either 60-200 or 1461.
After that comes another one of those Fragmented IP protocol packets.

This happens thousands and thousands of times for as long as traffic goes through the tunnel.
Sometimes it comes in very high bursts and sometimes sporadically.

It happens all the time, but at the point where the VPN tunnel dies they seem to be frequent enough to kill the connection totally.
I believe this is a sign of whatever causes the lag, stutter and latency problems.

In the packet list in Wireshark these show up as white lines. If you look at all the traffic, these white lines make up about 30-50% of all packets, like a zebra pattern over the whole time from when I start to send data through the VPN until it dies.

For some seconds (especially around the time when the tunnel dies) only the pattern of packets listed above is repeated, about 8000 times every second.
There are some pockets of OpenVPN packets that look normal, but then bursts of these Fragmented IP protocol packets come again.

After the tunnel has died, these packets still show up maybe 20-30 times over the next 2-5 seconds; after that they stop coming and there are only OpenVPN packets, and traffic in the tunnel is not going through.
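The burst pattern can be quantified without sharing the whole capture by counting fragment lines per second in a plain-text export of the packet list. This is a minimal sketch: it assumes one packet per line with the columns No., Time, Source, Destination, Protocol, Length and Info separated by whitespace, as in the two lines quoted earlier in this comment, and the function name and extra sample lines are invented for illustration.

```python
from collections import Counter

# Sample lines in the column layout quoted in the comment; the second
# and third fragment entries are invented for illustration.
SAMPLE = [
    "157639 154.958686 VPN-serverIP MyIP IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=a985) [Reassembled in #157640]",
    "157640 154.958687 VPN-serverIP MyIP OpenVPN 60 MessageType: P_DATA_V1",
    "157641 154.960001 VPN-serverIP MyIP IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=a986) [Reassembled in #157642]",
    "157642 154.960002 VPN-serverIP MyIP OpenVPN 60 MessageType: P_DATA_V1",
    "157643 155.000123 VPN-serverIP MyIP IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=a987) [Reassembled in #157644]",
]


def fragments_per_second(lines):
    """Count 'Fragmented IP protocol' lines per whole capture second."""
    counts = Counter()
    for line in lines:
        # No., Time, Source, Destination, Protocol, Length, Info
        fields = line.split(maxsplit=6)
        if len(fields) < 7:
            continue
        try:
            timestamp = float(fields[1])
        except ValueError:
            continue  # header or otherwise malformed line
        if fields[6].startswith("Fragmented IP protocol"):
            counts[int(timestamp)] += 1
    return counts


if __name__ == "__main__":
    for second, count in sorted(fragments_per_second(SAMPLE).items()):
        print(second, count)
```

Comparing the per-second counts with the noted time at which the tunnel died would show whether the bursts really peak at that moment.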

Is this helpful?
Samuli, could you perhaps reply here if you need this packet capture in full to track down this bug?

It's 200 seconds long and 107 MB large.

I have written down the time in the capture log where the tunnel dies.
