Opened 9 years ago
Last modified 5 years ago
#603 assigned Bug / Defect
Tunnel latency issues on Windows 7
Reported by: | JohnDoe123 | Owned by: | Samuli Seppänen |
---|---|---|---|
Priority: | critical | Milestone: | release 2.3.14 |
Component: | Networking | Version: | OpenVPN 2.3.8 (Community Ed) |
Severity: | Not set (select this one, unless you're an OpenVPN developer) | Keywords: | tap-windows tap6 |
Cc: | Samuli Seppänen, tct |
Description
OpenVPN users on Windows 7 are suffering from high, seemingly "random" tunnel latency.
The problem can be reproduced very easily by following these steps (make sure that all client traffic is routed through the tunnel):
- Do a fresh install of Windows 7.
- Install the Firefox Web browser.
- Establish a connection to an OpenVPN server (protocol does _not_ matter, both UDP and TCP are affected).
- Run a ping to any reliable server; I used 8.8.8.8 (Google Public DNS).
- Launch the Firefox web browser.
Ping will spike multiple times for no apparent reason. This happens not only when launching the web browser but also on other events. It effectively renders the tunnel unusable: downloads will slow down to a crawl until they finally time out, etc.
I tried to reproduce this bug on several operating systems and Linux distributions using the same config and binaries (Windows 2000, Windows XP, Debian 7, Arch) and didn't notice anything unusual, so it seems that only Windows 7 is affected.
I can provide a packet dump if necessary.
Change History (35)
comment:1 follow-up: 2 Changed 9 years ago by
comment:2 Changed 9 years ago by
Replying to ValdikSS:
downloads will slow down to a crawl until they finally time out
Seems like an MTU issue. What network connection type do you have?
Ethernet, card is an Intel I218-V, MTU set to 1500.
Here's my client config:
proto udp
tun-mtu 1500
fragment 1300
mssfix
cipher AES-256-CBC
remote <host> <port>
auth SHA512
auth-user-pass
client
comp-lzo
dev tun
hand-window 120
inactive 604800
mute-replay-warnings
nobind
ns-cert-type server
persist-key
persist-remote-ip
persist-tun
ping 5
ping-restart 120
redirect-gateway def1
remote-random
reneg-sec 3600
resolv-retry 60
route-delay 2
route-method exe
script-security 2
tls-cipher DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-CAMELLIA256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-CAMELLIA128-SHA:DHE-RSA-AES128-SHA:CAMELLIA256-SHA:AES256-SHA:CAMELLIA128-SHA:AES128-SHA
tls-timeout 5
verb 4
ca <ca>
cert <cert>
key <keyfile>
tls-auth <psk> 1
I don't believe it's an MTU issue. This happens universally on multiple lines, behind many modems and routers (ADSL, broadband, fiber), and on multiple computers.
comment:3 follow-up: 4 Changed 9 years ago by
Try removing tun-mtu and fragment, and set mssfix to 1300.
Also set the buffer sizes to zero on both sides.
#461
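For anyone who wants to try this suggestion on several configs, here is a rough Python sketch that applies it to a list of config lines. It is only an illustration: the function name is made up, and reading "buffer sizes" as the `sndbuf` and `rcvbuf` directives is an assumption on my part, not something stated in this comment. Keep in mind that `fragment` has to match on both peers, so the server config would need the same change.

```python
def apply_suggested_changes(config_lines):
    """Drop tun-mtu/fragment, force 'mssfix 1300', and zero the socket buffers.

    Hypothetical helper. Mapping "buffer sizes" to sndbuf/rcvbuf is an assumption.
    """
    dropped = {"tun-mtu", "fragment"}
    forced = {"mssfix": "mssfix 1300", "sndbuf": "sndbuf 0", "rcvbuf": "rcvbuf 0"}
    result = []
    seen = set()
    for line in config_lines:
        stripped = line.strip()
        directive = stripped.split()[0] if stripped else ""
        if directive in dropped:
            continue  # remove the directive entirely
        if directive in forced:
            if directive not in seen:
                result.append(forced[directive])  # replace with the forced value
                seen.add(directive)
            continue
        result.append(line)
    # Append any forced directive that was not present in the original config.
    for directive, replacement in forced.items():
        if directive not in seen:
            result.append(replacement)
    return result


if __name__ == "__main__":
    original = ["proto udp", "tun-mtu 1500", "fragment 1300", "mssfix", "dev tun"]
    for line in apply_suggested_changes(original):
        print(line)
```

All other directives are passed through untouched and in their original order.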
comment:5 follow-up: 6 Changed 9 years ago by
Are you sure this is not your ISP? Can you see what's happening on the network interface outside of the tunnel with Wireshark?
comment:6 Changed 9 years ago by
Replying to ValdikSS:
Are you sure this is not your ISP? Can you see what's happening on the network interface outside of the tunnel with Wireshark?
As stated before, this is not an ISP issue. This happens on the home line where I live, in my university network, at my workplace, and so on. Once again, only Windows 7 is affected.
I will attach a packet dump soon.
comment:7 follow-up: 8 Changed 9 years ago by
Do you have a client-connect script or radiusplugin on the server side?
comment:8 Changed 9 years ago by
Do you have a client-connect script or radiusplugin on the server side?
That is not the case, I'm afraid.
Packet dump will take me some time as I won't be home anytime soon.
comment:9 Changed 9 years ago by
I can confirm the same issue. Load causes high latency spikes, even when the load is light.
I then connected a second client (Linux instead of Windows), which was unaffected by the spikes the Windows client was seeing under load.
I then downgraded to an older version (2.3.0) which fixed the issue. No more spikes when the tunnel is being used.
comment:11 follow-up: 13 Changed 9 years ago by
Hi, I can confirm this bug exists.
It's simple to reproduce:
- ping 8.8.8.8
- open any website with many resources (e.g. large news sites), in multiple tabs for 'better' results
- see ping packets dropped or delayed by up to 3 seconds
- see website loading partially stuck or being delayed (due to packet loss)
Having one single fast download also generates some packet loss, but noticeably less than many small requests.
I tested Windows 7 and 8; only 7 is affected.
This is also not a bug in OpenVPN directly, it's a bug in TAP Windows 9.21.1.
Installing TAP Windows 9.9.2 resolves this issue.
What about the PCAP? What exactly do you want? All packets? Inside or outside of tunnel?
For Inside: Wireshark does not seem to recognize the tap device. What to do?
Regards
Lars Mueller
comment:13 Changed 9 years ago by
Replying to pplars:
This is also not a bug in OpenVPN directly, it's a bug in TAP Windows 9.21.1.
Installing TAP Windows 9.9.2 resolves this issue.
I cannot reproduce this on Windows 7 with TAP 9.21. I've seen large latency under load only when the link to the LAN is over Wi-Fi, which looks normal.
I suppose you are sending all traffic through the tunnel. What happens if the bulk of the traffic goes outside the tunnel and only the ping goes through it? Does the latency still spike?
comment:14 Changed 9 years ago by
Hi,
No, this does not happen if I send only the ping requests through the tunnel.
This reliably happens with TAP 9.21.1 on all Windows 7 systems I tested.
Ping 8.8.8.8, open for example spiegel.de, open the first 10-15 links in tabs -> see packet loss.
And it's not only my personal computers.
I work for a VPN provider, and we had a bunch of customers reporting this after 9.21.1 was introduced.
These reports stopped once I started deploying TAP 9.9.2 on Windows 7 systems.
Regards
Lars
comment:15 Changed 9 years ago by
Can confirm this issue, but for me it's reproducible even with TAP adapter version 9.9.2.
comment:16 Changed 9 years ago by
Are you sure?
We only have these reports from customers that use 9.21.1, and I could only reproduce this with 9.21.1, never with 9.9.2.
Double-check the driver version that is in use. I had the issue that 9.21.1 was still in use by Windows after downgrading to 9.9.2.
Check in Device Manager -> TAP adapter properties -> Driver tab -> Details. Use the "delete driver" button there to get rid of the driver version; I think this also clears the Windows device driver cache or something like that. As far as I can tell, tapinstall.exe does not do this.
comment:17 Changed 9 years ago by
Yes I'm sure. I'm testing it on Windows XP right now with OpenVPN 2.3.9 and TAP Adapter NDIS5 9.9.2. I get ping +50-150 ms while testing download speed and +20-30 ms while testing upload over TCP. Much better over UDP (+30 while downloading and +10-15 while uploading). I see almost no latency changes on Linux (+1-3 ms in the beginning of speed test).
comment:18 Changed 9 years ago by
With more stress testing, I too can now reproduce the latency issue. There is no complete stalling like pplars reported, but under heavy traffic with multiple connections the ping latency sporadically shoots up to 1 or 2 seconds (from a nice 17 ms otherwise) and packets are dropped. I checked only with 9.21.1. The latency spikes that ValdikSS sees with 9.9.2 appear minor compared to this, so the tap6/Windows 7 combination probably has more issues. I've seen a number of other Windows 7 misbehaviours as well (needs register-dns, buggy when ICS is enabled, etc.).
Packet sniffing on the server side shows that the pings are received and replied to with no significant delay, so it's surely a client-side problem, as pplars stated. To know whether the delay is between OpenVPN and tap or between tap and the kernel, one has to correlate Wireshark logs with OpenVPN logs at high verbosity, which is hard work. If anyone has a TAP driver built with debug enabled, that would help. It does not need an MS-blessed signature.
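To make reports in this ticket easier to compare, the spike pattern described above (a low baseline, sporadic jumps into the seconds range, dropped packets) can be flagged automatically from saved ping output. The following is a minimal Python sketch, assuming the usual English-language Windows `ping` output format; the function name and the 500 ms threshold are arbitrary choices for illustration, and the sample lines are made up rather than taken from a real capture.

```python
import re

# Matches "time=17ms" as well as "time<1ms" in Windows ping replies.
REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms")


def find_spikes(ping_lines, threshold_ms=500):
    """Return (line_index, rtt_ms) for slow replies and (line_index, None) for timeouts."""
    spikes = []
    for index, line in enumerate(ping_lines):
        match = REPLY_TIME.search(line)
        if match:
            rtt = int(match.group(1))
            if rtt >= threshold_ms:
                spikes.append((index, rtt))
        elif "timed out" in line.lower():
            spikes.append((index, None))  # lost packet
    return spikes


if __name__ == "__main__":
    # Illustrative sample only, not real measurements.
    sample = [
        "Reply from 8.8.8.8: bytes=32 time=17ms TTL=57",
        "Reply from 8.8.8.8: bytes=32 time=1843ms TTL=57",
        "Request timed out.",
        "Reply from 8.8.8.8: bytes=32 time=18ms TTL=57",
    ]
    for index, rtt in find_spikes(sample):
        print(index, "lost" if rtt is None else f"{rtt} ms")
```

Running `ping -t 8.8.8.8 > ping.txt` during a test and feeding the file's lines to `find_spikes` would give line positions that can then be matched against the OpenVPN log by hand.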
comment:19 Changed 9 years ago by
Cc: | Samuli Seppänen added |
---|
@samuli: can you help with a debug build of tap-windows6 (no signature needed)?
comment:20 Changed 8 years ago by
Milestone: | release 2.3.8 → release 2.3.12 |
---|
comment:22 Changed 8 years ago by
Milestone: | release 2.3.12 → release 2.3.14 |
---|---|
Owner: | set to Samuli Seppänen |
Status: | new → assigned |
comment:23 Changed 8 years ago by
Still having random high tunnel latency spikes with TAP 9.21.2 on Windows 7 64-bit.
If I replace it with 9.9.2_3, the spikes are not as severe as before and the connection seems more stable.
Using OpenVPN 2.4.0 client.
comment:24 Changed 7 years ago by
We just built two brand new Windows 7 64-bit machines and confirm this issue is present.
We are running Access Server 2.1.3 and OpenVPN Connect 2.1.1.102
Upgrading to Windows 10 fixed the issue.
comment:25 Changed 7 years ago by
Replying to JohnDoe123
I have the same problem as well.
Win 7
Intel 82579V ethernet
Lag spikes nonstop.
It started happening after an update to the OpenVPN Windows TAP adapter a long time ago, and it has only gotten worse since then.
It is totally unusable as of today: the connection lag-spikes and dies every third second.
The VPN connection stays alive, but all traffic in it lags out.
Switching from the newest to an older Windows TAP driver version (9.9.1 or 9.9.2_3) seems to help slightly.
Instead of totally losing 99-100% of all traffic every third second, it drops slightly less.
But it's still not really usable.
People who report the old driver as a solution probably just live with huge packet loss and have gotten used to it.
Things do not time out 100% like with the newest drivers, but you hit the wall all the time. With the old drivers there are still problems: voice communication lags, stutters and drops out every 3-5 seconds; torrents lag out and yo-yo up and down in speed; websites work for the most part, but parts of them load slowly and sometimes not at all.
I tested my Win 7 system on both an ADSL and a fiber connection. I reinstalled everything I could think of many times, from drivers to clients, formatted the whole computer, and tested on a pretty much vanilla Win 7 with all updates.
Tried different clients (Viscosity, OVPN) that use the TAP adapter, and all have this problem.
Tried different VPN servers, UDP, TCP, different countries, different service providers as well, same problem.
comment:26 Changed 7 years ago by
Had the very same problem described here. In my case, installing the 9.9.2 version of the driver fixed it almost completely, but only when using the OpenVPN client. With the native ProtonVPN client it still lags a bit in some cases, but it is at least much better than with the 9.21 TAP driver version.
comment:27 Changed 7 years ago by
I just saw something that I think explains how the latency issues happen, with short cutoffs in the connection.
The "TAP-Windows Adapter V9" is broken somehow.
I just noticed that it changes its LAN IP to the VPN server about every other second or so, and it just keeps doing it like crazy.
[MAC of TAP-ADAPTER here]changed IP Address 10.128.0.1 to 10.128.140.0 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.0 to 10.128.140.1 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.1 to 10.128.140.2 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.2 to 10.128.140.3 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.4 to 10.128.140.6 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.6 to 10.128.140.15 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.15 to 10.128.140.7 on 2017.09.14 10:39
[MAC of TAP-ADAPTER here]changed IP Address 10.128.140.7 to 10.128.140.8 on 2017.09.14 10:39
And so on... It just keeps going like this and never seems to stop switching IP addresses.
Sometimes it seems to switch IP 10 times every second, sometimes about once every second, and sometimes less frequently.
In the session I tested, which was about 8 minutes and 30 seconds long, it changed IP over 750 times.
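The figures in this comment can be checked with a few lines of Python. The sketch below parses log lines in the format pasted above, computes the average change rate for the reported session, and lists places where one entry's new address does not match the next entry's old address (which would suggest a change was not logged). The function names are made up for illustration, and the line format is assumed to be exactly as shown in this comment.

```python
import re

CHANGE = re.compile(
    r"changed IP Address (\d+\.\d+\.\d+\.\d+) to (\d+\.\d+\.\d+\.\d+)"
)


def parse_changes(log_lines):
    """Extract (old_ip, new_ip) pairs from the monitor's log lines."""
    changes = []
    for line in log_lines:
        match = CHANGE.search(line)
        if match:
            changes.append((match.group(1), match.group(2)))
    return changes


def find_gaps(changes):
    """Indexes where an entry's old IP differs from the previous entry's new IP."""
    return [
        i for i in range(1, len(changes))
        if changes[i][0] != changes[i - 1][1]
    ]


def changes_per_second(change_count, duration_seconds):
    """Average number of address changes per second."""
    return change_count / duration_seconds


if __name__ == "__main__":
    # "over 750" changes in about 8 min 30 s, so this is a lower bound.
    print(round(changes_per_second(750, 8 * 60 + 30), 2))
```

With the numbers reported here (at least 750 changes in roughly 510 seconds), the average works out to about 1.5 changes per second, which fits the "about every other second or so" to "10 times every second" range described.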
comment:28 Changed 7 years ago by
Changed my Ethernet card from Intel to a Realtek.
It did not solve the problem. Lag spikes and total clogging of the internet happen just the same.
There have been problems for many years now. Will this ever be solved?
comment:29 Changed 7 years ago by
It seems the problem can sometimes be replicated by connecting to IP addresses to transfer data.
Every time I connect to some IPs to download the blockchain for Bitcoin, the connection dies.
I tested about 50 different torrents; starting some of them kills the network when it connects to the peer IPs.
Even if the limit is set to a low maximum of 15 connections, the network dies.
It seems the TAP adapter somehow loses its connection to the VPN server and starts to change IP, and it keeps doing that for a short or a very, very long time.
The network is therefore down in the meantime.
And as soon as it gets back online and tries to access that torrent or whatever triggered it to go bananas, it starts doing it again.
But it does not fully kill the connection to the server: the VPN program shows the connection as online, but using the VPN tunnel is impossible.
comment:30 Changed 7 years ago by
I have done a full packet capture covering over 200 seconds: from before I start the VPN software, through startup, connection, starting some random torrent traffic, and then the death of VPN traffic in the tunnel, plus about 40 extra seconds while the VPN tunnel is "dead" but the VPN server is still connected.
I used a network monitor to write down the almost exact time when traffic went from 15 Mbit/s to 0 Mbit/s instantly and then stayed at 0 Mbit/s.
I see there are thousands of these in Wireshark:
157639 154.958686 VPN-serverIP MyIP IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=a985) [Reassembled in #157640]
157640 154.958687 VPN-serverIP MyIP OpenVPN 60 MessageType: P_DATA_V1
Packet line 1:
Number=157639
Time=154.958686
From server IP to my IP
Protocol: IPv4
Length (bytes)= 1514
Info= Fragmented IP protocol (proto=UDP 17, off=0, ID=9ac3)[Reassembled in #149725]
Packet line 2:
Number=157640
Time=154.958687
From server IP to my IP
Protocol: OpenVPN
Length (bytes)= 60
Info= MessageType: P_DATA_V1
Sometimes the OpenVPN packet that follows has length 60, and then there is another one with a length of either 60-200 or 1461.
After that, another one of those Fragmented IP protocol packets comes.
This happens thousands and thousands of times throughout the time traffic goes through the tunnel.
Sometimes it comes in very high bursts and sometimes sporadically.
It happens all the time, but at the point where the VPN tunnel dies they seem to be just frequent enough to kill the connection totally.
I believe this is a sign of whatever causes the lag, stutter and latency problems.
In the Wireshark packet list these show up as white lines. If you look at all traffic, these white lines are about 30-50% of all packets, like a zebra pattern over the whole time from when I start to send data through the VPN until it dies.
For some seconds (especially at the time when the tunnel dies) only the pattern of packets listed above is repeated, about 8000 times every second.
There are some pockets of OpenVPN packets that look normal, but then bursts of these Fragmented IP Protocol packets come again.
After the tunnel died, these packets showed up maybe 20-30 times over the next 2-5 seconds. After that they don't come any more and there are only OpenVPN packets, and traffic in the tunnel is not going through.
Is this helpful?
Samuli, could you perhaps reply here if you need the full packet capture to track down this bug?
It's 200 sec long, 107MB large.
I have written down time in capture log where the tunnel dies.
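As a side note on the 1514-byte fragment followed by a 60-byte packet: that pair is what plain IPv4 fragmentation produces when a UDP datagram is only slightly too large for a 1500-byte MTU. The Python sketch below just does the arithmetic. It assumes IPv4 without options, an Ethernet link, and a capture that shows frames without the FCS but with padding up to the 60-byte minimum; none of that is confirmed for this capture, so treat it as a consistency check rather than a diagnosis.

```python
IP_HEADER = 20        # IPv4 header without options (assumption)
UDP_HEADER = 8
ETHERNET_HEADER = 14  # captured frames normally exclude the 4-byte FCS
MIN_FRAME = 60        # minimum Ethernet frame length without FCS


def fragment_frames(udp_payload_len, mtu=1500):
    """Return the captured frame length of each IP fragment of one UDP datagram."""
    remaining = UDP_HEADER + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    frames = []
    while remaining > 0:
        if remaining <= mtu - IP_HEADER:
            chunk = remaining  # last (or only) fragment
        else:
            chunk = max_chunk
        frame = ETHERNET_HEADER + IP_HEADER + chunk
        frames.append(max(frame, MIN_FRAME))  # short frames are padded
        remaining -= chunk
    return frames


if __name__ == "__main__":
    print(fragment_frames(1472))  # fits in one packet
    print(fragment_frames(1480))  # 8 bytes too many, so two fragments
```

A UDP payload of 1472 bytes fits in a single 1514-byte frame, while anything from 1473 to about 1490 bytes yields a 1514-byte first fragment plus a padded 60-byte second one, matching the pattern listed above. That would fit the earlier suggestion in this ticket to look at tun-mtu, fragment and mssfix, although it does not by itself explain why only Windows 7 is affected.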
comment:32 Changed 6 years ago by
For me, the situation was solved with the "tcp-nodelay" server option.
comment:33 Changed 5 years ago by
Hi there,
For me, even tcp-nodelay did not fix anything...
I tried everything in this post and others found on the web...
I think it's still very buggy. It will force me to upgrade to Win10...
comment:34 Changed 5 years ago by
Cc: | tct added |
---|
comment:35 Changed 5 years ago by
Windows 7 is at its end of life.
https://support.microsoft.com/en-us/help/4057281/windows-7-support-will-end-on-january-14-2020