Opened 4 months ago

Last modified 9 days ago

#1388 new Bug / Defect

Bridge: One connection once error

Reported by: ToddAndMargo Owned by:
Priority: major Milestone: release 2.5.3
Component: Generic / unclassified Version: OpenVPN 2.5.0 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords: one connection once
Cc: tincantech, stipa

Description (last modified by tincantech)

Hi All,

Server: five computers running
Windows 10 Pro x64-20H2
OpenVPN-2.5.1.exe
tap-bridge and Ethernet are bridged

Client: one instance of
Fedora 33, x64
openvpn-2.4.10-1.fc33.x86_64

Editorial comment: AAAAA HHHHHHH !!!!!!

All my previously working tunnels got clobbered by a Windows 10 update. Which one, I do not know.

Symptoms: if I run through the troubleshooting steps that I will post at the bottom of this, I can connect ONCE. Everything works fine until I press "reconnect" or "disconnect" on the server's OpenVPN GUI. Then OpenVPN seizes up. If I reconnect from the client's side, the same symptoms occur.

Rebooting un-seizes it, and then it will connect again. Everything looks normal in the client's "ifconfig tap0" output and in the server's GUI, except that ZERO traffic will pass. Try again and now it will not connect at all. Press "reconnect" or "disconnect" on the server's GUI or on the client, and we are seized up again. Restarting the "OpenVPN Interactive Service" does nothing.

Did I mention “AAAAA HHHHHHH !!!!!!” ???

My config:

Server config
float
port xxxx
proto udp4
dev tap
dev-node tap-bridge
ca ca.crt
cert xxx-server.crt
key xxx-server.key
dh dh.pem
ifconfig-pool-persist ipp.txt
server-bridge 192.168.200.20 255.255.255.0 192.168.200.50 192.168.200.60
client-to-client
keepalive 10 120
comp-lzo
persist-key
persist-tun
status openvpn-status.log
verb 3
--tun-mtu 1500
--fragment 1300
--mssfix

Client config
remote 66.214.96.122
port 5030
client
dev tap
proto udp
resolv-retry infinite
nobind
persist-key
persist-tun
ca ca.crt
cert xxx-client.crt
key xxx-client.key
ns-cert-type server
ping 10
comp-lzo
verb 3
-tun-mtu 1500
--fragment 1300
--mssfix
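One detail worth checking in the client config above is the line "-tun-mtu 1500", which has a single dash. As far as I know, OpenVPN tolerates the command-line "--" prefix inside a config file. A single "-" is not a valid prefix, though. If the file really contains that line, the client would be expected to stop with an unrecognized-option error, although it may just be a typo in this post.

Below is a minimal lint sketch. The lint_config helper is hypothetical and not part of OpenVPN.

```python
from typing import List

# Hypothetical lint helper (not part of OpenVPN). Assumptions: options in a
# config file are normally written without the command-line "--" prefix, and a
# single "-" prefix is not accepted at all.


def lint_config(text: str) -> List[str]:
    """Return warnings for option lines that carry a command-line style prefix."""
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("--"):
            # Tolerated (as far as we know), but inconsistent with the rest of the file.
            warnings.append(f"line {lineno}: '{line}' uses '--'; prefer '{line[2:]}'")
        elif line.startswith("-"):
            # Most likely a typo for '--'; OpenVPN is expected to reject this option.
            warnings.append(f"line {lineno}: '{line}' has a single '-'; "
                            f"did you mean '{line.lstrip('-')}'?")
    return warnings


if __name__ == "__main__":
    # The last five lines of the client config above.
    client_tail = "comp-lzo\nverb 3\n-tun-mtu 1500\n--fragment 1300\n--mssfix\n"
    for warning in lint_config(client_tail):
        print(warning)
```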

(One of the five) server’s log file:


2021-02-01 21:17:54 WARNING: Compression for receiving enabled. Compression has been used in the past to break encryption. Sent packets are not compressed unless "allow-compression yes" is also set.
2021-02-01 21:17:54 --cipher is not set. Previous OpenVPN version defaulted to BF-CBC as fallback when cipher negotiation failed in this case. If you need this fallback please add '--data-ciphers-fallback BF-CBC' to your configuration and/or add BF-CBC to --data-ciphers.

2021-02-01 21:17:54 --pull-filter ignored for --mode server
2021-02-01 21:17:54 OpenVPN 2.5.0 x86_64-w64-mingw32 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [AEAD] built on Oct 28 2020
2021-02-01 21:17:54 Windows version 10.0 (Windows 10 or greater) 64bit
2021-02-01 21:17:54 library versions: OpenSSL 1.1.1h 22 Sep 2020, LZO 2.10

Enter Management Password:

2021-02-01 21:17:54 MANAGEMENT: TCP Socket listening on [AF_INET]127.0.0.1:25340
2021-02-01 21:17:54 Need hold release from management interface, waiting...
2021-02-01 21:17:55 MANAGEMENT: Client connected from [AF_INET]127.0.0.1:25340
2021-02-01 21:17:55 MANAGEMENT: CMD 'state on'
2021-02-01 21:17:55 MANAGEMENT: CMD 'log all on'
2021-02-01 21:17:55 MANAGEMENT: CMD 'echo all on'
2021-02-01 21:17:55 MANAGEMENT: CMD 'bytecount 5'
2021-02-01 21:17:55 MANAGEMENT: CMD 'hold off'
2021-02-01 21:17:55 MANAGEMENT: CMD 'hold release'
2021-02-01 21:17:55 NOTE: when bridging your LAN adapter with the TAP adapter, note that the new bridge adapter will often take on its own IP address that is different from what the LAN adapter was previously set to
2021-02-01 21:17:55 Note: cannot open openvpn-status.log for WRITE
2021-02-01 21:17:55 Note: cannot open ipp.txt for READ/WRITE
2021-02-01 21:17:55 Diffie-Hellman initialized with 2048 bit key
2021-02-01 21:17:55 interactive service msg_channel=560
2021-02-01 21:17:55 open_tun
2021-02-01 21:17:55 tap-windows6 device [tap-bridge] opened
2021-02-01 21:17:55 TAP-Windows Driver Version 9.24
2021-02-01 21:17:55 Sleeping for 10 seconds...
2021-02-01 21:18:05 NOTE: FlushIpNetTable failed on interface [8] {354BCAC2-1264-4D33-9FAD-E6A2C9992E94} (status=1168) : Element not found.
2021-02-01 21:18:05 MANAGEMENT: >STATE:1612243085,ASSIGN_IP
2021-02-01 21:18:05 Socket Buffers: R=[65536->65536] S=[65536->65536]
2021-02-01 21:18:05 UDPv4 link local (bound): [AF_INET][undef]:5030
2021-02-01 21:18:05 UDPv4 link remote: [AF_UNSPEC]
2021-02-01 21:18:05 MULTI: multi_init called, r=256 v=256
2021-02-01 21:18:05 IFCONFIG POOL IPv4: base=192.168.200.50 size=11
2021-02-01 21:18:05 IFCONFIG POOL LIST
2021-02-01 21:18:05 Initialization Sequence Completed
2021-02-01 21:18:05 MANAGEMENT: >STATE:1612243085,CONNECTED,SUCCESS
,
2021-02-01 21:18:06 50.37.25.185:59750 TLS: Initial packet from [AF_INET]50.37.25.185:59750, sid=d1dbf774 963f2d20
2021-02-01 21:18:06 50.37.25.185:59750 VERIFY OK: depth=0, CN=GSA-client
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_VER=2.4.10
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_PLAT=linux
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_PROTO=2
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_NCP=2
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_CIPHERS=AES-256-GCM:AES-128-GCM:BF-CBC
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_LZ4=1
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_LZ4v2=1
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_LZO=1
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_COMP_STUB=1
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_COMP_STUBv2=1
2021-02-01 21:18:06 50.37.25.185:59750 peer info: IV_TCPNL=1
2021-02-01 21:18:06 50.37.25.185:59750 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, 2048 bit RSA
2021-02-01 21:18:06 50.37.25.185:59750 [GSA-client] Peer Connection Initiated with [AF_INET]50.37.25.185:59750
2021-02-01 21:18:06 GSA-client/50.37.25.185:59750 MULTI_sva: pool returned IPv4=192.168.200.50, IPv6=(Not enabled)
2021-02-01 21:18:06 GSA-client/50.37.25.185:59750 Data Channel: using negotiated cipher 'AES-256-GCM'
2021-02-01 21:18:06 Xxx-client/50.37.25.185:59750 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-02-01 21:18:06 Xxx-client/50.37.25.185:59750 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-02-01 21:18:07 Xxx-client/50.37.25.185:59750 PUSH: Received control message: 'PUSH_REQUEST'
2021-02-01 21:18:07 Xxx-client/50.37.25.185:59750 SENT CONTROL [Xxx-client]: 'PUSH_REPLY,route-gateway 192.168.200.20,ping 10,ping-restart 120,ifconfig 192.168.200.50 255.255.255.0,peer-id 0,cipher AES-256-GCM' (status=1)
2021-02-01 21:18:07 Xxx-client/50.37.25.185:59750 MULTI: Learn: 7e:f8:c5:d9:f6:48@0 -> Xxx-client/50.37.25.185:59750
2021-02-01 21:22:07 Xxx-client/50.37.25.185:59750 [Xxx-client] Inactivity timeout (--ping-restart), restarting
2021-02-01 21:22:07 Xxx-client/50.37.25.185:59750 SIGUSR1[soft,ping-restart] received, client-instance restarting
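The last lines of this log can be cross-checked against "keepalive 10 120". According to the OpenVPN manual, in server mode keepalive gives the server its own ping-restart of twice the timeout, which is 240 s here. It also pushes ping 10 / ping-restart 120 to the client, and that matches the PUSH_REPLY line.

The time from the "MULTI: Learn" line (21:18:07) to the "Inactivity timeout" line (21:22:07) is exactly 240 s. That suggests the server received nothing from the client after about 21:18:07, not even keepalive pings. A small sketch of the arithmetic follows; the helper names are illustrative.

```python
from datetime import datetime

# From the server config: "keepalive 10 120".
KEEPALIVE_INTERVAL = 10
KEEPALIVE_TIMEOUT = 120

LOG_TIME = "%Y-%m-%d %H:%M:%S"


def server_ping_restart(timeout: int) -> int:
    """Server-side ping-restart implied by "keepalive <interval> <timeout>" in
    server mode (per the manual, the server uses twice the timeout)."""
    return 2 * timeout


def seconds_between(start: str, end: str) -> int:
    """Whole seconds between two timestamps in the log's format."""
    delta = datetime.strptime(end, LOG_TIME) - datetime.strptime(start, LOG_TIME)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # From the "MULTI: Learn" line to the "Inactivity timeout" line.
    quiet = seconds_between("2021-02-01 21:18:07", "2021-02-01 21:22:07")
    print(quiet, server_ping_restart(KEEPALIVE_TIMEOUT))  # 240 240
```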

Troubleshooting steps:

OpenVPN will not work in Windows 10: tunnel initializes, but traffic won't flow

A working tunnel will be able to ping the firewall at the server's end (not the client, due to the antivirus's firewall), and the server side should be able to ping the client's new TAP IP address.

Things tried:

Reinstall OpenVPN:

Run Autoruns as ADMINISTRATOR:
Remove all bad start points (the yellow ones), especially OpenVPN services

Delete the bridge (ncpa.cpl)
WARNING: if you neglect this step or do it out of sequence, outbound traffic
will seize up after removing OpenVPN

Remove OpenVPN (appwiz.cpl) and reboot

Re-run Autoruns to get rid of anything that was not removed, and reboot

Make sure you have the latest OpenVPN

Reinstall OpenVPN AS ADMINISTRATOR (if possible), reboot

Redo the bridge, see below

Flush DNS and reset Winsock (from an Administrator cmd prompt):


ipconfig /flushdns
ipconfig /registerdns
ipconfig /release
ipconfig /renew
NETSH winsock reset catalog
NETSH int ipv4 reset reset.log
NETSH int ipv6 reset reset.log


reboot
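If the flush/reset sequence has to be repeated on several machines, it can be scripted. The sketch below is an assumption and not part of the original steps. It is a Python wrapper that runs the same commands in order from an elevated prompt and records each exit code, with a dry-run mode that only lists them. The names run_reset and RESET_COMMANDS are hypothetical. The reboot afterwards is still needed.

```python
import subprocess
from typing import List, Optional, Tuple

# The reset sequence from the steps above, one argument list per command.
# Note: "ipconfig /release" drops DHCP leases, so a remote session
# (e.g. GoToAssist) may be cut off at that point.
RESET_COMMANDS = [
    ["ipconfig", "/flushdns"],
    ["ipconfig", "/registerdns"],
    ["ipconfig", "/release"],
    ["ipconfig", "/renew"],
    ["netsh", "winsock", "reset", "catalog"],
    ["netsh", "int", "ipv4", "reset", "reset.log"],
    ["netsh", "int", "ipv6", "reset", "reset.log"],
]


def run_reset(dry_run: bool = True) -> List[Tuple[str, Optional[int]]]:
    """Run each command in order (from an elevated prompt) and collect exit codes.

    With dry_run=True nothing is executed; each entry's exit code is None.
    """
    results: List[Tuple[str, Optional[int]]] = []
    for cmd in RESET_COMMANDS:
        if dry_run:
            results.append((" ".join(cmd), None))
            continue
        # check=False so one failing command does not hide the others' results.
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results.append((" ".join(cmd), proc.returncode))
    return results


if __name__ == "__main__":
    for command, code in run_reset(dry_run=True):
        print(f"{command}: {'dry run' if code is None else code}")
    # A reboot is still required afterwards.
```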

reset the TAP adapter:

--> reboot to clear any jammed tunnels
--> Device Manager (devmgmt.msc)
--> Network adapters
--> uninstall any TAP drivers
--> Action menu

--> Start
--> all programs
--> Open VPN
--> Utilities
--> Add a new TAP-Windows6 virtual network adapter
--> name it tap-bridge and relink the bridge (ncpa.cpl)
from the bridge adapter's properties

Change History (21)

comment:1 Changed 4 months ago by tincantech

Cc: tincantech added
Last edited 4 months ago by tincantech

comment:2 Changed 4 months ago by Gert Döring

Cc: stipa added

Don't do bridging on Windows. Nobody has been able to get this to work reliably, and nobody really understands why it does not work - and if it does not work, it's very hard to debug remotely why it is failing this time.

If you must do bridging, get a Linux or *BSD box to do it.

But it is much better to do routed setups with proper DNS: Windows networking will work just as well, it has less overhead (= faster) and it is easier to troubleshoot.

comment:3 Changed 4 months ago by ToddAndMargo

If bridge does not work, how can I test for rogue devices on the other end of the tunnel?

comment:4 in reply to:  3 Changed 4 months ago by Gert Döring

Replying to ToddAndMargo:

If bridge does not work, how can I test for rogue devices on the other end of the tunnel?

How do you do that with bridging today?

I'm not aware of any solutions in that space, so genuinely interested.

comment:5 Changed 4 months ago by ToddAndMargo

Hi Gert,

What I am mainly looking for is second routers installed on the LAN to give access to wireless devices, which defeats the segmentation requirements of PCI (Payment Card Industry) for networking a Point of Sale system. (And I always give them a segmented WAP, so there is no excuse for this.)

And then there are folks who replace their router but forget to remove the old one; the two competing DHCP servers, with two separate networks on the same Local Area Network (LAN), mess them up royally.

I used to be able to do this with Autoscan-Network for Windows, but Autoscan stopped working on Windows and the project is non-responsive. Then I did it with an OpenVPN tunnel, scanning over the tunnel with Autoscan-Network, which still works on Linux.
Then Windows 10 killed OpenVPN's ability to operate in Bridge mode (this bug report), so I am back to trying to find a substitute for Autoscan-Network for Windows.

If I am on-site, I can boot my full Fedora install of Linux from a USB stick and run Autoscan that way, but I am usually remotely logged in with GoToAssist.

I also cannot figure out how Autoscan does its thing, and does it so rapidly. Nothing escapes it.

And since there is no more bridge mode in OpenVPN for Windows, poop!

-T

comment:6 Changed 4 months ago by Gert Döring

I am getting slightly more confused now, the "server" vs. "client" part here confuses me.

So there are 5 remote LAN instances that do "something", and you use the OpenVPN instance on each of these servers to bridge yourself onto the remote LAN, to be able to scan it for "devices that should not be there"?

This is indeed something where routing might not give you all you need (in many cases, it might still work, if you do source NAT on the to-LAN leg, but depending on what the scan tool expects (ARP?), might not see all devices).

And since you are not controlling the routers, running OpenWRT there with OpenVPN is also not possible...

Yeah, hard problem.

As for the original problem report: we have various reports about bridging on Windows not working for some, partially working for others, and working fine but eventually blue-screening for others again. So "this platform is problematic". And with Microsoft changing stuff every half year, it's very hard to keep up.

This said, for a somewhat more systematic approach to "why are packets not flowing", I'd suggest running Wireshark in parallel on the client's tap and the server's tap to see whether it's OpenVPN (if you see a difference there, OpenVPN ate packets), and on the server's tap vs. the server's LAN (if you see a difference there, Windows ate packets).
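The comparison suggested above boils down to a multiset difference between two captures. The sketch below makes several assumptions:

- Each capture has already been exported to (source IP, destination IP, IP ID) tuples. The export step, the helper names and the 192.168.200.1 address are hypothetical.
- The IP ID is not guaranteed to be unique.
- Non-IP frames such as ARP are ignored.

Because of those limits, this is only a rough first pass.

```python
from collections import Counter
from typing import Iterable, Tuple

# A captured IP packet, reduced to (source IP, destination IP, IP ID field).
# Assumption: these tuples were exported from each capture beforehand, e.g.
# with a Wireshark/tshark field export; that step is not shown here.
Packet = Tuple[str, str, int]


def missing_downstream(upstream: Iterable[Packet], downstream: Iterable[Packet]) -> Counter:
    """Packets seen at the upstream capture point but not at the downstream one."""
    return Counter(upstream) - Counter(downstream)


def locate_loss(client_tap, server_tap, server_lan) -> str:
    """Apply the comparison for traffic flowing client -> server LAN only."""
    if missing_downstream(client_tap, server_tap):
        return "lost between client tap and server tap: OpenVPN ate packets"
    if missing_downstream(server_tap, server_lan):
        return "lost between server tap and server LAN: Windows ate packets"
    return "no loss seen in this direction"


if __name__ == "__main__":
    # Hypothetical captures: the packet with IP ID 102 never reaches the LAN.
    a = ("192.168.200.50", "192.168.200.1", 101)
    b = ("192.168.200.50", "192.168.200.1", 102)
    print(locate_loss([a, b], [a, b], [a]))
```

For the reverse direction (LAN to client), call it with the captures swapped: server LAN, server tap, client tap.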

What you can also try is to dumb down OpenVPN on the server side: instead of running in --server-bridge mode (which will add a bridge after the bridge), run it in p2p mode. So the "server" would have no --server* in its config, just tls-server, while the client has tls-client.

If you do that, openvpn turns into a very dumb packet pipe, without any local decision making "should I forward that packet or not?". It will only handle a single "client", but this is what you are using. It will not push ifconfig commands from the server to the client, so you need to add ifconfig 192.168... to your client config to set up a tap IP address.

Now, I'm not sure if that works, but it simplifies things a bit, taking OpenVPN's MAC learning logic out of the equation. Wireshark should tell you where packets get stuck, though.
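To try the p2p variant without hand-editing, a sketch like the one below could derive test configs from the existing ones. It rests on these assumptions:

- server-bridge, client-to-client and ifconfig-pool-persist are treated as server-mode-only options and are commented out.
- "client" (which the manual describes as pull plus tls-client) becomes plain tls-client.
- The tap address passed to to_p2p_client is a placeholder you choose, since nothing is pushed in this mode.

Both helper functions are hypothetical.

```python
from typing import List

# Assumption: these options only make sense in server mode and must go when
# the "server" side runs as a plain p2p tls-server.
SERVER_MODE_ONLY = ("server-bridge", "client-to-client", "ifconfig-pool-persist")


def _option(line: str) -> str:
    """Option name of a config line, ignoring any leading dashes."""
    words = line.split()
    return words[0].lstrip("-") if words else ""


def to_p2p_server(config: str) -> str:
    """Comment out server-mode-only options and append tls-server."""
    out: List[str] = []
    for line in config.splitlines():
        if _option(line) in SERVER_MODE_ONLY:
            out.append(f"# {line}  (disabled for p2p test)")
        else:
            out.append(line)
    out.append("tls-server")
    return "\n".join(out) + "\n"


def to_p2p_client(config: str, tap_ip: str, netmask: str) -> str:
    """Replace "client" (pull + tls-client) with tls-client and set the tap IP locally."""
    out = ["tls-client" if _option(line) == "client" else line
           for line in config.splitlines()]
    out.append(f"ifconfig {tap_ip} {netmask}")  # for tap devices the 2nd arg is a netmask
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    server = ("dev tap\n"
              "server-bridge 192.168.200.20 255.255.255.0 192.168.200.50 192.168.200.60\n"
              "client-to-client\nkeepalive 10 120\n")
    print(to_p2p_server(server))
    # Placeholder tap address, chosen by whoever runs the test.
    print(to_p2p_client("client\ndev tap\n", "192.168.200.50", "255.255.255.0"))
```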

Last edited 4 months ago by Gert Döring

comment:7 Changed 4 months ago by Gert Döring

Or, totally crazy idea: enable Hyper-V on the Windows server, run a Fedora install in there, and do the scans locally :-)

(But I do want OpenVPN to work...)

comment:8 Changed 4 months ago by ToddAndMargo

Hi Gert,

I am getting slightly more confused now, the "server" vs. "client" part here confuses me.

I installed the COPR repo and am now on 2.5.1 on the Fedora client side. No symptom change.

I am the OpenVPN client. My customers' remote sites (now 7 of them) are running the OpenVPN server. I only have one tunnel open at a time.

I have put a trace on my client's (customer's) WatchGuard firewall. My packets are arriving and being properly forwarded to the OpenVPN servers. I can even do a UDP traceroute from my side into the customer's server. Nmap shows the server's UDP port open. Things are dying in Windows 10.

Before this occurred, I ran Autoscan-Network over the pipe and it found EVERYTHING on the other end. My end too.

Never thought of running a virtual machine on the customer's computer. I could always use Fedora as the host system and run Windows in qemu-KVM. But not on these computers, as they are meant to be sparse.

Would mode tls* allow me to probe for other devices on the LAN on the other side of the tunnel?

I sincerely wish folks would not run Windows on PCI and HIPAA workstations. How silly when security is such an issue on these machines. But I don't get a vote.

Last edited 4 months ago by ToddAndMargo

comment:9 Changed 4 months ago by tincantech

We have recently had reports that OpenVPN on Win10 does not work even in standard TLS-Server/Client mode. The fix for that appears to be to use --dhcp-renew to force DHCP renewal.

That will not affect a bridged server because it uses a fixed IP, but it does indicate that MS have changed things which can/do affect OpenVPN.

However, the most common mistake for people setting up a bridge is when they use:

  • --server-bridge {local-ip} netmask pool-start-IP pool-end-IP

instead of the correct syntax:

  • --server-bridge {gateway-ip} netmask pool-start-IP pool-end-IP

Your config has:

  • server-bridge 192.168.200.20 255.255.255.0 192.168.200.50 192.168.200.60

and 192.168.200.20 looks distinctly like the Server IP not the gateway IP. Please verify and change for at least testing purposes.

I don't have up to date Win10 to test with but on Win7 I can confirm that a bridge setup works fine for me.
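To make the "gateway" argument concrete, the OpenVPN manual gives an example expansion of --server-bridge. It becomes mode server, tls-server, an ifconfig-pool line and a pushed route-gateway. Mirroring that expansion (with hypothetical helper names) shows two things:

- The first argument only ends up as the route-gateway pushed to clients. This is the "route-gateway 192.168.200.20" visible in the PUSH_REPLY log line above.
- The pool 192.168.200.50 to 192.168.200.60 holds 11 addresses, matching the "size=11" log line.

```python
import ipaddress
from typing import List


def expand_server_bridge(gateway: str, netmask: str, pool_start: str, pool_end: str) -> List[str]:
    """Directives that --server-bridge stands for, following the example
    expansion in the OpenVPN manual."""
    return [
        "mode server",
        "tls-server",
        f"ifconfig-pool {pool_start} {pool_end} {netmask}",
        f'push "route-gateway {gateway}"',
    ]


def pool_size(pool_start: str, pool_end: str) -> int:
    """Number of addresses in an inclusive IPv4 pool range."""
    return int(ipaddress.IPv4Address(pool_end)) - int(ipaddress.IPv4Address(pool_start)) + 1


if __name__ == "__main__":
    # Arguments taken from the server-bridge line in the reporter's config.
    for directive in expand_server_bridge("192.168.200.20", "255.255.255.0",
                                          "192.168.200.50", "192.168.200.60"):
        print(directive)
    print(pool_size("192.168.200.50", "192.168.200.60"))  # 11
```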

Last edited 4 months ago by tincantech

comment:10 in reply to:  9 Changed 4 months ago by ToddAndMargo

Replying to tincantech:

Your config has:

  • server-bridge 192.168.200.20 255.255.255.0 192.168.200.50 192.168.200.60

and 192.168.200.20 looks distinctly like the Server IP not the gateway IP. Please verify and change for at least testing purposes.

Forgive the stupid question here, but in this context, what do you mean by the "gateway IP"?

The server's bridge fixed IP is 192.168.200.100

The "tap-bridge" adapter is 192.168.200.20 before creating the bridge

The "expected" IP assigned to the client by the server is 192.168.200.50

I don't have up to date Win10 to test with but on Win7 I can confirm that a bridge setup works fine for me.

Open VPN stopped working for me as of 20H2.

The quality issues with W10 take my breath away at times. I spent four hours troubleshooting a corrupted TCP configuration on a fresh blank install of 20H2. Unbelievable, if I had not lived through it.

comment:11 Changed 4 months ago by tincantech

See ipconfig /all

comment:12 Changed 4 months ago by ToddAndMargo

That does not help. ipconfig /all shows the firewall's address, which is also the gateway to the Internet of the server. I am not trying to get out the Internet on the server from my client. I want to look at the server's LAN.

# Configure server mode for ethernet bridging.
# You must first use your OS's bridging capability
# to bridge the TAP interface with the ethernet
# NIC interface. Then you must manually set the
# IP/netmask on the bridge interface, here we
# assume 10.8.0.4/255.255.255.0. Finally we
# must set aside an IP range in this subnet
# (start=10.8.0.50 end=10.8.0.100) to allocate
# to connecting clients. Leave this line commented
# out unless you are ethernet bridging.
;server-bridge 10.8.0.4 255.255.255.0 10.8.0.50 10.8.0.100
;server-bridge 192.168.220.20 255.255.255.0 192.168.220.50 192.168.220.90
server-bridge 192.168.200.20 255.255.255.0 192.168.200.50 192.168.200.60

I am following this instruction: "Then you must manually set the IP/netmask on the bridge Interface" which I set to 192.168.200.20 255.255.255.0. Has nothing to do with the gateway to the server's firewall/router.

For testing purposes, I can set it to whatever you want. Do you have something you'd like me to try?

comment:13 Changed 3 months ago by eddy21

Hi, please read what I wrote here:
https://redmine.pfsense.org/issues/11575

And let me know if anything makes sense for you, or if you can fix it the same way.

comment:14 Changed 3 months ago by ToddAndMargo

Hi Eddy,

I am overwhelmed by the link. Can you simplify what test you want me to run?

-T

comment:15 in reply to:  14 Changed 3 months ago by eddy21

Replying to ToddAndMargo:

Hi Eddy,

I am overwhelmed by the link. Can you simplify what test you want me to run?

-T

Please provide the output of "openvpn --version" - server side.
I need to know your server's version and if enable_async_push flag is yes/no

comment:16 Changed 3 months ago by ToddAndMargo

C:\Users\GSA>"C:\Program Files\OpenVPN\bin\openvpn" --version
OpenVPN 2.5.1 x86_64-w64-mingw32 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [AEAD] built on Feb 24 2021
library versions: OpenSSL 1.1.1j 16 Feb 2021, LZO 2.10
Windows version 10.0 (Windows 10 or greater) 64bit
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <sales@…>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto_ofb_cfb=yes enable_debug=yes enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_pedantic=no enable_pf=yes enable_pkcs11=yes enable_plugin_auth_pam=no enable_plugin_down_root=no enable_plugins=yes enable_port_share=yes enable_selinux=no enable_shared=yes enable_shared_with_static_runtimes=yes enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=no enable_werror=no enable_win32_dll=yes enable_x509_alt_username=no with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_special_build= with_sysroot=no

comment:17 Changed 3 months ago by Gert Döring

The enable-async-push thing is very unlikely to have anything to do with the issue of "bridging does not work on windows" and "openvpn locks up".

comment:18 Changed 3 months ago by ToddAndMargo

Bridging on Windows 10 does work, but it only works once. To get it to work again, you have to remove OpenVPN, reboot, and reinstall OpenVPN.

Last edited 3 months ago by ToddAndMargo

comment:19 in reply to:  18 Changed 3 months ago by eddy21

Replying to ToddAndMargo:

Bridging on Windows 10 does work, but it only works once. To get it to work again, you have to remove OpenVPN, reboot, and reinstall OpenVPN.

Thanks for this clarification, I misunderstood you. I thought that, as happened to me, the connection works OK until a reconnection needs to be done. In my case I am not bridging, and it works again after a while or after forcing a complete renegotiation of keys.

comment:20 Changed 3 months ago by Gert Döring

Milestone: changed from release 2.5.1 to release 2.5.3

comment:21 Changed 9 days ago by tincantech

Description: modified

Removed irrelevant external link (a.k.a. spam)
