Opened 6 years ago

Closed 22 months ago

#1051 closed Bug / Defect (fixed)

Openvpn interactive service issue

Reported by: it@…
Owned by:
Priority: major
Milestone:
Component: Windows GUI
Version: OpenVPN 2.4.5 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer)
Keywords: interactive service management interface connection problem
Cc: it@…

Description

Symptom of the problem:
When openvpn-gui initiates an openvpn connection through the openvpn interactive service, it sometimes fails to connect to the openvpn management interface because of a synchronization problem.

Root cause of the problem:
When openvpn-gui initiates an openvpn connection through the openvpn interactive service, the service starts openvpn.exe but does not wait for openvpn.exe to start listening on the management port before it notifies openvpn-gui to connect to the management interface. (Nor does openvpn-gui wait for openvpn.exe to start listening on the management port.)

How to reproduce the problem:
Prerequisites:
The client should be joined to a Windows domain, and the RPC servers of the domain have to be resolvable by DNS.
When the openvpn interactive service starts openvpn.exe, Windows (probably because of the impersonation request) initiates RPC sessions with the RPC servers of the domain. These RPC sessions can take several seconds before openvpn.exe actually starts and begins listening on the management interface. Openvpn-gui tries to connect to the management interface before openvpn.exe has started, so openvpn-gui fails with the error: "Connecting to management interface failed."

Resolution:
Resolution Nr. 1.: If possible, the openvpn interactive service should wait for openvpn.exe to request a password and/or to start listening on the management port before letting openvpn-gui connect to the openvpn management interface.

Resolution Nr. 2.: Openvpn-gui should keep trying to connect to the management interface for a predefined period of time, such as 20 seconds, before it gives up and returns the connection failure error: "Connecting to management interface failed.".
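Resolution Nr. 2 amounts to a connect loop with an overall deadline. The following is a minimal Python sketch of that idea, not the GUI's actual code (the GUI is written in C). The function name `connect_with_retry` and the 0.5 second retry interval are assumptions made for illustration; the 20 second default is the figure suggested above.

```python
import socket
import time


def connect_with_retry(host, port, total_timeout=20.0, interval=0.5):
    """Try to connect until total_timeout seconds have elapsed.

    Returns a connected socket, or raises TimeoutError if no listener
    came up on host:port in time.
    """
    deadline = time.monotonic() + total_timeout
    while True:
        try:
            # Each individual attempt gets its own short timeout.
            return socket.create_connection((host, port), timeout=interval)
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no listener on {host}:{port} after {total_timeout}s"
                )
            # Listener not up yet: wait a little and try again.
            time.sleep(interval)
```

A caller would use `connect_with_retry("127.0.0.1", 25340)` and report "Connecting to management interface failed." only when `TimeoutError` is raised.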

Additional note:
When openvpn-gui fails, nobody cleans up the running openvpn.exe. Therefore, the next time the connection is started from openvpn-gui, the openvpn interactive service tries to start openvpn.exe with the same configuration that is already used by a running instance, so it fails. The openvpn interactive service should track whether it has already started an instance for the configuration and whether that instance is still running.

Change History (16)

comment:1 Changed 6 years ago by Gert Döring

Cc: Selva Nair added
Owner: Heiko Hund deleted
Status: new → assigned

comment:2 Changed 6 years ago by Selva Nair

Cc: Selva Nair removed

Actually the GUI does not wait for the service to launch openvpn.exe, but just sends the request to do so and starts to try connecting to the management i/f until a timeout (15 seconds). This works fine as long as the management i/f comes up within that time.

The service does need to impersonate and then check the token for membership in some groups. I thought neither would require an RPC connection as the token can be locally duplicated from that of the client without connection to the DC. In any case, if openvpn.exe takes more than 15 seconds to come up, the GUI will fail to connect to it and lead to the situation you describe. OpenVPN starts listening on the management interface soon after it starts so there should be no long delay involved there.

I suggest (i) start the polling for management only after receiving the process ID of openvpn.exe from the service and (ii) kill the process if the management connection failed.

(i) will greatly reduce the time the GUI will have to wait for the management, but then we'll need to decide how long to wait for the service to start the process. Would 60 seconds be good enough?

comment:3 Changed 6 years ago by it@…

I completely agree with your suggestions. Maybe a default of 60 seconds, exposed as a configurable timeout in openvpn-gui, would be the best solution.

The timeout problem occurs with a USB token using cryptoapicert and subject match, but maybe that is irrelevant. It seems that the way the openvpn interactive service starts openvpn.exe causes the problem, because when openvpn.exe is run directly with the --config option on the same .ovpn config file, it starts immediately: there is no delay and there are no RPC calls.

comment:4 Changed 6 years ago by Selva Nair

@it@… Could you please test the exe included here https://github.com/selvanair/openvpn-gui/releases ?

I opted for a simpler approach: do not abort on timeout but keep retrying unless the process or service itself triggers a startup error. Or the user can abort by pressing disconnect.

Things to test: (i) situations that require a long timeout work; (ii) an early abort does not leave openvpn.exe running.
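The approach described in this comment (no overall timeout, stop only on a startup error or a user disconnect) can be sketched in Python as follows. This is an illustration of the control flow only, not the patched GUI's code. `process_alive` and `user_aborted` are hypothetical callables standing in for "the process or service reported a startup error" and "the user pressed disconnect".

```python
import socket
import time


def connect_until_aborted(host, port, process_alive, user_aborted, interval=0.5):
    """Retry the connection with no overall timeout.

    Returns a connected socket, or None if the child process has gone away
    or the user asked to disconnect before a connection could be made.
    """
    while process_alive() and not user_aborted():
        try:
            return socket.create_connection((host, port), timeout=interval)
        except OSError:
            # Not listening yet: pause, then re-check the abort conditions.
            time.sleep(interval)
    return None
```

When the function returns `None`, the caller is the natural place to make sure no openvpn.exe is left running, which is test item (ii) above.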

comment:5 Changed 6 years ago by it@…

Thank you for your efforts. Your changes seem to work. However, the disconnect button does not work while the management interface is in a wait state, for example when it is waiting for the token PIN. By the way, could you please check your code around the call to CreateProcessAsUser()? We found that this call returns slowly in a domain environment. Regards, Zoltan Danhauser

comment:6 Changed 6 years ago by Selva Nair

Disconnect button should work even when token insert request or PIN prompt is pending. Though in such cases the UX may not be ideal because of the way openvpn.exe treats a failure in pkcs11 callbacks. Essentially, when the user presses disconnect, the exit event triggers but openvpn will do a SIGHUP restart without checking the exit event. The restart will fail as exit event has triggered and the GUI will show a reconnection failed message instead of a clean user-initiated exit.

The end effect is to stop openvpn.exe and subsequent connection should work. If this is not happening please post more details and how to reproduce.

Regarding CreateProcessAsUser(), I suppose you are referring to the function RunOpenVPN() as a whole, not just that particular call. I can think of only two calls that could potentially cause a slowdown: one to LookupAccountSid() and one to LookupAccountName(). Both will use cached info if the DC is not reachable but may access the network otherwise.

One of these calls could be eliminated (only used for logging the username) but not the second one which looks up the SID of a group named "OpenVPN Administrators". Could you check whether creating a group with that name locally speeds up the startup?

comment:7 Changed 6 years ago by rf

Just to add another test report, the retry logic in https://github.com/selvanair/openvpn-gui/releases/download/mgmt-timeout/openvpn-gui-timeout.exe also solves an issue where openvpn.exe waited for the respective management interface commands from the GUI, as seen in the following logs. (Those are the last lines logged by OpenVPN GUI):

Tue Jun 19 10:19:51 2018 us=928730 MANAGEMENT: TCP Socket listening on [AF_INET]127.0.0.1:25340
Tue Jun 19 10:19:51 2018 us=928730 Need hold release from management interface, waiting...
Tue Jun 19 10:21:16 2018 us=957391 MANAGEMENT: Client connected from [AF_INET]127.0.0.1:25340 

With Selva's patched OpenVPN GUI there are some retries and the VPN connection eventually gets established. With the stock OpenVPN GUI as shipped with OpenVPN 2.4.6, the GUI hangs with the logs shown above and the VPN connection never gets established.

comment:8 Changed 6 years ago by Selva Nair

Could you please check whether adding a local group named "OpenVPN Administrators" speeds up the process? Looking for this group and the username are the only two actions by the service that could query the DC (if accessible) and slow things down a bit. Even then a 2 minute delay looks unreasonable.

If it's the username lookup that's slow, we can eliminate it as it's just cosmetic.

comment:9 in reply to:  8 Changed 6 years ago by rf

Hi Selva,

first, thank you very much for your timeout patch!

Could you please check whether adding a local group named "OpenVPN Administrators" speeds up the process?

Unfortunately creating the local group doesn't seem to speed up the process. I couldn't debug the issue any further (eg. checking if unreachable RPC targets would trigger the issue) since currently I no longer have access to the machine.

comment:10 Changed 4 years ago by YAP

I'm unsure if this is necro but my current issue seems very similar indeed.

I recently updated Windows 10 to v2004 and this is my log:

Thu Jul 02 12:31:08 2020 OpenVPN 2.4.9 x86_64-w64-mingw32 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [AEAD] built on Apr 16 2020
Thu Jul 02 12:31:08 2020 Windows version 6.2 (Windows 8 or greater) 64bit
Thu Jul 02 12:31:08 2020 library versions: OpenSSL 1.1.1f 31 Mar 2020, LZO 2.10
Enter Management Password:
Thu Jul 02 12:31:08 2020 MANAGEMENT: Socket bind failed on local address [AF_INET]127.0.0.1:25340
Thu Jul 02 12:31:08 2020 Exiting due to fatal error

I made an attempt using the .exe linked above with no different behaviour, it fails instantly, no delay/wait at all.
I double checked: nothing is listening at the port number openvpn attempts to use and command line works.

comment:11 in reply to:  10 Changed 4 years ago by Selva Nair

Replying to YAP:

Thu Jul 02 12:31:08 2020 MANAGEMENT: Socket bind failed on local address [AF_INET]127.0.0.1:25340

This error implies port 25340 is in use. Unfortunately, we don't have a way of specifying a different port from the GUI user interface. Check whether another copy of openvpn.exe and possibly openvpn-gui.exe are running (open Task Manager and check). This could happen if another user is logged in (as switched user). If so, you will have to stop/kill those running processes and try again.

If that's not the case, from an elevated command prompt run "netstat -tn | findstr 2534" to see what is using port 25340 and whether it could be moved to another port.
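As a complementary check, one can test directly whether the port can be bound at all. The Python sketch below is an illustration only and is not part of OpenVPN or the GUI; `port_is_free` is a hypothetical helper name. It reports only whether a bind succeeds, not which process holds the port.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # The OS refused the bind (for example, address already in use).
            return False
    return True


if __name__ == "__main__":
    # 25340 is the management port that appears in the log above.
    print("25340 can be bound:", port_is_free(25340))
```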

Note to self: we need to have an option to change the default port used by the GUI to talk to the core -- hopefully in the next version.

I'm unsure if this is necro but my current issue seems very similar indeed.

Totally unrelated. The issue discussed in this thread is caused by slow RPC connection to the domain controller from the interactive service and happens even before openvpn.exe is started.

comment:12 Changed 4 years ago by Gert Döring

So, going through open and "smelling somewhat stale" tickets today - are the improvements already included in the gui? Is there a gui PR I need to review? Can we close the ticket?

thanks :)

comment:13 Changed 4 years ago by Selva Nair

Two issues in this ticket: the first one caused by slow connection to DC has been improved in many ways and probably is no longer an issue. The second one about management port is unresolved.

The management port is still hard-coded (25340+config id) in the GUI with no option to change it. This also causes conflict when two users use the GUI in parallel (rare case, but happens).

Good timing. I was reminded of this just yesterday for some reason, and am toying between two options: (i) use port = 0 and let OpenVPN pick a free one but not sure how to get it back to the GUI -- via stdout through the service? (ii) Make the offset (25340) configurable in the GUI but that's not as transparent to the user nor very user-friendly.

comment:14 Changed 4 years ago by Gert Döring

How complicated would it be to turn around the connection sequence, aka, the GUI finds a free port, listens on it, and openvpn then connects to the GUI port, using --management-client?
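The reversed sequence proposed here can be sketched in Python: the GUI side binds to port 0 so the OS assigns a free port, and that port is then handed to the client. This is a sketch under assumptions, not the GUI's code. `open_listener` and `management_args` are hypothetical helper names, and the argument list assumes the usual `--management IP port` form combined with `--management-client`.

```python
import socket


def open_listener(host="127.0.0.1"):
    """Listen on an OS-assigned free port and return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: let the OS choose a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def management_args(host, port):
    """Build the (assumed) options that tell openvpn where to connect."""
    return ["--management", host, str(port), "--management-client"]


if __name__ == "__main__":
    srv, port = open_listener()
    print("listening on port", port)
    print("extra options:", " ".join(management_args("127.0.0.1", port)))
    srv.close()
```

Because the listener exists before the client starts, there is no window in which the port is unknown or taken by someone else, at the cost of the restructuring discussed in the next comment.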

Maybe this is a bit too much change for 2.5 at this point, but afterwards?

comment:15 Changed 22 months ago by Selva Nair

We currently start openvpn.exe from the main thread but handle all socket events in a connection-specific thread. Listening before starting OpenVPN may require some non-trivial restructuring.

What about passing management port = 0 to the core and getting the actual port back via the service pipe?

comment:16 Changed 22 months ago by Selva Nair

Resolution: fixed
Status: assigned → closed

Resolved by PR509 which uses a simpler approach of choosing a free ephemeral port if the preset port is busy. This approach works well on Windows.
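The general idea of falling back to a free ephemeral port when the preset one is busy can be sketched in Python as below. This is not taken from PR509, whose code is not shown in this ticket; `bind_preset_or_ephemeral` is a hypothetical name, and binding to port 0 is used here as one common way to obtain a free port from the OS.

```python
import socket


def bind_preset_or_ephemeral(preset_port, host="127.0.0.1"):
    """Bind to preset_port if possible, else to an OS-assigned free port.

    Returns (bound_socket, actual_port).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, preset_port))
    except OSError:
        # Preset port unavailable: start over and let the OS pick one.
        s.close()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))
    return s, s.getsockname()[1]
```

The caller has to use the returned port rather than the preset one when telling the other side where to connect.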
