Opened 11 years ago

Closed 10 years ago

#238 closed Bug / Defect (wontfix)

Client reports after 60 seconds "TLS Error: TLS handshake failed" - while being successfully connected.

Reported by: DanielZuck Owned by:
Priority: trivial Milestone:
Component: Generic / unclassified Version: OpenVPN 2.3-beta / 2.3-RC (Community Ed)
Severity: Not set (select this one, unless your'e a OpenVPN developer) Keywords: Session Error TLS
Cc:

Description

In a nutshell, the connected client reports after 60 seconds of being connected (and *while payload traffic is flowing either way thru the tunnel*) a TLS error: "Sun Nov 11 14:01:47 2012 us=875753 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)". There is no error reported on the server side.

The version on either side is 2.3_rc1, the client on Linux, the server is FreeBSD.

A functional workaround is to comment out "nobind" at the client, which causes two trade offs:

  • The client reports to the calling init script a startup error; however it is starting (so this is a cosmetic issue).
  • It looks like client fails to connect, but then the server is calling back due to this attempt of the client; see the log from the client:

Sun Nov 11 14:02:19 2012 us=108019 Socket Buffers: R=[229376->131072] S=[229376->131072]
Sun Nov 11 14:02:19 2012 us=108077 TCP/UDP: Socket bind failed on local address [undef]: Address already in use
Sun Nov 11 14:02:19 2012 us=108145 Exiting due to fatal error
Sun Nov 11 14:02:19 2012 us=113721 TLS: Initial packet from [AF_INET]46.165.212.205:1414, sid=8919891a b5f043ca

Frankly related to this TLS error, there is a FAQ pointing to network issues. However, I want to exclude this for at least three reasons:

  • There is another box running over this setup stable since years, however with older versions of OpenVPN. And yes: To avoid potential influence and crosstalk, it had been turned off, to be sure.
  • As mentioned: (Payload)Traffic is flowing either way thru the established tunnel. Apart from this, the above timeout is reported.
  • Both sides are running mostly idle while reproducing this issue (however it does not change with load).

OpenVPN on the server side is:
OpenVPN 2.3_rc1 amd64-portbld-freebsd9.1 [SSL (OpenSSL)] [LZO] [eurephia] [MH] [IPv6] built on Nov 11 2012
Originally developed by James Yonan
Copyright (C) 2002-2010 OpenVPN Technologies, Inc. <sales@…>
Compile time defines: enable_crypto=yes enable_debug=yes enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_eurephia=yes enable_fast_install=needless enable_fragment=yes enable_http_proxy=yes enable_iproute2=no enable_libtool_lock=yes enable_lzo=yes enable_lzo_stub=no enable_management=yes enable_multi=yes enable_multihome=yes enable_pam_dlopen=no enable_password_save=yes enable_pedantic=no enable_pf=yes enable_pkcs11=no enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_small=no enable_socks=yes enable_ssl=yes enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=no enable_win32_dll=yes enable_x509_alt_username=no with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_plugindir='$(libdir)/openvpn/plugins'

The client side is:
OpenVPN 2.3_rc1 x86_64-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Oct 31 2012
Originally developed by James Yonan
Copyright (C) 2002-2010 OpenVPN Technologies, Inc. <sales@…>
Compile time defines: enable_crypto=yes enable_debug=yes enable_def_auth=yes enable_dependency_tracking=no enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_eurephia=yes enable_fast_install=yes enable_fragment=yes enable_http_proxy=yes enable_iproute2=yes enable_libtool_lock=yes enable_lzo=yes enable_lzo_stub=no enable_management=yes enable_multi=yes enable_multihome=yes enable_pam_dlopen=no enable_password_save=yes enable_pedantic=no enable_pf=yes enable_pkcs11=yes enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=yes enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_small=no enable_socks=yes enable_ssl=yes enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=yes enable_win32_dll=yes enable_x509_alt_username=no with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_plugindir='$(libdir)/openvpn/plugins'

Logfiles with a high debug level will be attached. If you need resources to reproduce or to validate a potential fix, please contact me.

Initially, I was attempting this with 2.3_beta1 on the server side; however changing this upwards to 2.3_rc1 did not change the behavior.

Thanks.

Attachments (1)

LogfilesTicket238.zip (57.3 KB) - added by DanielZuck 11 years ago.
Logfiles from either side at debug 6; with either setup (nobind active or not)

Download all attachments as: .zip

Change History (6)

Changed 11 years ago by DanielZuck

Attachment: LogfilesTicket238.zip added

Logfiles from either side at debug 6; with either setup (nobind active or not)

comment:1 Changed 11 years ago by DanielZuck

Forgot to mention: Frankly the impact is servere - after the TLS error, the connection is dropped and reestablished. But as the server does not see any problem, it still has the old context active, so the "new" connection is not initialized properly. However, after another minute it rolls again, and then the connection is properly installed.

So it is one minute connected with payload, then one minute a not properly initialized session, then one minute connected with payload and so forth.

The and related to the network setup "old setup" means another client and another server. So this two 2.3_rc1 endpoints do not have any link with the other running setup.

comment:2 Changed 11 years ago by Samuli Seppänen

Can you attach your server/client config files to this report? We'd need to be able to reproduce this to fix it.

comment:3 Changed 11 years ago by JoshC

The cause is very simple: you're starting the client instance twice. The reason this "works" for you when you disable the --nobind option is because the 2nd instance fatally exits as the same port the 2nd instance tries to use is already in use (search for the phrase "Address already in use" in the client log without nobind.)

When using nobind, this 2nd instance starts up on the client, and they compete for the single slot allowed on the sever for a single unique X509 certificate. This is the bit in the server log (when both concurrent client-side instances have successfully started) where it reports "new connection by client". Note also at this point that the source port on the 2nd instance has been incremented by one (as the client's dynamic port allocation sources from a new port for the next outbound socket requested.)

The solution is of course not to start multiple instances of your client on a single system. If you actually intended to use the same certificate from multiple sources, the server needs the --duplicate-cn option (although generally you'll be doing this from separate client systems, not the same one.)

Looks like the initscript on your client is malfunctioning and needs correction. To be clear, this is not a bug in openvpn, but a bug in your distribution's initscript.

comment:4 Changed 11 years ago by JoshC

Priority: majortrivial

comment:5 Changed 10 years ago by Samuli Seppänen

Resolution: wontfix
Status: newclosed
Note: See TracTickets for help on using tickets.