Opened 4 years ago

Closed 4 years ago

#787 closed Bug / Defect (notabug)

Connection breaking after a few seconds leaving no useful debugging traces

Reported by: doertedev Owned by:
Priority: major Milestone: release 2.3.13
Component: Networking Version: OpenVPN 2.3.13 (Community Ed)
Severity: Not set (select this one, unless you're an OpenVPN developer) Keywords: connection, breaking, gentoo, client


Oy. This is the first time in my entire life that I can't seem to make something work that works reliably on a) Macs, b) Ubuntu machines and c) my Android phone (as if that weren't WTF enough):

Okay so:

I successfully connect to a server (in AWS) and have proper DNS resolution (all the servers I'm failing to talk to resolve properly), but both HTTPS and SSH fail after a few seconds. What does that look like? I SSH to a machine via ssh -i $keyfile 10.x.x.x (VPN IP), fire the first command after the greeting message, and the connection hangs.
Same for HTTPS: I open the URL in the browser, get an htpasswd auth prompt, type in the credentials, and the connection hangs again until the timeout gives up and throws an error at me.
Weirdly enough, a) my colleagues report everything working properly and b) everything works on my Android phone, which uses the same settings, the same config et al... I'm pretty clueless here.

openvpn --cd /etc/openvpn/client --config AWS.conf --daemon openvpn-aws --syslog openvpn --log /tmp/openvpn.log --verb 5 --client --dev tun0 --up /etc/openvpn/ --down /etc/openvpn/ --script-security 2

Version info

[I] net-misc/openvpn
     Available versions:     2.3.12 (~)2.3.13 [m](~)2.4_rc1-r1 [m]**9999 {down-root examples inotify iproute2 libressl lz4 +lzo mbedtls pam pkcs11 +plugins polarssl selinux socks +ssl static systemd test USERLAND="BSD"}
     Installed versions:     2.3.13(23:03:07 09.12.2016)(iproute2 lzo pam pkcs11 plugins ssl systemd -down-root -examples -libressl -polarssl -selinux -socks -static USERLAND="-BSD")
     Description:            Robust and highly flexible tunneling application compatible with many OSes

[I] dev-libs/openssl
     Available versions:
     (0.9.8) 0.9.8z_p8^d
     (0)    1.0.1g[1] (~)1.0.1g-r1[1] 1.0.1h-r2[1] 1.0.1i[1] 1.0.1j[1] 1.0.1k[1] 1.0.1l-r1^d[1] 1.0.1m^d[1] 1.0.1o^d[1] 1.0.1p^d[1] (~)1.0.2-r3^d[1] (~)1.0.2a^d[1] (~)1.0.2c^d[1] 1.0.2d^d[1] (~)1.0.2d-r2^d[1] 1.0.2e^d[1] 1.0.2f^d[1] 1.0.2g-r2^d[1] 1.0.2h^d[1] 1.0.2h-r2^d[1] 1.0.2i^d[1] 1.0.2j^d 1.0.2j^d[1] [M](~)1.1.0c(0/1.1)^d
       {+asm bindist gmp kerberos rfc3779 sctp sse2 sslv2 +sslv3 static-libs test (+)tls-heartbeat vanilla zlib ABI_MIPS="n32 n64 o32" ABI_PPC="32 64" ABI_S390="32 64" ABI_X86="32 64 x32" CPU_FLAGS_X86="sse2"}
     Installed versions:     1.0.2j^d[1](19:07:17 07.12.2016)(asm rfc3779 sctp tls-heartbeat zlib -bindist -gmp -kerberos -sslv2 -sslv3 -static-libs -test -vanilla ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32" CPU_FLAGS_X86="sse2")
     Description:            full-strength general purpose cryptography library (including SSL and TLS)

Server config

port 1194
proto udp
dev tun
keepalive 10 120
push "dhcp-option DNS"
push "dhcp-option DOMAIN"
push "dhcp-option DNSMODE full"
push "route"
push "route"
push "route"
push "route"
keepalive 10 30
ca   /etc/openvpn/pki/cabundle.pem
key  /etc/openvpn/pki/server.key.pem
cert /etc/openvpn/pki/server.pem
dh   /etc/openvpn/pki/dh.pem
client-config-dir /etc/openvpn/clients
ifconfig-pool-persist /etc/openvpn/ipp.txt
user nobody
group nogroup
status /etc/openvpn/openvpn-status.log
log-append  /var/log/openvpn.log
verb 1
mute 10
script-security 1

Client config

dev tun
proto udp
remote x.x.x.x 1194 udp
resolv-retry infinite
remote-cert-tls server
ca /etc/openvpn/cabundle.pem
cert /etc/openvpn/aws.pem
key /etc/openvpn/aws-key.pem
verb 5
log /tmp/openvpn.log

Attachments (1)

openvpn.log (23.4 KB) - added by doertedev 4 years ago.


Change History (8)

Changed 4 years ago by doertedev

Attachment: openvpn.log added


comment:1 Changed 4 years ago by doertedev

Oh, and since the bug tracker doesn't let me edit stuff... Sorry for the German. Yes, as you can guess, "Installierte Versionen" means "installed versions".

comment:2 Changed 4 years ago by David Sommerseth

In the log file, I see this line:

Fri Dec  9 23:22:47 2016 us=464128 event_wait : Interrupted system call (code=4)

So what seems to happen here is that some kind of interrupt occurs while OpenVPN is in the event_wait() call in io_wait_dowork(). Most likely it is a signal. But that smells like a signal we don't normally catch, as I'd expect different behaviour in those cases. Still, it is interesting that this is a controlled shutdown. The signals that cause one are normally SIGINT and SIGTERM. The other signals we catch normally dump some statistics, trigger a reconnect, or do similar "keep the process running" operations.

I think you need to increase the log level to 9. Perhaps also try running OpenVPN under gdb with a conditional breakpoint that stops when status < 0 after event_wait() returns in forward.c:1611, so you can look at the backtrace and see exactly what the process is doing when this happens.
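A minimal sketch of the gdb run described above, wrapped in a shell function. It assumes the openvpn binary has debug symbols and that line 1611 of forward.c (the number quoted in this comment, valid only for the 2.3.13 sources) is the point right after event_wait() returns. The function name openvpn_under_gdb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run OpenVPN in the foreground under gdb, stop once
# event_wait() has returned a negative status, then print a backtrace.
# Assumes debug symbols and that forward.c:1611 is the line right after
# event_wait() in io_wait_dowork() in the 2.3.13 sources.
# --verb 9 is placed after "$@" so that, with options processed in order,
# it should override a "verb" line in the config file.
# Do not pass --daemon: gdb follows the parent process, which exits after forking.
openvpn_under_gdb() {
  gdb \
    -ex 'break forward.c:1611 if status < 0' \
    -ex 'run' \
    -ex 'bt' \
    --args openvpn "$@" --verb 9
}
```

It would be called as `openvpn_under_gdb --cd /etc/openvpn/client --config AWS.conf`, i.e. the ticket's command line minus --daemon. If gdb reports that status is optimized out, an unoptimized build is needed for the condition to work.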

Last edited 4 years ago by David Sommerseth

comment:3 Changed 4 years ago by doertedev

Oh, I'm sorry I didn't mention this: the "Interrupted system call" was me stopping OpenVPN. I wanted to include the shutdown messages as well. :-/ If there were a message indicating that it cannot keep the established connection, I would at least have had something to google for or grep this wiki for... :(

Here's the log at level 9. And again, the interruption (Sat Dec 10 00:44:24 2016 us=913414) was me stopping OpenVPN.

Last edited 4 years ago by doertedev

comment:4 Changed 4 years ago by Selva Nair

Is your ssh or http connection through the tunnel going to the same AWS instance as the VPN server? If it's going to a VPC peer, AWS routers may drop packets whose source address is not that of the instance (unless you are NATing on the server). That said, something like that should affect all clients, not just a Mac or Ubuntu. Also check the routing tables on the server and the client, as well as any chance of an IP address clash. Based on the logs, the tunnel appears to be up and exchanging internal ping packets, so this has to be a routing issue.
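One way to rule out an IP address clash is to compare the client's local networks with the routes the server pushes. Below is a minimal POSIX sh sketch, assuming dotted-quad IPv4 CIDR notation without leading zeros in the octets; the function names and example networks are hypothetical, since the real pushed routes are redacted in the server config.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The subshell body keeps the IFS change local to this function.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Exit status 0 if two IPv4 CIDR blocks overlap, i.e. if their network
# addresses are equal once both are masked with the shorter prefix.
cidr_overlap() (
  ip1=$(ip_to_int "${1%/*}")
  ip2=$(ip_to_int "${2%/*}")
  len1=${1#*/}
  len2=${2#*/}
  len=$(( len1 < len2 ? len1 : len2 ))
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
)
```

For example, `if cidr_overlap 192.168.178.0/24 10.0.0.0/16; then echo clash; fi` compares a home LAN with a hypothetical VPC range; the real inputs would come from `ip route` on the client and the pushed `route` lines on the server.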

comment:5 Changed 4 years ago by Gert Döring

This sounds like an MTU issue, to be honest. TCP packets that are too big cause OpenVPN's UDP packets to be fragmented, and those are not delivered, so "everything hangs when the first command is executed".

A strong indication of that would be:

  • ssh works
  • typing "date" in ssh (small response) works, every time
  • typing "ls -l /dev" breaks (response bigger than 1500 bytes, at least on most systems)

Try adding mssfix 1200 (the config-file form of --mssfix 1200) to your OpenVPN configs and see if that improves anything.
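The "small response works, large response hangs" test can be made more systematic by measuring the path MTU directly. This is a minimal sketch, assuming Linux iputils ping (whose -M do option sets the Don't Fragment bit), an IPv4 path without IP options, and that ICMP echo is allowed to the target; the function names are hypothetical. It binary-searches for the largest packet that arrives unfragmented, starting from a size assumed to work.

```shell
#!/bin/sh
# ICMP echo payload that produces an IPv4 packet of exactly $1 bytes
# (20-byte IP header + 8-byte ICMP header, no IP options).
ping_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Plain TCP-over-IPv4 MSS for an MTU of $1 bytes (20 IP + 20 TCP, no options).
mss_for_mtu() {
  echo $(( $1 - 40 ))
}

# Largest packet size in [$2, $3] that reaches host $1 with DF set.
# $2 must be a size known to work; ping failures count as "too big".
probe_path_mtu() (
  host=$1 lo=$2 hi=$3
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if ping -c 1 -W 1 -M do -s "$(ping_payload_for_mtu "$mid")" "$host" >/dev/null 2>&1; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo "$lo"
)
```

Run against the server's public address, e.g. `probe_path_mtu x.x.x.x 1200 1500`, it reports the largest packet the underlying path carries. OpenVPN's own encapsulation overhead still has to fit inside that, which is why a conservative value such as mssfix 1200 leaves headroom.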

comment:6 Changed 4 years ago by doertedev

Thanks everyone involved.

Turns out that when I copy the Fritzbox's default MTU, it works just fine, across two cable providers. What the hell.

You've been of enormous help. Also, big thanks for the rapid responses. Love OpenVPN!

comment:7 Changed 4 years ago by David Sommerseth

Resolution: notabug
Status: new → closed