OpenVPN Community Meetup 2024
Dates
Set to 20-22 September 2024
Venue
Game Studio at Steamworks, Roonstr. 23a, Karlsruhe, Germany.
Hotels close-by: Hotel Santo https://www.hotel-santo.de/
Who is coming?
Name | Topics | Arrival | Departure | Hotel | T-shirt size |
Lev Stipakov | dco-win P2MP in 2.7 | Thursday evening | Sunday afternoon-ish | Santo | M |
Gert Döring | routing, VRFs, policy on Linux. Removal of wintun support. 2.7 time plan | Thursday 18:53 (IC 2064) | Sunday 11:06 (IC 2067) | Santo | XL |
Arne Schwabe | | Thursday evening | Monday morning | Santo | XXL |
Johan Draaisma | | Thursday 19 September | Monday 23 September | Santo | XL |
Frank Lichtenheld | 2.7 release process, general merge process | N/A (Home town) | N/A | N/A | XL |
Heiko Hund | | n/a | n/a | n/a | XL |
Max Fillinger | TLS 1.3 with mbed TLS (again) | Thursday ~ 19:00 | Monday | Santo | XL |
Steffan Karger | Crypto stuff | Thu ~ 19:00 | Sun ~ 10:00 | Santo | M |
Rein van Baaren | | tbd (probably Thursday) | tbd (probably Monday) | | XL |
Gianmarco De Gregori | Implement netlink introspection for DCO APIs | Thursday ~ 20:00 | Sunday around noon | Santo | M |
Yuriy Darnobyt | | Thursday 19 September | Monday 23 September | Santo | L |
Antonio Quartulli | 2.7 release process (can we somehow formalize it?) | N/A | N/A | | M |
André (Pippin) | Shirt only, maybe sticker for laptop | n/a | n/a | | M |
Jan Just Keijzer | can't make it | | | | XL |
Samuli Seppänen | Buildbot, CI/CD, test frameworks | Thursday evening | Sunday afternoon-ish | Santo | L |
Richard T Bonhomme | TShirt only, with thanks | unable to travel | | | XL |
Reynir Björnson | can't make it | | | | |
Meeting topics (so far)
- OpenVPN 2.7
What are the major new features going in here?
Roughly by when do we want to have 2.7 ready (before the next Debian stable release)?
- OpenVPN 2.7 release process
- General merge process
How does the process work, what changes were made (e.g. Gerrit, more tests in buildbot)
Where do we have problems/shortcomings, what do we want to improve further?
- gert: "if possible, please include instructions how to test a change (or how to verify that 'nothing changes') in the commit message"
- approach to large(r) scale architectural changes
- "just hack on it, and then send a 20-odd patch set"?
- "agree on the general direction first"?
- lzo compression
How will we deal with this, when will we remove it?
- krzee is asking for failover improvements
In openvpn3 some improvements have been made to allow better automatic failover when a server goes down.
The question is whether we can port some of those improvements to OpenVPN 2.
Related github issues: https://github.com/OpenVPN/openvpn/issues/281 and https://github.com/OpenVPN/openvpn/issues/282
- patch/code flow in ovpn-dco-win (https://github.com/OpenVPN/ovpn-dco-win/pull/79)
- (which branches, what are the rules)
- patch/code flow in openvpn-build
- which branches, what are the rules?
- have we written this up somewhere?
- dco-win multipeer
- target for 2.7?
- scope (no TCP, no iroutes, no wireguard format support?)
- help with testing
- check the status of "technology preview" builds, ensure we build those from master + dco v2 (with multipeer)
- removal of wintun support
- DNS and MacOS and Tunnelblick
- if a laptop moves to a different wifi/LAN segment while Tunnelblick is active *and* has --dns set up, Tunnelblick (or OpenVPN via a --down script?) will restore "the old network DNS settings" on OpenVPN exit. Do we have a macOS expert who can suggest how to do this better?
- Platforms and Tunnelcrack
- Windows is covered
- but WFP rules still permit inbound connections via the LAN - intentional?
- Linux
- ip rules, ip tables, VRFs, whatnot...?
- macOS
- can it be done? if yes, how?
- other platforms are not typical "roaming laptop" platforms, so maybe not really in scope (if doable at all)
- DDoS resistance (bloom filters)
- where do we want to go? is this still needed?
- AES-GCM improvements (Arne, Steffan)
- and DATA_V3 patch sets for Userspace + dco-win
- "driver" restructuring of tun.c & friends
- --cipher, --data-ciphers, warnings, and #NM?
- https://gerrit.openvpn.net/c/openvpn/+/746
- what shall we do with a lone --cipher in the config in 2.7?
- ignore (silent, D_LOW)
- warn, and re-word the warning so people on #nm actually understand
- bring back the "just append to --data-ciphers" feature (maybe not if BF-CBC)? ... which has consequences for DCO, of course.
- there seems to be a large userbase using AES-*-CBC and having non-compliant servers
- have --data-ciphers +AES-128-CBC to append to the list, and reword the warning so people do this?
- the reason we have this: servers that are fixed to AES-128-CBC, either by admins or because they are not real OpenVPN but SoftEther(?) or similar, so NCP to AES-*-GCM does not work
- multisocket patch set
- mesh VPN
- custom control channel message "the DPC stuff"
- PUSH_UPDATE
- locked-username-replacement (pushing auth-token-user + auth-token) (Arne/Gert?)
Meeting Notes
Original cryptpad: https://cryptpad.fr/pad/#/2/pad/edit/KPJKt7RSPrvC9grrvQ0KhbwE/
These notes have been edited to be more accessible.
OpenVPN 2.7
Feature Set
For more detailed discussions on most of these items see below.
- Already done
- Deprecation/removal stuff
- Static key mode going into deprecation
- Tunnelcrack improvements for Windows
- Must have (would block release):
- Deprecation/removal stuff: Wintun driver support (Lev?), compression support (Frank)
- Multi-socket support (Antonio, Gianmarco)
- --dns implementation (Heiko, Gert)
- New API for DCO kernel module on Linux (Antonio, Arne?)
- Should have:
- Data v3 format with AES rekeying (Arne, Steffan)
- Tunnelcrack improvements for Linux (Heiko, Gianmarco)
- Cipher/data-ciphers add DEFAULT syntax (Arne)
- Nice to have:
- Bloom-filter DDoS protection (Max will take a look)
- Live route updates / push-update (Mister Antonio)
- dco-win multipeer (Lev)
- afunix/lwipovpn (Gert, Frank, Arne)
- Things not put into one of the above categories yet:
- app custom control
- Testing improvements that make sense to do before this release
- add server multi-socket testing support to t_server_null.sh (Samuli)
- improve Windows t_client testing (Samuli)
TIMELINE and BETA process
- target is to release by the end of January 2025 at the latest, to get into the new Debian stable
- "the beta release happens when --dns and multisocket are in" - provide Windows installer etc.
- branching "when it makes sense", not too early - specifics to be discussed (for 2.6, early branching was not very useful: too many patches had to go to both master and 2.6)
Future ideas
- multi-channel support
- mesh support
Conclusions from discussions
Discussions about specific topics.
Compression
Let's disable compression for good going forward. The motivation is that compression is insecure. The idea is that there will then no longer be any way to override this and enable compression.
We can stop sending compressed packets, but we have to keep accepting compressed packets, because OpenVPN peers out there may keep using compression for a long time. We want to keep things working without in any way encouraging compression in new installations.
Also turn on --compress migrate automatically for --server and --comp-lzo.
Static key mode
This is a way to run an OpenVPN connection without TLS, using static keys. We are deprecating it as of OpenVPN 2.7 and print a warning when people try to use it. There is an override flag to keep it working in 2.7, but it will definitely be removed in OpenVPN 2.8.
Bloom-filter DDoS protection
People generally agree we want this, but there seems to be no push for it. Arne cautions that it is one of those things where, by the time you really need it, it is too late. There is a patchset for this. Max is willing to review.
Background: we want this to keep OpenVPN replying to legitimate client hellos when "someone ugly" uses OpenVPN for reflective DDoS purposes, or just sends enough TLS hellos to exceed the 100 packets/minute limit. The intent is to "block that /24" or "/16" or "wherever all the crap is coming from" by hashing incoming IPs to buckets and blocking at an effective-but-not-overblocking level. Drawback: +4 MB memory usage if turned on.
Testing is not trivial...
The real question here is "is what we have good enough" or "can/should we do better"?
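The following is a minimal sketch of the "hash incoming IPs to buckets" idea. It assumes a count-min sketch (a counting relative of the Bloom filter) keyed on the source /24 for IPv4 or /48 for IPv6, with a fixed per-window limit. The class name, prefix lengths, table size and limit are all hypothetical. The notes do not say how the pending patchset is structured.

```python
import hashlib
import ipaddress


class PrefixRateLimiter:
    """Hypothetical count-min sketch that rate-limits hellos per source prefix."""

    def __init__(self, width=4096, depth=4, limit=100, v4_prefix=24, v6_prefix=48):
        self.width = width          # counters per row
        self.depth = depth          # independent hash rows
        self.limit = limit          # hellos allowed per prefix per window
        self.v4_prefix = v4_prefix
        self.v6_prefix = v6_prefix
        self.rows = [[0] * width for _ in range(depth)]

    def _prefix_key(self, addr):
        """Collapse an address to its /24 (IPv4) or /48 (IPv6) network."""
        ip = ipaddress.ip_address(addr)
        plen = self.v4_prefix if ip.version == 4 else self.v6_prefix
        net = ipaddress.ip_network(f"{ip}/{plen}", strict=False)
        return net.network_address.packed + bytes([plen])

    def _buckets(self, key):
        """One (row, column) pair per row, each from a differently salted hash."""
        for row in range(self.depth):
            digest = hashlib.sha256(bytes([row]) + key).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def allow(self, addr):
        """Count one hello from addr; return False once its prefix exceeds the limit."""
        buckets = list(self._buckets(self._prefix_key(addr)))
        for row, col in buckets:
            self.rows[row][col] += 1
        # Collisions only ever inflate counters, so the minimum is the best estimate.
        estimate = min(self.rows[row][col] for row, col in buckets)
        return estimate <= self.limit

    def reset_window(self):
        """Forget all counts; call once per time window."""
        self.rows = [[0] * self.width for _ in range(self.depth)]
```

The estimate is a minimum over several independently hashed counters. Collisions can therefore only make a prefix look busier than it is, never less busy. Width and depth trade memory against that over-blocking.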
Wintun support
Short background: friction over licensing problems, shipping problems (wintun needs a binary .dll provided by Jason to upgrade to the "latest version"), and personal attacks.
DCO-WIN is only supported on "Win10 versions after 2020 H1" and Win11 (and recent Server versions), so "someone might need wintun support for fast OpenVPN on those old platforms"
The conclusion, basically: while DCO has been the default driver since 2.6, TAP must also be kept for older cipher situations (because DCO only supports AEAD ciphers and tun mode), for older operating systems, and for use cases like bridging. But wintun is a complication, legally and technically.
Windows might benefit from having "more ciphers", like AES-128-CBC in DCO-Win - and then we could go for "only install dco-win by default, tap-windows6 only on request".
Arne "we have decided that we want to deprecate it, just how so?" - Frank "we do not fully drop support" - Gert "make --driver wintun an alias for tap-windows" (or "ignore it with warning").
Needs to be very explicit on the release note pages, with explanations of "what are the best options for you now?" -> stick to 2.6+wintun (support will continue for 1+ year after the 2.7 release), or use 2.7+dco/tap6. So basically people using wintun and upgrading to 2.7 will be automatically switched to TAP, as this is pretty much guaranteed to just work. If they want a faster implementation, they can choose to switch to DCO, but then of course they have to check that their configs are compatible with it.
Side note: WireGuard for Windows no longer uses wintun but WireGuardNT.
Multi-channel support
Basic idea is that there is one central server that handles the routing and instructing the clients what to do, and serves as a pretty much guaranteed way to reach resources. There would be one VPN adapter but the data channels can go to different peers.
But there may be more optimal paths to reach resources, directly through a specific peer. The central server could instruct the VPN client to send that specific traffic through that peer. But if for example the client doesn't support this it can still go through the central server.
Live route updates / push-update
This is a new control channel message that allows the server to instruct capable clients to install new client-side routes without having to drop and re-establish the whole VPN connection.
OpenVPN Inc. wants this in OpenVPN3 / OpenVPN Connect v3 and PGMT, and is working on implementing it. Client-side implementations in OpenVPN3 are done for Windows and Linux. PGMT/Cloudconnexa will implement it soon.
Main motivation: ZTNA/device posture is something that 'the market' wants. A device may change posture, which means getting more or less access, and having to reconnect the user for that is costly. Also, if a busy server implements a new route and wants clients to use it, it basically disconnects everyone at once and then DoSes itself for a while as hundreds of clients reconnect at the same time.
Antonio says that Mario has time to implement live route updates in OpenVPN2. The only small blocker so far was that the OpenVPN RFC PR had not yet been reviewed by Gert, and it is not great to start implementing something that is not fully agreed on. It looks like we can get Gert's eyes on it and then move on.
Multi-socket support
This is basically done but needs review, and Antonio will do a first round. He intends to review it at the beginning of October. If that doesn't work out, it's still better to just throw it at Gerrit (d12fk will have a look), no matter what state it is in then. Better that it gets reviewed than stays stuck.
Merge process
- we want windows testing in buildbot/gerrit, so the actual merge can be done with "I already know that it does not break Windows" confirmation from Gerrit
- we want to abandon patchwork -> this needs an easy way to move a patch from "the mailing list" to gerrit (Frank offered to write a script to be run from his "mutt" mail client)
- can we have good commit messages - "how to test the effect of this patch?" - especially for bug fixes, "this is how to reproduce the bug, this is what changes"
- code review improvements - "Arne and Frank do all the review"
- can we have more reviewers, please? "If you can write code, you can also help with review"
- gap between "initial review" and "Gert dislikes it" can be very high, which is bad user experience
- "what do you want me to do?" - "just press the merge button" or "add review & merge testing?"
- how to scale up Gert? ;-)
- enhance testing so "pre-merge tests" can be better distributed?
- what are we going to do with patches that have NO reviews? E.g.
- bloom filters
- HAProxy protocol
- what about features that are technically okay, but create a maintenance burden on the project (like, test infrastructure extensions)? When "do we want" something? When does it "add value to the project"?
- Steffan: Gert's functions are "coordinating" and "benevolent dictator"
- the coordination role might be distributed better ("for this meeting, we should look at *those* patches?", or "xxx, can you have a look at $patch" etc)
- "let's do the things we have been doing, but make them better" ;-)
Testing improvements
- t_server tests run from gerrit?
- --dns
- use hostnames for t_client ping
- have the reference servers push DNS servers that "know something not on public DNS"
- t_server_null
- add lwip client tests
- add actual ping / http / ... tests
- needs lwip support merged
- add server multi-socket testing support (Samuli)
- enhance to run "reference binaries" vs. "the binary just built" (Samuli)
- will not give us "cross platform interop" testing (FreeBSD <-> Linux), but this is very rarely a problem
- is compatible with DCO (one side uses lwip userspace IP, the other side uses normal OpenVPN forwarding - either "normal tun" or "dco")
- t_client enhancements
- upgrade server to DATA_V3 (Ecrist's "phillip" server)
- add client test with --data-ciphers CHACHA20-POLY1305 (so this code gets exercised, especially with v3)
- automated windows testing from buildbot (gerrit)
- improve Windows t_client testing
- Validate that DNS and Windows Firewall settings are processed correctly
- Lev's current test suite uses OpenVPN GUI to emulate a real user
--cipher handling
In 2.6, --cipher foo triggers a warning that people misunderstand. They then configure --data-ciphers foo, breaking AEAD and DCO compatibility.
- Can we downgrade it to D_LOW so it's causing less confusion? (https://gerrit.openvpn.net/c/openvpn/+/746)
- Shall we reword it instead?
- Shall we auto-append --cipher to --data-ciphers $list?
- Shall we add a syntax --data-ciphers default:$foo and include that config statement in the warning message? Or --data-ciphers +$foo?
- There is another warning message - "options error: .. add the server cipher to --data-ciphers, currently ..." (not touched by the patch today).
- Please do not put a hardcoded list of ciphers into config files (future compatibility with to-be-deprecated and nice-and-shiny-to-be ciphers)
SUMMARY: "the new syntax is not defined here, but should be similar to OpenSSL's cipher list syntax. the warning is kept, and changed to point to the new syntax". Arne can do it, but a volunteer would be welcome.
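The notes leave the new syntax open. To make the "defaults plus one extra cipher" idea concrete, here is a sketch of a parser for a hypothetical syntax loosely modelled on OpenSSL cipher lists:

- `DEFAULT` expands to the built-in list.
- `-NAME` removes a cipher.
- A leading `+NAME` means `DEFAULT:NAME`.

The default list is assumed to be the 2.6 one. None of this is agreed OpenVPN behaviour.

```python
# Assumption: the OpenVPN 2.6 default --data-ciphers list.
DEFAULT_DATA_CIPHERS = ["AES-256-GCM", "AES-128-GCM", "CHACHA20-POLY1305"]


def parse_data_ciphers(spec):
    """Expand a hypothetical --data-ciphers spec into an ordered cipher list.

    DEFAULT  insert the built-in defaults here
    -NAME    remove NAME from the list built so far
    NAME     append NAME (duplicates ignored, names upper-cased)
    +NAME    as the first token: shorthand for DEFAULT:NAME
    """
    if spec.startswith("+"):
        spec = "DEFAULT:" + spec  # "+X" means "the defaults, plus X"
    result = []
    for tok in filter(None, spec.split(":")):
        tok = tok.upper()
        if tok == "DEFAULT":
            names = DEFAULT_DATA_CIPHERS
        elif tok.startswith("-"):
            result = [c for c in result if c != tok[1:]]
            continue
        else:
            names = [tok.lstrip("+")]
        for name in names:
            if name not in result:
                result.append(name)
    return result
```

With such a syntax, a config line like `--data-ciphers +AES-128-CBC` would keep working as new defaults are added or old ones removed. That is the point of not hardcoding cipher lists in configs.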
DATA_V3 discussion
Problem: with AES-GCM, the key needs to be regenerated after about 2^28 (full-size) packets (today we regenerate at about 2^32). What we do currently is not "critical" yet, but not good enough. Looking at TLS 1.3, it also switches keys more often.
2^28 packets will cause rekeying "quite often" on fast links, so we do not really want to do a full TLS renegotiation each time. MACsec has the same issue: it uses 2^32 and needs to rekey every 9 seconds(!) on 400G interfaces at full load.
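The following back-of-the-envelope calculation puts the 2^28 and 2^32 packet limits into time. It assumes 1500-byte packets at full line rate and ignores framing overhead. Both are assumptions, not figures from the discussion.

```python
# Assumed line rates in bits per second.
LINK_RATES = {"1G": 1e9, "10G": 10e9, "400G": 400e9}


def seconds_until(packets, link_bps, packet_bytes=1500):
    """Seconds needed to send `packets` packets of `packet_bytes` at full line rate."""
    return packets * packet_bytes * 8 / link_bps


if __name__ == "__main__":
    for exp in (28, 32):
        for name, bps in LINK_RATES.items():
            print(f"2^{exp} packets at {name}: {seconds_until(2**exp, bps):,.0f} s")
```

Under those assumptions, 2^28 packets last about 54 minutes at 1 Gbit/s but only about 8 seconds at 400 Gbit/s. That is why a full TLS renegotiation per key does not scale.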
Basic idea: use tls-exporter to get new keys frequently without full TLS renegotiation
Open question: how to signal "new key used now" - use topmost bit in 32 bit counter? go to 64 bit counter? Arne has a sketch on the tablet.
32-bit counter -> on a fast network with a lengthy link outage, the signal bits might wrap twice and the keys get out of sync. Normally, noticing a counter wraparound is easy if you see "sufficient packets", but with $linkfast+$longoutage (15s at 400G) you desync. This is not a problem today, with TLS renegotiation every hour anyway, but might become one in 5 years.
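The following sketch shows why a truncated counter can desync after a long gap. It assumes the receiver reconstructs the missing upper bits by picking the value closest to the highest counter seen so far, similar in spirit to IPsec extended sequence numbers. This is an illustration, not the mechanism the meeting settled on, and the function name is hypothetical.

```python
MASK32 = (1 << 32) - 1


def reconstruct(highest_seen, wire_low32):
    """Guess the full 64-bit counter from the 32 bits carried on the wire.

    Tries the previous, current and next 2^32 "epoch" relative to the
    highest counter seen and keeps the candidate closest to it.
    """
    hi = highest_seen >> 32
    best = None
    for epoch in (hi - 1, hi, hi + 1):
        if epoch < 0:
            continue
        cand = (epoch << 32) | wire_low32
        if best is None or abs(cand - highest_seen) < abs(best - highest_seen):
            best = cand
    return best
```

Once the gap exceeds half the 32-bit space, the closest candidate is the wrong one. The receiver then silently lands a whole wrap behind the sender.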
(Side note: AEAD-at-end is nearly identical to the IPsec ESP packet format, and hardware-accelerated NICs for IPsec exist - so theoretically, 200G accelerated OpenVPN looks doable "in a few months"...)
Proposal 1: use a 32+32 bit ID: top 32 bits = always the key ID, lower 32 bits = run up to 2^28, then restart at 0. Problem: packets reordered at key rollover fall outside the replay window -> dropped.
Proposal 2: use a 64-bit packet counter, and "certain bits" select the key ID (like bits 28+29). Both ends need to agree on "what you are doing". The wraparound point depends on *packet size*, and this might not be identical on both ends (needs more testing; "mtu 9000" will have the mask bits in different places than "mtu 1400").
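The two packet ID layouts can be sketched as bit operations. The helper names are hypothetical. The bit positions follow the notes: key ID in the top 32 bits for proposal 1, bits 28 and 29 for proposal 2. Nothing here is an agreed wire format.

```python
MASK32 = (1 << 32) - 1
REKEY_AFTER = 1 << 28  # packets per key, per the discussion


def p1_pack(key_id, counter):
    """Proposal 1: upper 32 bits = key ID, lower 32 bits = per-key counter."""
    if not 0 <= counter < REKEY_AFTER:
        raise ValueError("must switch keys before the counter reaches 2^28")
    return (key_id << 32) | counter


def p1_unpack(packet_id):
    """Split a proposal-1 packet ID back into (key_id, counter)."""
    return packet_id >> 32, packet_id & MASK32


def p2_key_slot(packet_id, shift=28, bits=2):
    """Proposal 2: one continuous 64-bit counter; bits 28..29 pick 1 of 4 key slots."""
    return (packet_id >> shift) & ((1 << bits) - 1)
```

Proposal 1 makes key changes explicit but breaks a continuous replay window. A late packet with the old key ID and a high counter looks far older than the new key's packets. Proposal 2 keeps one monotonic counter, so reordering across a rollover stays inside the window. The cost is that both ends must agree exactly where the key bits sit.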
DCO interaction?
Keys generated in userland by TLS library (this is the plan), so DCO gets new keys when needed -> which means DCO needs to tell userland when a new key is due ("half the sequence space used up")
Steffan: we could start with the TLS EKM key, and then use our own key derivation function to generate +1 keys from that (so DCO can do that in kernel space). (at this point the discussion derailed into ratchets and HMACs and crypto ;-) )
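Steffan's idea of deriving the next keys from the TLS EKM key can be sketched as a one-way HMAC-SHA256 ratchet. The label string and function names are hypothetical. A real design would also need separate keys (and IVs) per direction. This only shows why DCO could compute key N+1 on its own, without going back to the TLS library.

```python
import hashlib
import hmac

# Hypothetical label, not an agreed protocol constant.
LABEL = b"hypothetical data-v3 key update"


def next_key(current: bytes) -> bytes:
    """One ratchet step: derive key N+1 from key N (one-way, 32 bytes out)."""
    return hmac.new(current, LABEL, hashlib.sha256).digest()


def key_for_epoch(ekm_key: bytes, epoch: int) -> bytes:
    """Start from the key exported via TLS EKM and ratchet forward `epoch` times."""
    key = ekm_key
    for _ in range(epoch):
        key = next_key(key)
    return key
```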
AES-GCM-XPN "what is that and how do they do that?" (MACSEC stuff)
Next steps: Lev, Arne, Steffan (+MaxF +Ryan) agree on how to implement it, taking DCO into account.
locked-username-replacement
Short discussion between Arne and Gert: this is needed for pushed usernames (for future use with tokens) to actually work. The username is locked in the TLS context, and if a client comes back with a username while "username was empty on the first connect", it is refused.
Arne: "management and scripts just need to override the locked-username so pushed usernames work", but actual implementation is harder
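The following is a toy model of the locked-username check and the override Arne describes. It assumes the first username seen (even an empty one) is locked for the session. The class and method names are hypothetical and do not mirror OpenVPN's internals.

```python
class TlsSessionModel:
    """Hypothetical model of a TLS session with a locked username."""

    def __init__(self):
        self.locked_username = None  # set on first successful authentication

    def check_username(self, username: str) -> bool:
        """Lock the first username; later attempts must match it exactly."""
        if self.locked_username is None:
            self.locked_username = username
            return True
        return username == self.locked_username

    def override_locked_username(self, username: str) -> None:
        """Hypothetical management/script hook, e.g. after pushing auth-token-user."""
        self.locked_username = username
```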
DNS
Heiko: there is a pending implementation of --dns for Windows (code) and Unix* (script); get feedback on how to proceed.
- Windows: done in iservice
- today: only 1 search domain can be set (wmic client)
- new code: directly sets registry & then tells resolver to update
- Unix: new approach similar to --down-root (removing config on exit): fork() a helper process that keeps running with privileges and talks via a pipe. Plus a new --dns script hook (in addition to --up), so we can ship a DNS script that does "just DNS" and leave --up to whatever a distribution wants to ship.
- MacOS: there are "old" API calls (unclear life cycle information, deprecated or not). An entitlement is needed (signed binary). Maybe only for App Store binaries? Information unclear. Tunnelblick today has 5 different ways(!) to set up DNS... ("scutil --dns...")
Loong discussion...
Outcome:
- if both --dns and --dhcp-option DNS are pushed, --dhcp-option DNS is ignored
- if --dns is pushed, 2.6 clients behave like --dhcp-option DNS regarding NRPT, and 2.7 will behave differently
- on Unix, --up should be kept "as it is", and --dns added as "the new nice and shiny way"
- Gert wants "one backend" on Windows (!), for less testing effort on system behaviour
- Heiko wants "keep old backend for --dhcp-option DNS" and "new backend for --dns"
- suggestion:
- for "master, soon" go with "new backend implementation on Windows with NRPT, which will only be used by --dns (--dhcp-option DNS will go to the old implementations)" and then *solicit tests* and *use it* on Gert's production customers + Chosi's setup + Charité maybe?
- if that all works, with and without split DNS and still resolving local domains just fine, reopen the discussion on whether the "old backend on windows" (netsh) should go for 2.7, or only for 2.8
- more suggestions from Johan - implement --dns on AS and maybe cloudconnexa.
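The precedence rule for pushed DNS options can be sketched as follows. The sketch assumes simplified option parsing (`dns server N address ...` and `dhcp-option DNS ...`). It labels the Windows backend according to the "master, soon" suggestion. The function name and return format are hypothetical.

```python
def select_dns(pushed):
    """Pick the effective DNS configuration from a list of pushed option lines.

    If any --dns server is pushed, --dhcp-option DNS is ignored entirely.
    """
    dns, dhcp = [], []
    for line in pushed:
        words = line.split()
        if words[:2] == ["dns", "server"] and "address" in words:
            dns += words[words.index("address") + 1:]
        elif words[:2] == ["dhcp-option", "DNS"] and len(words) > 2:
            dhcp.append(words[2])
    if dns:
        return {"source": "--dns", "servers": dns, "windows_backend": "new (NRPT)"}
    if dhcp:
        return {"source": "--dhcp-option DNS", "servers": dhcp, "windows_backend": "old (netsh)"}
    return None
```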
Tunnelcrack
- Windows: WFP based filters are in master. Very little feedback so far.
- somewhat surprising: *inbound* connections on LAN still work (and outbound reply packets with state) - but that's an independent discussion, and such connections can be stopped with normal Windows firewall rules (gui)
- focus was *outbound* connections and that works fine to stop Tunnelcrack
- Linux: policy, firewall marks, ...? "fwmark" socket option did not work
- mark packets "coming from the LAN" (or "local") that must go to the VPN
- policy tables work "everything goes into the VPN, done"
- main routing table is only used for OpenVPN itself
- "good enough" target is "Fedora, Ubuntu" - if it breaks on other more special-case linux variants, we can look into it
- Gianmarco's patch adds route-table-id to routes (but no policy)
- block-local can leverage this, add policy and send everything to VPN table
- on Android: "everything goes to the VPN table if(!fwmark(openvpn))", so setting up the policy table is very easy; the recursion check can be disabled in that case
- two scenarios
- --redirect-gateway "can actually happily live with just 0.0.0.0/0 in table 77" (and not "all of the routes")
- for "Tunnelcrack mitigation" having "specific routes in table 77" (and having split tunnel) is good enough (and a nice and powerful feature)
- do we want a new option --use-policy-routing yes? --> maybe not yet, gather experience first
- decision: --block-local will ALWAYS do "redirect gateway and block-local and fully block everything" (and this is what we tell people to use for Tunnelcrack mitigation; this goes into the documentation)
- MacOS
- "the Connect team will look into it"
- the VPN API cannot be used by "not App Store" programs (Apple signatures needed), so might never be possible from 2.x
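The following is a toy model of the Linux policy-routing approach. It assumes two rules:

- Packets from OpenVPN itself (marked) use only the main table.
- Everything else consults the VPN table (table 77 in the notes) first, falling back to main.

The next-hop names are made up. This simulates the lookup order only; it is not netlink or `ip rule` code.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match in one routing table; returns the next hop or None."""
    dst = ipaddress.ip_address(dst)
    best = None
    for prefix, via in table.items():
        net = ipaddress.ip_network(prefix)
        if net.version == dst.version and dst in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, via)
    return best[1] if best else None


def route(dst, main, vpn_table, from_openvpn=False):
    """Rule order: OpenVPN's own (marked) packets -> main only;
    all other packets -> VPN table first, then main if nothing matched."""
    tables = [main] if from_openvpn else [vpn_table, main]
    for table in tables:
        via = lookup(table, dst)
        if via is not None:
            return via
    return None
```

The first test shows the point. In a single table, a more specific route handed out by a hostile LAN beats the VPN route. A separate table consulted first makes the VPN routes win regardless of prefix length.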
Repo Discussion
- openvpn-dco-win
- bugfixes go to release/1 and then to master, new features (multipeer) to master (only)
- sufficiently different code (multipeer support) that "just cherrypick" for bugfixes is not always working, so "backport"
- review for all patches is needed
- Gert is not the best reviewer (lack of time and of Windows clue)
- openvpn-build
- patches go to release/2.6 (which is used to build 2.6.x releases)
- everything is merged to master (which is used to build master snapshots / 2.7)
- when 2.7.0 is released, release/2.7 is branched (and possibly new flow for 2.6 patches needs to be defined)
- openvpn "main"
- all patches & bugfixes go to master
- bugfixes (and selected other patches) are cherry-picked to release/2.6, release/2.5, release/2.4 "wherever it makes sense", depending on severity
- patches never(!) flow from release/2.x to master
- if context is too different, there will be "a master bugfix" and "a release/2.6 bugfix" patch
- or there might be "a release/2.6 bugfix"
- but never! "apply to 2.6 and then merge/cherry-pick to master"
Mesh
- client-to-client communication
- using the server as a rendezvous point and then having direct tunnels between clients
- client connects to "server 1"
- server 1 can push "extra remotes" (10.0.0.0/8 goes to <tunnel to machine x, IP address y, port Z, credentials foo(?)>)
- Arne's idea "peer-id 1378 remote-ip 1.1.1.1+2001:db8::7 + FP aa:bb:cc:..." --> "direct route 196.168.0.0 peer-id 1378 <flags>"
- full TLS handshake between "client 1 and client 1378" to build session key
- discussion on technical challenges ("this is all not very hard") and commercial challenges ("why are our customers asking for this?")
- Arne plans to work on this, Antonio planned to get funds from OTF to do mesh work
Windows-DCO multipeer (--server support)
- initial implementation will not do TCP ("too many sockets")
- unsolved question so far: "if a packet is given to dco-win, what is the peer to send it to?"
- look at IP header, do route lookup (inside DCO or on Windows routing table)?
- can we make Windows do the route lookup and pass the next-hop IP to dco-win?
- in comparison: *userland* openvpn is being handed the IP packet "as a bytestream" and needs to extract the IP address from the packet and do an iroute lookup
- Linux/FreeBSD DCO: unclear how it works, Gert thinks that "the kernel routing" will pass the packet and the next-hop IP to the DCO driver (so, the DCO driver does not need to do a routing lookup to find the right peer) - needs to be verified
- Lev found that linux DCO does a route lookup inside DCO (= Linux does 2x route lookup, 1x "which is the next hop interface?" and 1x "which peer does it need to go to?")
- which variant can be done on Windows needs to be researched
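For comparison, the userland approach can be sketched like this: extract the destination address from the packet, then look up which peer owns it. The flat prefix-to-peer dictionary is a hypothetical stand-in for iroutes. A real implementation would use a trie or the OS routing table rather than a linear scan.

```python
import ipaddress


def dest_address(packet: bytes):
    """Read the destination address straight from the IPv4 or IPv6 header."""
    version = packet[0] >> 4
    if version == 4:
        return ipaddress.IPv4Address(packet[16:20])
    if version == 6:
        return ipaddress.IPv6Address(packet[24:40])
    raise ValueError(f"not an IP packet (version {version})")


def peer_for_packet(packet, routes):
    """routes: {prefix: peer_id}, i.e. each peer's tunnel IP plus its iroutes.

    Returns the peer owning the longest matching prefix, or None.
    """
    dst = dest_address(packet)
    best_len, best_peer = -1, None
    for prefix, peer_id in routes.items():
        net = ipaddress.ip_network(prefix)
        if net.version == dst.version and dst in net and net.prefixlen > best_len:
            best_len, best_peer = net.prefixlen, peer_id
    return best_peer
```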