[[TOC(inline, depth=1)]]

= Introduction =

Microsoft has some documentation about HLK testing and WHQL signing, but it is quite incomplete and leaves lots of room for speculation and anecdotes. Practical testing is often required to understand the requirements fully, so some of the requirements documented in this article are bound to change.

Different Windows versions have different kernel-mode signing options:

 * Windows 7/8/8.1/Server 2012r2
   * Cross-signing
   * WHQL-certified (HCK)
 * Windows 10 desktop
   * Attestation signing
   * WHQL-certified (HLK)
 * Windows Server 2016/2019
   * WHQL-certified (HLK)

In this article we focus on HLK testing, which allows getting a WHQL signature for a kernel-mode driver. This is the only way to make a driver load on Windows Server 2016 and later.

= Getting source code for tap-windows6 =

Sgstair patched tap-windows6 to pass the HLK tests. At the moment (3rd May 2019) his work is not fully merged yet, so use [https://github.com/sgstair/tap-windows6/tree/hlkwork his branch] as a basis.

= HLK test environment overview =

HLK testing always requires an HLK !Controller/Studio node, plus one or more HLK clients. The HLK client Windows version and the HLK version need to be in sync, as described in the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/ Microsoft documentation].

According to practical testing done by [https://www.wintun.net/ wintun] developers it is possible to get a code signature that is valid for all Windows 10 platforms using the following HLK clients:

 * Windows Server 2019 (64-bit)
 * Windows Server 2019 core (64-bit)
 * Windows 10 desktop (32-bit)

Wintun was able to pass HLK testing without any ''physical'' HLK clients. But due to wintun's narrower scope it had to pass far fewer HLK tests (~50 in total) than tap-windows6.

Both Windows Server 2016 and 2012r2 should work fine as the HLK controller. The controller can be virtualized (e.g. Virtualbox or VMware).

For tap-windows6 testing a support machine is also needed.
To be on the safe side, run the same OS version and build on the support machine as on the HLK client.

There are some additional requirements for tap-windows6 that stem from the generic [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/testref/lan-testing-prerequisites LAN testing prerequisites]:

 * HLK client needs at least 4 virtual processor cores (unverified) for Windows Server certification
 * HLK clients need to be physical computers, not virtualized (unverified)

Also remember to rename the network devices ("MessageDevice", "SupportDevice0") as described in the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/testref/lan-testing-prerequisites LAN testing prerequisites].

Also try to get a fast, good-quality network switch. Slow switches can cause issues in some of the tests. However, a cheap 10€ switch may end up working just fine.

= Installing HLK software =

For HLK software installation please refer to the official MS documentation, check out [https://github.com/Puppet-Finland/puppet-hlk/ puppet-hlk] or try out the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/getstarted/getstarted-vhlk Windows Virtual Hardware Lab Kit]. The version of HLK you need to install depends on the version of Windows you're attempting to certify, as described in the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/ Microsoft documentation]. To check the Windows version from Powershell do:

{{{
PS> [System.Environment]::OSVersion.Version
}}}

= Preparing HLK clients for test-signed drivers =

Installation of HLK client software automatically enables test signing mode in Windows. The tap-windows6 build system supports test-signing the driver automatically. You need to put the automatically generated test certificate into the Windows certificate store on the HLK clients. After that you can install the test-signed driver without signature errors.

= Setting up OpenVPN for HLK tests =

The "Run tests" in HLK fail consistently unless the tap-windows6 adapter has an IPv6 gateway address.
This can be resolved by a simple bridged peer-to-peer OpenVPN setup, where the interface settings are configured statically outside of OpenVPN:

 1. Install the latest OpenVPN 2.x on the HLK client and the support machine.
 1. Install the same, test-signed (to-be-HLK-tested) tap-windows6 driver on the HLK clients.
 1. Configure static IP, netmask, gateway, etc. for the TAP interface.
 1. Generate a shared secret with "openvpn --genkey".

The OpenVPN configuration files for the HLK client and the support machine can be identical except for the "remote" setting, which must point at the other peer:

{{{
dev tap
mode p2p
cipher AES-256-CBC
secret hlk.key
remote <address of the other peer>
verb 3
}}}

The above setup is symmetric in the sense that neither node is a client or a server, and either one can initiate the connection. This is similar to what is described in the [https://openvpn.net/community-resources/static-key-mini-howto/ static key mini-howto].

Once OpenVPN is configured properly, make sure that OpenVPN is running and starts automatically on boot:

{{{
PS> Set-Service OpenVPNService -StartupType Automatic -Status Running
}}}

Some have had more luck with the legacy service (OpenVPNServiceLegacy). Reboot and ping each host to make sure everything is working.

The following was also changed during this test pass, although it is unclear whether it had any effect. In `OemVista.inf.in`: `*PhysicalMediaType = 0x0 ; NdisPhysicalMediumUnspecified`

This was done to be consistent with what was in constants.h, but it also seemed to get some tests passing.

Changing the TAP interface type to be a virtual adapter did not work in this test pass. It seemed to mess up the NDIS tests, which looked for a device that advertised itself as physical in order to assign a SupportDevice. Maybe this is something we can eventually work with Microsoft to fix.

== Testing OpenVPN connectivity ==

Assuming the above OpenVPN config you can verify correct OpenVPN / tap-windows6 operation easily:

 * Verify that the TAP adapter has an IPv6 address (e.g. using ipconfig)
 * Ensure that the HLK client can ping the following VPN server addresses:
   * 10.218.112.1
   * 2001:db8:6666::2

= Preparing HLK studio/controller =

== Set product type ==

In the HLK Studio's Project tab you will see a category called "Product Types" at the right, like in [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/user/hlk-studio here]. When you create a new project the "Product Types" section will be empty, and you need to make sure it says LAN, or you end up doing all the testing work and only get certified as an "Other Driver". HLK product types are listed in the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/user/hlk-product-type-matrix product type matrix].

The way to add the "LAN" product type is to go to the "Selection" tab, select "Device Manager" on the left, then right-click on the TAP adapter and add the feature for Device.Network.LAN.PM (power management). HLK then notices that all features for the "LAN" product type are met and adds the "LAN" product type to the project.

== Loading compatibility playlists ==

Make sure to get the HLK Hardware Compatibility Playlists (on the main HLK download page), and apply the one for the correct context, e.g. HLK Version 1809 CompatPlaylist x64 Server.xml. The playlist narrows down the list of tests to the set required to get an HLK certification and removes some extra stress/failure verification type tests designed to help find driver crashes. There’s a “Load Playlist” option in the tests panel in the HLK studio app that you can use.

== Adding machines to pool ==

It seems required, or at least useful, to add HLK clients to the pool first, then the support machine. Otherwise, when you select the driver in the "Device manager" tab, HLK will assume that the HLK support machine has the driver under test. This ''may'' create problems down the line.

= Configuring and running the mandatory HLK tests =

== Multi-machine tests ==

Setting up multi-machine tests is pretty straightforward to do.
You just need to name some of the devices on the test client (the machine that actually runs the tests) and the support client (a second machine that is connected “back to back” through the VPN). Details are in the [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/testref/lan-testing-prerequisites LAN testing prerequisites]. Then, when you go to schedule the multi-machine test, you can change the “role” dropdown to the support machine, and select the support machine (it should be there if you have two ready machines in the pool).

Some of the non-NDIS tests that require “a working network connection” can just be given a valid IPv6 address on the VPN network and they will be happy with that. Whenever you see the “WDTFREMOTESYSTEM” parameter when scheduling a test, set it to an IPv6 address the system can ping over the VPN link. This could be the IPv6 address of the “support” system, or it could be some other arbitrary system on the VPN. This might not be necessary if the device under test has a working IPv6 gateway address that points to, say, the support machine's tap adapter interface. But wearing belt and suspenders is not a bad idea when running HLK tests.

== NDISTest 6.5 - 2 machine - Linkcheck ==

This test exercises plugging and unplugging the (virtual) Ethernet cable. For this test you have two options:

 * Start and stop the OpenVPN service while the test is running to make the link go down and up
   * Note that you can't do this from the Powershell prompt of the DTMLLUAdminUser, which is the user you may end up running as. You can still use the graphical services management, or possibly an elevated Powershell prompt.
 * Use Tapdiag to change the link state on the test system

In either case follow the interactive messagebox prompts. You may need to detach and attach the "cable" twice.

== NDISTest 6.0 - 2 Machine - 2c_Priority ==

Use Tapdiag to enable 802.1Q on both machines before running the NDIS QoS test (2c_Priority). The process for using tapdiag is:

 1. `tapdiag /enable`: sets a registry key that enables the .tapdiag endpoint in the driver
 1. Restart the TAP driver: reboot, or disable/enable it in Device Manager. You may need to stop the OpenVPN service to avoid a reboot. Reconnect the VPN afterwards.
 1. `tapdiag /link:[on|off]`: use to unplug/plug the Ethernet cable
 1. `tapdiag /setq:[on|off|always]`: sets the 802.1Q handling. The driver now disables 802.1Q by default.

Note that the tapdiag configuration is runtime only: if you reboot the test machine, you will lose the 802.1Q state.

== NDISTest 6.5 - 2 Machine - AddressChange ==

This will definitely fail on slower machines. It worked fine on one-year-old i7 systems.

== NDISTest 6.5 - 2 Machine - E2EPerf ==

This test seems to fail if the support adapter has a lower link speed than the test adapter. This is visible from the logs of "Run NDISTest Client (no verifier)":

{{{
ERROR: Support Adapter must be connected at a link speed greater than or equal to the Test Adapter
}}}

This is a problem if the support machine has an old tap-windows6 driver version that claims to be 100Mbps whereas the device under test ("DUT") is advertising itself as a 1Gbps device. Resolve by installing the correct (1Gbps) tap-windows6 driver version on the support machine as well.

== NDISTest 6.0 - 1 machine - 1c_FaultHandling ==

This test is known to fail in at least two ways. The more serious one is that it starts looking for non-existing files on the Controller SMB share during the "Copy downlevel NDISTest binaries" task.
The error message is pretty clear:

{{{
Cause : Failed to Start the Task
Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis62\ndprot62.sys"
Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis62\ndprot62.sys"
Failure : Failed to Start the Task "Copy downlevel NDISTest binaries"
Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis62\ndprot62.sys"
Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis61\ndprot61.sys"
Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis61\ndprot61.sys"
Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis61\ndprot61.sys"
Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis6\ndprot60.sys"
Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis6\ndprot60.sys"
Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis6\ndprot60.sys"
Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis51\ndprot51.sys"
Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis51\ndprot51.sys"
Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis51\ndprot51.sys"
}}}

If you mount the share manually on the HLK client you'll notice that either entire directories or individual .sys files are indeed missing:

{{{
PS> net use X: \\controller.hlk.local\tests
The command completed successfully.

PS> Get-Childitem x:\amd64\NetHlk\NDISTest\bin\ -filter "ntndis*"

    Directory: x:\amd64\NetHlk\NDISTest\bin

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----        5/30/2019   7:07 AM                ntndis51
d-----        5/30/2019   7:07 AM                ntndis630
d-----        5/30/2019   7:07 AM                ntndis650
d-----        5/30/2019   7:07 AM                ntndis660
d-----        5/30/2019   7:07 AM                ntndis680
}}}

Once you get past this failure, the service might not re-establish a link at the end of the test. WHQL testing seems to get the OpenVPN service into a bad state. Just restart the OpenVPN service and the test will pass; the test waits a few minutes for the link to come back up.

== NDISTest 6.0 - 1 machine - 1c_Registry ==

This test may fail due to files missing from the HLK controller, in exactly the same way as 1c_FaultHandling above.

== NDISTest 6.0 - 2 machine - 2c_Mini6Stress ==

Sometimes this test will hit a breakpoint in the NDIS test code. The breakpoint seems harmless and complains about some packets not getting confirmed. If you don't have a kernel debugger connected this will cause a BugCheck and the test will fail. If this happens, connect a kernel debugger and rerun the test. When the breakpoint is hit, press (l) Always Ignore and the test will pass. You may also get away with just rerunning the test until it passes.

== TDI filters and LSPs are not allowed ==

This test may fail due to broken network drive mappings. Hints are available in "Infrastructure -> Execution Logs -> WttEa.log":

{{{
2980 4496 2019:5:31 18:48:8:90 Error: 0x8205aaaf, Error 0x8205aaaf winsockerror code of 11001 File=sdktools\wtt\jobs\wtttransportproviders\wttcommtcpip\src\wttcommtcpip.cpp Line=656 PersistManager:EA:JobCancel::token::57M710T->
--- snip --
2980 5488 2019:5:31 18:48:11:210 Error: 0x800704c3, Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again. CRunManager::GetLogLocation()::(null)::CAUSE:Error returned from EnableShareAccess to the root of "\\controller.hlk.local\HLKLogs\EaFolderAccessCheck\4825BE12-3B91-4414-B8E2-5AD703D69BB3" File=sdktools\wtt\jobs\runtime\wttexecutionagent\eamanager\runmanager\src\runmanager.cpp Line=2379
}}}

If this happens you should see a network share that is unavailable:

{{{
PS C:\Users\Administrator> net use
New connections will be remembered.

Status       Local     Remote                                  Network
-------------------------------------------------------------------------------------------
Unavailable  R:        \\hlk-controller.openvpn.in\HLKInstall  Microsoft Windows Network
The command completed successfully.
}}}

To resolve, unmount the network drive:

{{{
PS C:\Users\Administrator> net use R: /DELETE
R: was deleted successfully.
}}}

== Static Tools Logo test (optional) ==

The Static Tools Logo Test checks that the driver build is using the Static Driver Verifier (SDV) code analysis tool. Even if that is the case, it is necessary to copy some files so that the test can read the SDV logs and check them. Otherwise the test will fail.

= Various test issues =

== Packet transmission too slow ==

This can result in test errors like "Expected minimum of 237 packets but we received 200 packets". The test allows one second after all the sends have been completed for all the packets to be received. Disabling verbose log printing in the server makes it more reliable.

== Packet reordering ==

Packets can (rarely) be reordered in flight, which causes an assertion in the test driver. Hints are errors such as

 * "1 total breakpoints were hit in the protocol driver while this test was executing"
 * "Out of order indication"
 * "Dropped indications"

HLK doesn't complain if some packets are lost; instead these errors are raised when out-of-order packets are received. The OpenVPN architecture has some inherent race conditions that can cause reordering of packets.
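
The difference between lost and reordered packets can be illustrated with a small Python sketch. This is not how NDISTest is implemented. It assumes, purely for illustration, that every test packet carries a sequence number starting at 0, and the function name `classify_arrivals` is hypothetical.

```python
def classify_arrivals(received, sent_count):
    """Split arrival problems into lost and reordered sequence numbers.

    received   -- sequence numbers in the order they arrived (assumed numbering)
    sent_count -- how many packets were sent, numbered 0..sent_count-1
    Returns (lost, reordered):
      lost      -- numbers that never arrived (tolerated, per the text above)
      reordered -- numbers that arrived after a higher number had been seen
                   (the case that would trip an out-of-order check)
    """
    seen = set()
    highest = -1
    reordered = []
    for seq in received:
        # A first-time arrival below the highest number seen so far is late.
        if seq < highest and seq not in seen:
            reordered.append(seq)
        highest = max(highest, seq)
        seen.add(seq)
    lost = [seq for seq in range(sent_count) if seq not in seen]
    return lost, reordered
```

For example, `classify_arrivals([0, 1, 3, 4], 5)` reports packet 2 as lost and nothing as reordered, whereas `classify_arrivals([0, 2, 1, 3], 4)` reports no loss but packet 1 as reordered.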
Reordering has happened 2 or 3 times over dozens of runs. It does cause a test failure, but a rerun should pass.

== Address Change test failures ==

You may encounter failures in the "Address Change" test. It’s a combination of a few factors:

 * The OpenVPN client sets the network link status to up around a second before the server actually starts forwarding any packets to the client. This is quite possibly an order-of-operations bug in OpenVPN.
 * That would be well and good, but the address change test has extremely aggressive timings.
 * It starts sending packets almost immediately once the link status comes up, and stops listening for packets less than a second after that.

This failure seems to be related to newer OpenVPN versions. Logs for reference:

{{{
(Test system connecting after MAC address changes)
…
Wed Feb 20 22:03:43 2019 us=810407 vpnclient-nopass/192.168.1.36:1194 Data Channel MTU parms [ L:1581 D:1450 EF:49 EB:411 ET:32 EL:3 ]
Wed Feb 20 22:03:43 2019 us=810637 vpnclient-nopass/192.168.1.36:1194 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Feb 20 22:03:43 2019 us=810765 vpnclient-nopass/192.168.1.36:1194 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
(Other system starts to send test packets)
WWWWRWed Feb 20 22:03:48 2019 us=680899 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
wWRWed Feb 20 22:03:48 2019 us=681534 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
wWRWed Feb 20 22:03:48 2019 us=681869 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
…
wWRWed Feb 20 22:03:49 2019 us=158052 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
(Other system is done sending packets)
(Openvpn server starts forwarding packets at around this point.)
wWRRRWed Feb 20 22:03:50 2019 us=472525 vpnclient-nopass/192.168.1.36:1194 MULTI: Learn: 02:02:04:06:08:08 -> vpnclient-nopass/192.168.1.36:1194
…
}}}

Another side note: the “unicast destination [00:ff:e9:88:37:ca]” in those messages is incorrect. Those packets are actually directed to the “02:02:04:06:08:08” address. The flood message in the patch is showing the source address.

== HLK client version mismatches ==

The HLK client version (e.g. Windows 1809) needs to match the HLK version (HLK 1809). If there is a mismatch there can be some setup issues (some semantics of the driver verifier configuration have changed, or some of the test components might fail on that OS version).

== Controller misuses support machine ==

The controller seems to arbitrarily pick which machine is the support machine and which one is under test. If it has trouble picking, name one of the TAP adapters SupportDevice0. Picking the options in the UI didn't change the behavior. So for the tests you are babysitting, pay attention to which machine is the server and which one is the client. The server will run a server.htm in NDISTest, so it will be obvious.

== Reboots ==

Reboots seem to happen randomly during test setup. This can be a nuisance if you are monitoring the services window or the network connections window. It is helpful to make shortcuts to these so they can easily be reopened.

= HLK logging =

The HLK controller logs problems mostly to event logs, so when tests are failing in an unusual fashion you can possibly find out why in the Event Viewer. Causes for test failures are generally well visible from the logs in HLK Studio.

= Debugging =

It is possible to get HLK into strange states.
For example:

 * HLK clients / support machines do not get into "Ready" state
 * Launching two-machine tests fails with duplicate database key errors

Some things you can try to get past these errors:

 * Restart the "hlksvc" Windows service
 * Reboot HLK nodes (controller, clients, support machine)
 * Reinstall HLK client software and reboot HLK client
 * Create a new HLK project
 * Create a new pool and move HLK client machines there
 * Delete HLK clients from the pool

These are known to work sometimes, but so far I have not been able to establish any logic here.

The HLK tests fortunately provide tons of logging. Check the sections "Error", "Task log" and "Infrastructure" in the context menu for the test in question to see what you can find.

= Addendum =

== Firewall rules for HLK server and clients ==

Installing HLK software automatically opens ports in the Windows firewall for HLK traffic. In case the HLK controller and HLK clients are not on the same switch, some firewall (e.g. EC2 security group rules) might block HLK traffic. Here is a reference for the ports which need to be open for HLK tests to work:

 * OpenVPN peer (udp/1194) <-> OpenVPN peer (udp/1194)
 * HLK clients -> HLK controller tcp/1771 (HLK Server Receiver Port)
 * HLK clients -> HLK controller tcp/1782 (HLKSvc Receiver Port)
 * HLK clients -> HLK controller tcp/445 (HLKInstall SMB share)
 * HLK controller -> HLK clients tcp/1771 (HLK Server Receiver Port)

Outbound traffic is assumed to be unrestricted. If not, adjust egress rules accordingly. Also note that IPv6 traffic needs to flow properly in the OpenVPN virtual network, as HLK tests require IPv6.

= External links =

 * [https://techcommunity.microsoft.com/t5/Windows-Hardware-Certification/bg-p/WindowsHardwareCertification Windows Hardware Certification blog] (updates on new HLK releases etc.)
 * [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/user/troubleshooting-windows-hlk Troubleshooting Windows HLK]
 * [https://github.com/sgstair/tapdiag tapdiag]: a tool that is used to manipulate tap-windows6 at runtime for some HLK tests
 * [https://github.com/Puppet-Finland/puppet-hlk/ puppet-hlk]: Puppet module for setting up HLK controllers and HLK clients
 * [https://docs.microsoft.com/en-us/windows-hardware/test/hlk/getstarted/getstarted-vhlk Windows Virtual Hardware Lab Kit]
 * [https://aka.ms/HLKPlaylist Compatibility playlists] (to our understanding these are the only ones necessary to get signed)
 * [https://docs.microsoft.com/en-us/windows-hardware/drivers/dashboard/get-drivers-signed-by-microsoft-for-multiple-windows-versions Getting drivers signed by Microsoft for multiple Windows versions]
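
= Appendix: probing the HLK TCP ports =

When a firewall between the HLK nodes is suspected, the TCP ports from the firewall rules in the addendum can be probed with a short script. The following is a minimal sketch in Python, which is an assumption in itself because Python is not part of an HLK installation. The helper names `tcp_port_open` and `check_ports` are made up for this example, and the port numbers are the ones listed in the addendum. UDP port 1194 cannot be verified this way, because UDP has no connection handshake.

```python
import socket

# TCP ports the HLK clients need to reach on the controller,
# taken from the firewall rule list in the addendum.
HLK_CONTROLLER_TCP_PORTS = (1771, 1782, 445)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=HLK_CONTROLLER_TCP_PORTS, timeout=3.0):
    """Map every port in ports to True (reachable) or False (not reachable)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Run from an HLK client, `check_ports("controller.hlk.local")` (substitute the name of your own controller) returns a dictionary such as `{1771: True, 1782: True, 445: False}`, where a `False` entry points at a port that is blocked or not listening. For the controller-to-client direction, run `check_ports(client_name, ports=(1771,))` on the controller.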