wiki:HLKTesting

Version 45 (modified by Samuli Seppänen, 5 years ago)

--

Introduction

Microsoft has some documentation about HLK testing and WHQL signing, but it is quite incomplete and leaves lots of room for speculation and anecdote. Practical testing is often required to understand the requirements fully, so some of the requirements documented in this article are bound to change.

Different Windows versions have different kernel-mode signing options:

  • Windows 7/8/8.1/Server 2012 R2
    • Cross-signing
    • WHQL-certified (HCK)
  • Windows 10 desktop
    • Attestation signing
    • WHQL-certified (HLK)
  • Windows Server 2016/2019
    • WHQL-certified (HLK)

In this article we focus on HLK testing, which allows getting a WHQL signature for a kernel-mode driver. This is the only way to make a driver load on Windows Server 2016 and later.

Getting source code for tap-windows6

Sgstair patched tap-windows6 to pass the HLK tests. At the moment (3rd May 2019) his work is not fully merged yet, so use his branch as a basis.

HLK test environment overview

HLK testing always requires an HLK Controller/Studio node plus one or more HLK clients. The Windows version of the HLK clients and the HLK version need to be in sync, as described in the MS documentation (https://docs.microsoft.com/en-us/windows-hardware/test/hlk/).

According to practical testing done by the wintun developers, it is possible to get a code signature that is valid for all Windows 10 platforms with the following HLK setup:

  • HLK controller: Windows Server 2016
  • HLK clients
    • Windows Server 2019 (64-bit)
    • Windows Server 2019 core (64-bit)
    • Windows 10 desktop (32-bit)

Wintun was able to pass HLK testing without any physical HLK clients. However, due to its narrower scope wintun had to pass far fewer HLK tests (~50 in total) than tap-windows6.

For tap-windows6 testing a couple of extra nodes are needed:

  • OpenVPN server (for "Run tests" in HLK, see below)
  • Support machine: required by some of the HLK tests

LAN testing prerequisites

There are some additional requirements for tap-windows6 that stem from generic LAN testing prerequisites:

  • HLK client needs at least 4 virtual processor cores (unverified) for Windows Server certification
  • HLK clients need to be physical computers, not virtualized (unverified)

Also remember to rename the network devices ("MessageDevice", "SupportDevice0") as described in LAN testing prerequisites.

Installing HLK software

For HLK software installation please refer to the official MS documentation, check out puppet-hlk or try out the Windows Virtual Hardware Lab Kit.

The version of HLK you need to install depends on the version of Windows you're attempting to certify, as described in the Microsoft documentation. To check the Windows version from Powershell run:

PS> [System.Environment]::OSVersion.Version
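The build number in that output can be mapped to the matching HLK release. The Python sketch below assumes the dotted version string (as returned by [System.Environment]::OSVersion.Version.ToString(), e.g. 10.0.17763.0) and a small hand-written lookup table that covers only a few well-known builds; the function names are illustrative, and Microsoft's documentation remains the authoritative source.

```python
# Partial, hand-maintained table (an assumption, extend as needed):
# Windows build number -> release name carried by the matching HLK release.
BUILD_TO_HLK_RELEASE = {
    14393: "1607",  # Windows 10 1607 and Windows Server 2016
    15063: "1703",  # Windows 10 1703
    16299: "1709",  # Windows 10 1709
    17134: "1803",  # Windows 10 1803
    17763: "1809",  # Windows 10 1809 and Windows Server 2019
    18362: "1903",  # Windows 10 1903
}


def parse_os_version(version: str) -> int:
    """Return the build number from a 'Major.Minor.Build.Revision' string."""
    parts = version.strip().split(".")
    if len(parts) < 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[2])


def hlk_release_for_build(build: int) -> str:
    """Look up the HLK release for a build, failing loudly for unknown builds."""
    try:
        return BUILD_TO_HLK_RELEASE[build]
    except KeyError:
        raise ValueError(
            f"build {build} is not in the table; check the Microsoft HLK documentation"
        ) from None
```

For example, hlk_release_for_build(parse_os_version("10.0.17763.0")) returns "1809": Windows Server 2019 and Windows 10 1809 clients pair with HLK 1809.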

Preparing HLK clients for test-signed drivers

Installing the HLK client software automatically enables test signing mode in Windows. The tap-windows6 build system can test-sign the driver automatically. You need to import the automatically generated test certificate into the Windows certificate store on the HLK clients; after that you can install the test-signed driver without signature errors.
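A minimal sketch of the certificate import, assuming the test certificate has been exported to a .cer file (the file name is hypothetical) and that the script runs from an elevated prompt on the HLK client. It shells out to certutil -addstore and imports the certificate into the local machine's Trusted Root Certification Authorities ("Root") and Trusted Publishers ("TrustedPublisher") stores, the two stores Microsoft's test-signing documentation uses for test certificates; the Python wrapper itself is only illustrative.

```python
import subprocess
from pathlib import Path

# Local machine stores used for test-signing certificates:
# "Root" = Trusted Root Certification Authorities,
# "TrustedPublisher" = Trusted Publishers (lets the driver install without a publisher prompt).
TEST_CERT_STORES = ("Root", "TrustedPublisher")


def certutil_commands(cert_file: Path) -> list:
    """Build one `certutil -addstore <store> <file>` command per store."""
    return [["certutil", "-addstore", store, str(cert_file)] for store in TEST_CERT_STORES]


def install_test_certificate(cert_file: Path) -> None:
    """Run the commands; must be executed from an elevated prompt on the HLK client."""
    if not cert_file.is_file():
        raise FileNotFoundError(cert_file)
    for cmd in certutil_commands(cert_file):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```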

Preparing HLK studio/controller

Loading compatibility playlists

Make sure to get the HLK Hardware Compatibility Playlists (available on the main HLK download page) and apply the one for the correct context, e.g. HLK Version 1809 CompatPlaylist x64 Server.xml. The playlist narrows the list of tests down to the set required for HLK certification and removes some extra stress and fault-verification tests that are designed to help find driver crashes. You can load a playlist with the "Load Playlist" option in the tests panel of the HLK Studio app.

Setting up OpenVPN for HLK tests

Overview

The "Run tests" in HLK fail consistently unless the tap-windows6 adapter has an IPv6 gateway address. This can be resolved by installing an OpenVPN server and connecting all HLK clients to it. Launching the OpenVPN client on an HLK client brings up a tap-windows6 adapter with a proper IPv6 gateway address.

Overview of the steps:

  • Install the latest OpenVPN 2.x on the OpenVPN (Linux) server
  • Install the latest OpenVPN 2.x on the HLK clients
  • Install test-signed (to-be-HLK-tested) tap-windows6 driver on the HLK clients
  • Generate certificates and keys for OpenVPN with EasyRSA 3 and "openvpn --genkey"
  • Create and install configs for OpenVPN server and clients with embedded keys/certificates
  • Ensure that OpenVPN is enabled and running on server and clients
  • Verify OpenVPN connectivity
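The "embedded keys/certificates" step amounts to pasting each PEM file into the matching inline block of the config. The Python sketch below does this, assuming the default EasyRSA 3 layout (pki/ca.crt, pki/issued/<name>.crt, pki/private/<name>.key) and a TLS auth key created with "openvpn --genkey --secret ta.key"; the function names and the ta.key file name are hypothetical.

```python
from pathlib import Path


def inline_block(tag: str, content: str) -> str:
    """Wrap PEM/key material in an OpenVPN inline <tag>...</tag> block."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>\n"


def build_inline_config(base: str, ca: str, cert: str, key: str,
                        tls_auth: str, key_direction: int) -> str:
    """Append inline credentials to a base OpenVPN config.

    key_direction is 0 on the server and 1 on the clients, matching the
    key-direction lines of the example configs.
    """
    if key_direction not in (0, 1):
        raise ValueError("key_direction must be 0 (server) or 1 (client)")
    parts = [
        base.rstrip() + "\n",
        inline_block("ca", ca),
        inline_block("cert", cert),
        inline_block("key", key),
        f"key-direction {key_direction}\n",
        inline_block("tls-auth", tls_auth),
    ]
    return "\n".join(parts)


def build_from_easyrsa(pki: Path, name: str, ta_key: Path,
                       base: str, key_direction: int) -> str:
    """Read certificate and key files from an EasyRSA 3 PKI directory."""
    return build_inline_config(
        base,
        ca=(pki / "ca.crt").read_text(),
        cert=(pki / "issued" / f"{name}.crt").read_text(),
        key=(pki / "private" / f"{name}.key").read_text(),
        tls_auth=ta_key.read_text(),
        key_direction=key_direction,
    )
```

A client config would be built with key_direction=1 and the server config with key_direction=0, matching the configuration files below.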

OpenVPN configuration files

NOTE: using OpenVPN in peer-to-peer (p2p) mode would be simplest. The client/server configuration files below do not use p2p mode, so please do not use them at this point; see "Peer to Peer OpenVPN setup for HLK" below instead.

OpenVPN server configuration is fairly simple:

# OpenVPN server test configuration for Windows Hardware Lab Kit test server
#
# We need this for HLK "Run tests" which ping6 the gateway of the interface
# bound to the driver being tested.
#
server 10.218.112.0 255.255.255.0
server-ipv6 2001:db8:6666::1/64
port 1194
proto udp
dev tun5
comp-lzo
persist-key
persist-tun
keepalive 10 120
verb 4
duplicate-cn
max-clients 15
status hlk-openvpn-status.log
<ca>
# CA certificate here
</ca>

<cert>
# Server certificate here
</cert>

<key>
# Server private key here
</key>

key-direction 0
<tls-auth>
# TLS auth key here
</tls-auth>

<dh>
# Diffie-Hellman parameters here (e.g. generated with EasyRSA 3 "gen-dh")
</dh>

OpenVPN setup on HLK clients:

# OpenVPN client test configuration for Windows Hardware Lab Kit test clients
#
# We need this for HLK "Run tests" which ping6 the gateway of the interface
# bound to the driver being tested.
#

# Replace "x.x.x.x" with OpenVPN server's (public) IPv4 address 
remote x.x.x.x 1194
client
proto udp
dev tun
comp-lzo
persist-key
persist-tun
keepalive 10 120
verb 4
status hlk-openvpn-status.log

<ca>
# CA certificate here
</ca>

<cert>
# Client certificate here
</cert>

<key>
# Client private key here
</key>

key-direction 1
<tls-auth>
# TLS auth key here
</tls-auth>

Enabling OpenVPN on server and clients

Once the OpenVPN server and clients are configured properly, make sure that OpenVPN is running and starts automatically on boot. On an OpenVPN server running a recent distribution (e.g. Ubuntu 18.04), with the server configuration saved as /etc/openvpn/server/hlk.conf, you would run:

$ systemctl enable openvpn-server@hlk
$ systemctl start openvpn-server@hlk

On OpenVPN clients you'd do this from an administrator Powershell session:

PS> Set-Service OpenVPNService -StartupType Automatic -Status Running

Peer to Peer OpenVPN setup for HLK

The following was tested with Windows 10 x64 and worked well.

What you need:

  1. HLK Controller (tested with Windows Server 2012 R2 on VMware, with the HLK tools installed on it)
  2. Two test machines with similar performance characteristics (preferably identical builds)
  3. A quality, fast network switch. Slow switches can cause issues in some of the tests.

Setup steps:

  1. On both test machines install the OS you want to test on. This must match the HLK tools you installed on your controller.
  2. On both test machines install OpenVPN. Only the service is needed.

In the C:\Program Files\OpenVPN\config directory on each machine, add a peer-to-peer static key configuration by following the steps in https://openvpn.net/community-resources/static-key-mini-howto/. However, use dev tap instead of dev tun and adjust the ifconfig lines accordingly (in tap mode ifconfig takes the local IP address and a netmask). Examples below:

server.ovpn

dev tap
ifconfig 10.1.1.1 255.255.255.0
secret static.key

client.ovpn

dev tap
remote <IP FOR SERVER>
ifconfig 10.1.1.2 255.255.255.0
secret static.key
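Since the two p2p configs differ only in the remote line and the local ifconfig address, they are easy to generate. A minimal Python sketch, assuming the 10.1.1.0/24 addressing from the examples above and a shared key created with "openvpn --genkey --secret static.key" as in the static key mini-howto; p2p_configs and write_config are hypothetical helpers.

```python
from pathlib import Path


def p2p_configs(server_ip: str, key_file: str = "static.key") -> dict:
    """Return the contents of server.ovpn and client.ovpn for a static-key tap link."""
    server = ["dev tap", "ifconfig 10.1.1.1 255.255.255.0", f"secret {key_file}"]
    client = ["dev tap", f"remote {server_ip}",
              "ifconfig 10.1.1.2 255.255.255.0", f"secret {key_file}"]
    return {"server.ovpn": "\n".join(server) + "\n",
            "client.ovpn": "\n".join(client) + "\n"}


def write_config(config_dir: Path, name: str, text: str) -> Path:
    """Write one config file into the OpenVPN config directory and return its path."""
    path = config_dir / name
    path.write_text(text)
    return path
```

On the server machine you would write only server.ovpn and on the client machine only client.ovpn, each next to a copy of the same static.key.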

On each machine set the OpenVPN Legacy Service to Automatic.

Reboot and ping each host to make sure everything is working. You are now set up to run the WHQL tests. Some of them need babysitting; those are covered below.

In general the rest of the tests passed without problems. However, running the tests a group at a time and making sure each group passed before moving on worked better.

Some HLK errata and random notes:

  • The controller seems to pick arbitrarily which machine is the support machine and which one is under test. If it has trouble picking, name one of the TAP adapters SupportDevice0. Picking the options in the UI didn't change the behavior, so for the tests you are babysitting, pay attention to which machine is the server and which one is the client. The server will run a server.htm in the NDIS test, so it will be obvious.
  • Reboots seem to happen randomly during test setup. This can be a nuisance if you are monitoring the Services window or the Network Connections window, so it helps to create shortcuts to them so they can be reopened easily.
  • In HLK Studio you will see a category called Product Types. It may not have anything listed; if so, all this work will only get the driver certified as an "Other Driver". Make sure it says LAN. I had to go back to the Selection tab, right-click on the TAP adapter and add the feature Device.Network.LAN.PM (power management). This adds a few tests, but all of them should pass.
  • You will have to rename your Ethernet adapters to MessageDevice: right-click on the interface and select Rename. If you attach the kernel debugger, you need to do this after attaching it, because it places a new interface on top of the Ethernet interface.
  • End-to-end testing can take about 5 hours because of the babysitting. Pick a good book to read or have some other work handy to keep you from losing your mind :).

Not sure whether the following change had any effect, but it was made during this test pass:

In OemVista.inf.in: *PhysicalMediaType = 0x0 ; NdisPhysicalMediumUnspecified

This was done to be consistent with what was in constants.h, but it also seemed to get some tests passing.

Changing the TAP interface type to a virtual adapter did not work in this test pass. It seemed to break the NDIS tests, which look for a device that advertises itself as physical to assign as SupportDevice. Maybe this is something we can eventually work with Microsoft to fix.

Testing OpenVPN connectivity

Assuming the above OpenVPN configuration, you can easily verify correct OpenVPN / tap-windows6 operation:

  • Verify that the TAP adapter has an IPv6 address (e.g. using ipconfig)
  • Ensure that the HLK client can ping the following VPN server addresses:
    • 10.218.112.1
    • 2001:db8:6666::2
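The ping checks can be scripted. A Python sketch, assuming it runs on the Windows HLK client (where ping -n sets the count and an IPv6 literal is pinged over IPv6 automatically) and using the addresses from the example server config; check_vpn_connectivity is a hypothetical helper, and the ipconfig check is left manual.

```python
import subprocess
import sys

# VPN-side addresses from the example OpenVPN server configuration.
VPN_TARGETS = ("10.218.112.1", "2001:db8:6666::2")


def ping_command(address: str, count: int = 4) -> list:
    """Build a ping command; Windows ping takes -n for the count, Unix ping takes -c."""
    count_flag = "-n" if sys.platform == "win32" else "-c"
    return ["ping", count_flag, str(count), address]


def check_vpn_connectivity(targets=VPN_TARGETS, count: int = 4) -> dict:
    """Ping each target and record whether ping exited with status 0.

    Caveat: Windows ping may also exit with 0 when it only receives
    "Destination host unreachable" replies, so look at the output if in doubt.
    """
    results = {}
    for address in targets:
        proc = subprocess.run(ping_command(address, count), capture_output=True, text=True)
        results[address] = proc.returncode == 0
    return results
```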

Configuring and running the mandatory HLK tests

Adding machines to pool

It seems necessary (or at least useful) to add the HLK clients to the pool first and the support machine after them. Otherwise, when you select the driver in the "Device manager" tab, HLK will assume that the support machine has the driver under test, which may create problems down the line.

Multi-machine tests

Setting up multi-machine tests is pretty straightforward to do. You just need to name some of the devices on the test client (that actually runs the tests) and support client (second machine that is connected “back to back” through the VPN). Details in LAN testing prerequisites.

Then when you go to schedule the multimachine test, you can change the “role” dropdown to the support machine, and select the support machine (should be there if you have two ready machines in the pool).

Some of the non-NDIS tests that require “a working network connection” are satisfied if they are given a valid IPv6 address on the VPN network. Whenever you see the “WDTFREMOTESYSTEM” parameter when scheduling a test, set it to an IPv6 address the system can ping over the VPN link. This could be the IPv6 address of the “support” system or of some other arbitrary system on the VPN. This might not be necessary if the device under test has a working IPv6 gateway address that points to, say, the support machine's TAP adapter interface, but wearing belt and suspenders is not a bad idea when running HLK tests.

NDISTest 6.5 - 2 machine - Linkcheck

This test simulates plugging and unplugging the (virtual) Ethernet cable.

For this test you have two options:

  • Start and stop the OpenVPN service while the test runs to bring the link up and down
    • Note that you can't do this from the Powershell prompt of the DTMLLUAdminUser account, which you may end up in. You can still use the graphical Services management console, or possibly an elevated Powershell prompt.
  • Use Tapdiag to change link state on the test system

In either case follow the interactive messagebox prompts. You may need to detach and attach the "cable" twice.

NDISTest 6.0 - 2 Machine - 2c_Priority

Use Tapdiag to enable 802.1Q on both machines before running the NDIS QoS test (2c_Priority).

The process for using tapdiag is:

  1. tapdiag /enable # Sets a registry key that enables the .tapdiag endpoint in the driver
  2. Restart the TAP driver, either by rebooting or by disabling and re-enabling the adapter in Device Manager. You may need to stop the OpenVPN service first to avoid a reboot; reconnect the VPN afterwards.
  3. tapdiag /link:[on|off] # Use to unplug/plug ethernet cable
  4. tapdiag /setq:[on|off|always] # to set the 802.1Q handling. Driver now disables 802.1Q by default.

Note that the tapdiag configuration is runtime only – if you reboot the test machine, you will lose the 802.1Q state.
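Because the tapdiag state is lost on reboot, you may end up repeating these steps often. A small wrapper that only emits the switches listed above can help avoid typos. A Python sketch; tapdiag_command and run_tapdiag are hypothetical helpers, they assume tapdiag is on the PATH, and choosing /setq:on for 2c_Priority is an assumption (the text above only says to enable 802.1Q).

```python
import subprocess
from typing import List, Optional

# Values accepted by the tapdiag switches documented above.
ALLOWED_VALUES = {"link": {"on", "off"}, "setq": {"on", "off", "always"}}


def tapdiag_command(option: str, value: Optional[str] = None) -> List[str]:
    """Build one tapdiag invocation such as ['tapdiag', '/link:off']."""
    if option == "enable":
        if value is not None:
            raise ValueError("/enable takes no value")
        return ["tapdiag", "/enable"]
    allowed = ALLOWED_VALUES.get(option)
    if allowed is None:
        raise ValueError(f"unknown tapdiag option: {option!r}")
    if value not in allowed:
        raise ValueError(f"/{option} expects one of {sorted(allowed)}, got {value!r}")
    return ["tapdiag", f"/{option}:{value}"]


def run_tapdiag(option: str, value: Optional[str] = None) -> None:
    """Run tapdiag and raise if it exits with a non-zero status."""
    subprocess.run(tapdiag_command(option, value), check=True)
```

For 2c_Priority this would be run_tapdiag("enable"), a TAP driver restart, then run_tapdiag("setq", "on") on both machines.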

NDISTest 6.5 - 2 Machine - AddressChange

This test will definitely fail on slower machines. It worked fine on one-year-old i7 systems.

NDISTest 6.5 - 2 Machine - E2EPerf

The "E2EPerf" test seems to fail if the support adapter reports a lower link speed than the test adapter. This is visible in the logs of "Run NDISTest Client (no verifier)":

ERROR: Support Adapter must be connected at a link speed greater than or equal to the Test Adapter

This is a problem if the support machine has an old tap-windows6 driver version that claims to be 100Mbps whereas the device under test ("DUT") is advertising itself as 1Gbps device. Resolve by installing the correct (1Gbps) tap-windows6 driver version to the support machine as well.
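To check the link speeds before scheduling E2EPerf, compare what each adapter reports (for example the LinkSpeed column of Get-NetAdapter, which shows values such as "1 Gbps"). A Python sketch of the precondition stated in the error message above, assuming that display format; the helper names are hypothetical.

```python
import re

_UNITS = {"kbps": 10**3, "mbps": 10**6, "gbps": 10**9}


def parse_link_speed(text: str) -> int:
    """Convert a speed such as '100Mbps' or '1 Gbps' to bits per second."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]bps)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised link speed: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2).lower()])


def support_speed_ok(support: str, test: str) -> bool:
    """True if the support adapter is at least as fast as the test adapter."""
    return parse_link_speed(support) >= parse_link_speed(test)
```

An old support-machine driver reporting "100Mbps" against a device under test reporting "1 Gbps" fails this check, which is exactly the situation described above.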

NDISTest 6.0 - 1 machine - 1c_FaultHandling

This test is known to fail in at least two ways. The more serious one is that, at the "Copy downlevel NDISTest binaries" step, it looks for non-existent files on the controller's SMB share. The error message is pretty clear:

Cause : Failed to Start the Task

Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis62\ndprot62.sys"
  Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis62\ndprot62.sys"

Failure : Failed to Start the Task "Copy downlevel NDISTest binaries"

Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis62\ndprot62.sys"

Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis61\ndprot61.sys"
  Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis61\ndprot61.sys"

Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis61\ndprot61.sys"

Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis6\ndprot60.sys"
  Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis6\ndprot60.sys"

Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis6\ndprot60.sys"

Cause : Failed to Copy File : "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis51\ndprot51.sys"
  Dest : "C:\hlk\JobsWorkingDir\Tasks\WTTJobRun76257277-2193-E911-82AA-080027895339\ndistest\bin\ntndis51\ndprot51.sys"

Cause : Cannot Find Pattern "\\controller.hlk.local\tests\AMD64\nethlk\ndistest\bin\ntndis51\ndprot51.sys"

If you mount the share manually on the HLK client, you'll notice that either entire directories or individual .sys files are indeed missing:

PS> net use X: \\controller.hlk.local\tests
The command completed successfully.
PS> Get-Childitem x:\amd64\NetHlk\NDISTest\bin\ -filter "ntndis*"

    Directory: x:\amd64\NetHlk\NDISTest\bin


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----        5/30/2019   7:07 AM                ntndis51
d-----        5/30/2019   7:07 AM                ntndis630
d-----        5/30/2019   7:07 AM                ntndis650
d-----        5/30/2019   7:07 AM                ntndis660
d-----        5/30/2019   7:07 AM                ntndis680
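To see at a glance which of the files from the error log are missing, a Python sketch like the one below can be pointed at the mounted share. The relative paths are copied from the error messages above; the X: drive letter matches the net use example, and missing_binaries is a hypothetical helper.

```python
from pathlib import Path

# Files the "Copy downlevel NDISTest binaries" step tried to copy, relative to
# <share>\AMD64\nethlk\ndistest\bin (taken from the error log above).
EXPECTED_BINARIES = (
    Path("ntndis62") / "ndprot62.sys",
    Path("ntndis61") / "ndprot61.sys",
    Path("ntndis6") / "ndprot60.sys",
    Path("ntndis51") / "ndprot51.sys",
)


def missing_binaries(bin_dir: Path) -> list:
    """Return the expected downlevel NDISTest binaries that do not exist under bin_dir."""
    return [rel for rel in EXPECTED_BINARIES if not (bin_dir / rel).is_file()]
```

On the HLK client this would be called as missing_binaries(Path(r"X:\amd64\NetHlk\NDISTest\bin")).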

Once you get past this failure, the OpenVPN service might not re-establish the link at the end of the test. WHQL seems to get the OpenVPN service into a bad state; just restart the OpenVPN service and the test will pass, since the test waits a few minutes for the link to come back up.

NDISTest 6.0 - 1 machine - 1c_Registry

This test may fail due to files missing from the HLK controller, in the exact same way as 1c_FaultHandling above.

NDISTest 6.0 - 2 machine - 2c_Mini6Stress

Sometimes this test hits a breakpoint in the NDIS test code. The breakpoint seems harmless; it complains about some packets not getting confirmed. If no kernel debugger is connected, the breakpoint causes a bug check and the test fails. If this happens, connect a kernel debugger and rerun the test. When the breakpoint is hit, press (l) Always Ignore and the test will pass.

You may also get away with just rerunning the test until it passes.

TDI filters and LSPs are not allowed

This test may fail due to broken network drive mappings. Hints are available in "Infrastructure -> Execution Logs -> WttEa.log":

2980 4496 2019:5:31 18:48:8:90 Error: 0x8205aaaf, Error 0x8205aaaf   winsockerror code of 11001
  File=sdktools\wtt\jobs\wtttransportproviders\wttcommtcpip\src\wttcommtcpip.cpp Line=656
  PersistManager:EA:JobCancel::token::57M710T->
--- snip --
2980 5488 2019:5:31 18:48:11:210 Error: 0x800704c3, Multiple connections to a server or shared
  resource by the same user, using more than one user name, are not allowed. Disconnect all
  previous connections to the server or shared resource and try again.
  CRunManager::GetLogLocation()::(null)::CAUSE:Error returned from EnableShareAccess to the root
  of "\\controller.hlk.local\HLKLogs\EaFolderAccessCheck\4825BE12-3B91-4414-B8E2-5AD703D69BB3"
  File=sdktools\wtt\jobs\runtime\wttexecutionagent\eamanager\runmanager\src\runmanager.cpp Line=2379    

If this happens you should see a network share that is unavailable:

PS C:\Users\Administrator> net use
New connections will be remembered.

Status       Local     Remote                                     Network
-------------------------------------------------------------------------------------------
Unavailable  R:        \\hlk-controller.openvpn.in\HLKInstall     Microsoft Windows Network

The command completed successfully.

To resolve, unmount the network drive:

PS C:\Users\Administrator> net use R: /DELETE                              
R: was deleted successfully.
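When several stale mappings exist, the net use listing can be parsed to find them. A Python sketch, assuming the English-language net use output format shown above (status in the first column, drive letter in the second); the helper names are hypothetical, and each generated command is the same net use <drive> /DELETE used above.

```python
import re

# A drive letter column entry such as "R:".
_DRIVE = re.compile(r"^[A-Za-z]:$")


def unavailable_drives(net_use_output: str) -> list:
    """Return drive letters whose mapping status is 'Unavailable' in `net use` output."""
    drives = []
    for line in net_use_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Unavailable" and _DRIVE.match(fields[1]):
            drives.append(fields[1].upper())
    return drives


def delete_commands(drives) -> list:
    """Build one `net use <drive> /DELETE` command per stale drive."""
    return [["net", "use", drive, "/DELETE"] for drive in drives]
```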

Static Tools Logo test (optional)

The Static Tools Logo Test checks that the driver build has been analyzed with the static analysis tools (Code Analysis and Static Driver Verifier, SDV). Even if it has, some files must be copied so that the test can read the analysis logs and check the results; otherwise the test will fail.

Various test issues

Packet transmission too slow

This can result in test errors like "Expected minimum of 237 packets but we received 200 packets". The test allows one second after all the sends have been completed for all the packets to be received.

Disabling verbose logging in the OpenVPN server (for example, lowering its verb level) makes the test more reliable.

Packets reordering

Packets can (rarely) be reordered in flight, which causes an assertion in the test driver. Hints are errors such as

  • "1 total breakpoints were hit in the protocol driver while this test was executing"
  • "Out of order indication"
  • "Dropped indications"

HLK doesn't complain if some packets are lost; instead, these errors are raised when packets are received out of order.

The OpenVPN architecture has some inherent race conditions that can cause reordering of packets. This has happened 2 or 3 times over dozens of runs, and does cause a test failure, but a rerun should pass.

Address Change test failures

You may encounter failures in the "Address Change" test. It’s a combination of a few factors:

  • The OpenVPN client sets the network link status to up around a second before the server actually starts forwarding any packets to the client. This is quite possibly an order-of-operations bug in OpenVPN.
  • That alone would be harmless, but the Address Change test has extremely aggressive timing.
  • It starts sending packets almost immediately once the link status comes up, and stops listening for packets less than a second after that.

This failure seems to be related to newer OpenVPN versions. Logs for reference:

(Test system connecting after MAC address changes)

…
Wed Feb 20 22:03:43 2019 us=810407 vpnclient-nopass/192.168.1.36:1194 Data Channel MTU parms [ L:1581 D:1450 EF:49 EB:411 ET:32 EL:3 ]
Wed Feb 20 22:03:43 2019 us=810637 vpnclient-nopass/192.168.1.36:1194 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Feb 20 22:03:43 2019 us=810765 vpnclient-nopass/192.168.1.36:1194 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key

(Other system starts to send test packets)

WWWWRWed Feb 20 22:03:48 2019 us=680899 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
wWRWed Feb 20 22:03:48 2019 us=681534 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
wWRWed Feb 20 22:03:48 2019 us=681869 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood
…
wWRWed Feb 20 22:03:49 2019 us=158052 vpnclient-nopass/192.168.1.151:1194 MULTI: unknown unicast destination [00:ff:e9:88:37:ca], flood

(Other system is done sending packets)

(OpenVPN server starts forwarding packets at around this point.)

wWRRRWed Feb 20 22:03:50 2019 us=472525 vpnclient-nopass/192.168.1.36:1194 MULTI: Learn: 02:02:04:06:08:08 -> vpnclient-nopass/192.168.1.36:1194
…

Another side note: the “unicast destination [00:ff:e9:88:37:ca]” in those messages is incorrect. Those packets are actually directed to the “02:02:04:06:08:08” address; the flood message in the patch is showing the source address.
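The gap between the test packets arriving and the server learning the client's MAC address can be measured directly from such a log. A Python sketch, assuming OpenVPN's timestamp format shown above ("Wed Feb 20 22:03:48 2019 us=680899"), an English locale for day and month names, and that the first "flood" line and the first "MULTI: Learn" line bracket the interesting interval; the helper names are hypothetical.

```python
import re
from datetime import datetime

# OpenVPN log timestamp, e.g. "Wed Feb 20 22:03:48 2019 us=680899".
# Packet-trace prefixes such as "wWR" glued to the weekday are skipped by search().
_TS = re.compile(r"([A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) us=(\d{1,6})")


def log_time(line: str) -> datetime:
    """Parse the timestamp (with microseconds) of an OpenVPN log line."""
    match = _TS.search(line)
    if match is None:
        raise ValueError(f"no OpenVPN timestamp in line: {line!r}")
    base = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
    return base.replace(microsecond=int(match.group(2)))


def forwarding_delay(lines) -> float:
    """Seconds from the first 'flood' line to the first 'MULTI: Learn' line."""
    lines = list(lines)
    first_flood = next(log_time(line) for line in lines if "flood" in line)
    first_learn = next(log_time(line) for line in lines if "MULTI: Learn" in line)
    return (first_learn - first_flood).total_seconds()
```

Applied to the excerpt above, the first flood line (22:03:48.680899) and the Learn line (22:03:50.472525) are about 1.79 seconds apart, well beyond the sub-second window in which the Address Change test listens for packets.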

HLK client version mismatches

The HLK client version (e.g. Windows 10 1809) needs to match the HLK version (HLK 1809). A mismatch can cause setup issues (for example, some semantics of the driver verifier configuration changed, or some test components may fail on that OS version).

HLK logging

HLK logs most problems to the Windows event logs, so when tests fail in an unusual fashion you may be able to find out why in the Event Viewer.

Debugging

It is possible to get HLK into strange states. For example:

  • HLK clients / support machines do not get into "Ready" state
  • Launching two-machine tests fails with duplicate database key errors

Some things you can try to get past these errors:

  • Restart the "hlksvc" Windows service
  • Reboot HLK nodes (controller, clients, support machine)
  • Reinstall HLK client software and reboot HLK client
  • Create a new HLK project
  • Create a new pool and move HLK client machines there
  • Delete HLK clients from the pool

These are known to work sometimes, but so far I have not been able to establish any logic here.

The HLK tests fortunately provide tons of logging. Check sections "Error", "Task log", "Infrastructure" in the context menu for the test in question to see what you can find.

Addendum

Firewall rules for HLK server and clients

Installing the HLK software automatically opens the required ports in the Windows firewall. If the HLK controller and HLK clients are not on the same switch, an intermediate firewall (e.g. EC2 security group rules) might block HLK traffic. Here is a reference for the ports which need to be open for HLK tests to work:

  • OpenVPN peer (udp/1194) <-> OpenVPN peer (udp/1194)
  • HLK clients -> HLK controller tcp/1771 (HLK Server Receiver Port)
  • HLK clients -> HLK controller tcp/1782 (HLKSvc Receiver Port)
  • HLK clients -> HLK controller tcp/445 (HLKInstall Samba share)
  • HLK controller -> HLK clients tcp/1771 (HLK Server Receiver Port)

Outbound traffic is assumed to be unrestricted. If not, adjust egress rules accordingly. Also note that IPv6 traffic needs to flow properly in the OpenVPN virtual network as HLK tests require IPv6.
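TCP reachability from an HLK client to the controller can be checked with a simple connect test. A Python sketch, assuming the port list above; it only covers the TCP ports (UDP 1194 cannot be verified with a plain connect), the controller host name is whatever your setup uses, and tcp_port_open and check_controller are hypothetical helpers.

```python
import socket

# TCP ports HLK clients must reach on the controller (from the list above).
CONTROLLER_TCP_PORTS = {
    1771: "HLK Server Receiver Port",
    1782: "HLKSvc Receiver Port",
    445: "HLKInstall share (SMB)",
}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name resolution failure
        return False


def check_controller(host: str, ports=CONTROLLER_TCP_PORTS) -> dict:
    """Map each controller port to whether it is reachable from this machine."""
    return {port: tcp_port_open(host, port) for port in ports}
```

The reverse direction (controller to clients on tcp/1771) can be checked the same way by running tcp_port_open on the controller against each client.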

External links