Hyper-V in Windows Server 2016 TP5: What's New and Cool
Better backups, checkpoints and Shielded VMs are highlights of the latest version.
With the Technical Preview (TP) 5 of Windows Server 2016 recently released, I thought I'd take the opportunity to cover all the new goodness coming in Hyper-V. TP 5 is apparently feature complete; it's also the last preview before Release to Manufacturing (RTM), expected sometime in the second half of 2016.
Some of the new features were covered in my interview with Ben Armstrong, principal program manager on the Hyper-V team, published in six parts here on virtualizationreview.com. Part 1 is here; it contains links to all the other parts.
In the past, Microsoft would develop Windows Server in the "back room," with a public beta late in the development cycle, more for fixing bugs than as a source of ideas for improvements or new features. This version of Windows, on the other hand, has been "in the open" since TP 1 back in October 2014, all the way through to TP 5, released April 27. The TP 5 desktop is shown in Figure 1.
Given the long development time, expectations are high. Fortunately, this version doesn't disappoint, with many new features, including:
- A new type of checkpoint
- A new backup platform
- Rolling cluster upgrades
- VM compute resiliency
- Storage QoS
- Storage Spaces Direct
- Shielded VMs
- Windows and Hyper-V containers
- Nano Server and PowerShell Direct
I'll briefly cover each of these in this article. There are also improvements to existing features, such as Shared VHDX and Hyper-V Replica, along with more online operations for VMs, a better Hyper-V Manager console and more.
Backup and Checkpoints
Backups in Hyper-V can sometimes be a bit shaky, due to a reliance on the underlying Volume Shadow Copy Service (VSS). Windows Server 2016 instead makes change tracking a feature of Hyper-V itself (Resilient Change Tracking), which makes it much easier for third-party backup vendors to support Hyper-V.
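To make hypervisor-level change tracking concrete, here is a minimal Python sketch. It is a conceptual model only, assuming a virtual disk split into fixed-size blocks and a set of block indexes written since the last backup; it is not the on-disk format or API of Resilient Change Tracking, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDisk:
    """Hypothetical model of a virtual disk whose writes are tracked by the hypervisor.

    Because the hypervisor records which blocks changed since the last backup,
    an incremental backup reads only those blocks instead of scanning the disk.
    """
    block_count: int
    blocks: dict = field(default_factory=dict)  # block index -> data
    changed: set = field(default_factory=set)   # indexes written since the last backup

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.block_count:
            raise IndexError(index)
        self.blocks[index] = data
        self.changed.add(index)                  # the change-tracking step

    def incremental_backup(self) -> dict:
        """Return only the blocks changed since the previous backup, then reset tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(block_count=1024)
disk.write(3, b"boot")
disk.write(700, b"log")
first = disk.incremental_backup()   # blocks 3 and 700
disk.write(700, b"log2")
second = disk.incremental_backup()  # block 700 only
```

The backup application no longer has to work out what changed; it simply asks for the delta the hypervisor has already recorded.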
Snapshots (now called checkpoints) are dangerous for production workloads. Their workflow is convenient: take a snapshot, make some changes in the virtual machine (VM) and, if those changes turn out badly, simply roll back to the snapshot.
The problem is that if you roll back a Domain Controller (DC) or a database server that replicates with other servers, it's now out of sync, and there's no easy way to tell, nor any easy way to fix it. Microsoft made changes in Windows Server 2012 to make Active Directory (AD) DCs safer to roll back, but that protection doesn't extend to other workloads at risk from a wrongly applied snapshot.
Production checkpoints in Windows Server 2016 (classic checkpoints can still be used) use VSS inside the VM. When you apply one, the VM boots as though it had been restored from a backup, rather than resuming in a running state. This eliminates the danger while retaining the convenience of snapshots.
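The difference between the two checkpoint types can be sketched as a small Python model. The function names are hypothetical; in Hyper-V the per-VM choice is exposed through the CheckpointType parameter of the Set-VM cmdlet (Standard, Production, ProductionOnly or Disabled). The model assumes, as with that setting, that Production falls back to a standard checkpoint when the guest can't take a VSS-based one, while ProductionOnly fails instead.

```python
from enum import Enum


class CheckpointType(Enum):
    STANDARD = "Standard"
    PRODUCTION = "Production"
    PRODUCTION_ONLY = "ProductionOnly"


def take_checkpoint(kind: CheckpointType, guest_vss_ok: bool) -> str:
    """Return the kind of checkpoint actually created (hypothetical model)."""
    if kind is CheckpointType.STANDARD:
        return "standard"
    if guest_vss_ok:
        return "production"  # application-consistent, taken with VSS inside the guest
    if kind is CheckpointType.PRODUCTION:
        return "standard"    # fallback when the guest can't quiesce
    raise RuntimeError("production checkpoint failed and fallback is not allowed")


def state_after_apply(checkpoint: str) -> str:
    """Describe the VM after the checkpoint is applied."""
    if checkpoint == "production":
        # No saved memory: the VM starts cold, as if restored from a backup.
        return "off, boots as if restored from backup"
    # Saved memory and device state: the VM resumes exactly where it was.
    return "running, resumes saved state"
```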
Rolling Cluster Upgrades
The upgrade story from Windows Server 2012 to 2012 R2 was pretty good, because you could live migrate VMs from the old cluster to the new one. But you still had to stand up a separate Windows Server 2012 R2 cluster to start the process, which wasn't ideal.
Going from Windows Server 2012 R2 to Windows Server 2016 is a lot easier: simply evict one cluster node, wipe it, install 2016 and add it back into the cluster. The new node runs in 2012 R2 compatibility mode, so VMs can be Live Migrated to it; that means you can empty another host and clean install it. Rinse and repeat as many times as required. When all nodes are upgraded and you're sure you won't add any down-level nodes, you use PowerShell to raise the cluster functional level, much as you raise the functional level in an AD upgrade.
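The rolling upgrade boils down to one invariant: nodes of both versions coexist while the cluster stays at the 2012 R2 level, and the level can be raised (one way) only once every node runs 2016. Here is a hypothetical Python sketch of that invariant; the real operation is the Update-ClusterFunctionalLevel PowerShell cmdlet, and the class and version labels below are illustrative only.

```python
class Cluster:
    """Hypothetical model of a mixed-mode Hyper-V cluster during a rolling upgrade."""

    def __init__(self, nodes: dict):
        self.nodes = dict(nodes)  # node name -> OS version label
        self.functional_level = "2012R2"

    def upgrade_node(self, name: str) -> None:
        # Stands in for: Live Migrate VMs off, evict, clean install 2016, add back.
        if name not in self.nodes:
            raise KeyError(name)
        self.nodes[name] = "2016"  # rejoins while the cluster stays at the 2012 R2 level

    def add_node(self, name: str, version: str) -> None:
        if self.functional_level == "2016" and version != "2016":
            raise ValueError("down-level nodes can't join after the level is raised")
        self.nodes[name] = version

    def raise_functional_level(self) -> None:
        if any(v != "2016" for v in self.nodes.values()):
            raise ValueError("all nodes must run Windows Server 2016 first")
        self.functional_level = "2016"  # one-way, like an AD functional level


cluster = Cluster({"hv1": "2012R2", "hv2": "2012R2", "hv3": "2012R2"})
for node in list(cluster.nodes):  # rinse and repeat
    cluster.upgrade_node(node)
cluster.raise_functional_level()
```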
VMs have had version numbers internally since the first version of Hyper-V. Because of rolling cluster upgrades, these versions are now visible, and you need to upgrade each VM's configuration version separately, again using PowerShell. Once upgraded, a VM can run only on Windows Server 2016 hosts. VMs at the new version use the new .vmcx file format for configuration and .vmrs for runtime state data; both are binary files that can't be edited directly (unlike the XML configuration files used today).
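The one-way nature of the VM version upgrade (done with the Update-VMVersion cmdlet) can be modeled as a simple compatibility check. This Python sketch is hypothetical: it assumes each host OS runs VMs up to a maximum configuration version, and the values 5.0 and 8.0 are taken from the released 2012 R2 and 2016 products; TP 5 builds may report different numbers.

```python
# Hypothetical table: the highest VM configuration version each host OS can run.
# Values assumed from the released products; TP 5 builds may differ.
HOST_MAX_VERSION = {"2012R2": 5.0, "2016": 8.0}


def can_run(vm_version: float, host_os: str) -> bool:
    """A host can run a VM only if it supports the VM's configuration version."""
    return vm_version <= HOST_MAX_VERSION[host_os]


def upgrade_vm(vm_version: float, host_os: str) -> float:
    """One-way upgrade to the host's configuration version.

    There is no downgrade path: once upgraded, the VM can no longer be
    Live Migrated back to a 2012 R2 host.
    """
    return max(vm_version, HOST_MAX_VERSION[host_os])


vm = 5.0
before = (can_run(vm, "2012R2"), can_run(vm, "2016"))  # both hosts can run it
vm = upgrade_vm(vm, "2016")
after = (can_run(vm, "2012R2"), can_run(vm, "2016"))   # only 2016 hosts now
```

This is why the upgrade is a separate, deliberate step: leaving VMs at the old version during a rolling upgrade keeps the option of moving them back.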
VM Compute Resiliency
Clustering hosts together provides continuous VM availability during planned downtime: simply Live Migrate VMs off the host first, then perform the maintenance. For unplanned downtime, VMs on a failed host are automatically restarted on another host in the cluster, providing high availability with a few minutes of downtime for the restart. So far, so good.
There are times, however, when host clusters cause problems themselves. A short network outage between the hosts can trigger a failover of many VMs, even though the network might have recovered within a few seconds. Such a failover can cause more downtime than the outage itself, with numerous VMs restarting simultaneously.
In Windows Server 2016, if a host loses connectivity to the cluster, its VMs keep running in "isolated mode" for four minutes (this period can be changed). If the outage lasts longer, normal failover occurs. If a host has numerous disconnections over a 24-hour period, it's quarantined and its VMs are Live Migrated off as soon as possible.
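The isolation and quarantine rules can be expressed as two small decision functions. This Python sketch takes the four-minute default and the 24-hour window from the description above; the quarantine threshold is a hypothetical value, since "numerous" isn't quantified, and the names are illustrative rather than actual cluster settings.

```python
from datetime import datetime, timedelta

ISOLATION_PERIOD = timedelta(minutes=4)  # default from the article; configurable
QUARANTINE_WINDOW = timedelta(hours=24)  # window from the article
QUARANTINE_THRESHOLD = 3                 # hypothetical: "numerous" isn't quantified


def outage_action(duration: timedelta) -> str:
    """What happens to a host's VMs after it loses cluster connectivity."""
    if duration <= ISOLATION_PERIOD:
        return "isolated: VMs keep running, no failover"
    return "failover: VMs restarted on other nodes"


def should_quarantine(disconnects: list, now: datetime) -> bool:
    """Quarantine a flapping host that disconnects too often within the window."""
    recent = [t for t in disconnects if now - t <= QUARANTINE_WINDOW]
    return len(recent) >= QUARANTINE_THRESHOLD
```

Separating the two checks mirrors the two problems being solved: a single short blip shouldn't cause a failover, but a host that keeps flapping shouldn't be trusted with VMs either.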
Today, if a VM loses access to the shared storage that houses its virtual disks, it'll crash if the outage lasts longer than about a minute. In Windows Server 2016, if storage connectivity is lost, the VM is paused until it can reach its virtual disks again, avoiding the data loss a crash would likely cause.
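The two behaviors can be compared by replaying a storage outage second by second. This is a hypothetical Python model: the one-minute crash threshold is the approximate figure from the text, and the state names are illustrative, not Hyper-V's own.

```python
CRASH_AFTER_SECONDS = 60  # "about a minute" per the article; an approximation


def simulate(storage_up: list, resilient: bool) -> list:
    """Replay per-second storage connectivity samples and return the VM state each second.

    resilient=True models Windows Server 2016 (pause and wait for the disks);
    resilient=False models the earlier behavior (crash after a long outage).
    """
    states = []
    down_for = 0
    state = "running"
    for up in storage_up:
        if state != "crashed":        # a crashed VM stays crashed
            if up:
                down_for = 0
                state = "running"     # a paused VM resumes once its disks are back
            else:
                down_for += 1
                if resilient:
                    state = "paused"
                elif down_for > CRASH_AFTER_SECONDS:
                    state = "crashed"
        states.append(state)
    return states


outage = [True] * 5 + [False] * 90 + [True] * 5  # a 90-second storage outage
old = simulate(outage, resilient=False)
new = simulate(outage, resilient=True)
```

With the 90-second outage, the old model ends with a crashed VM, while the resilient one pauses and then carries on running when storage returns.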
You can now assign VMs a priority (high, medium or low) that determines the order in which they're restarted when failover occurs. TP 5 also lets admins create sets of VMs and define dependencies between them, which then dictate the order in which VMs start.
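Dependencies plus priorities amount to a topological sort with a tie-breaker. This Python sketch uses the standard library's graphlib (Python 3.9 or later) to start VMs in waves: a VM becomes eligible once everything it depends on has started, and eligible VMs start in priority order. It's a hypothetical model of the idea, not the cluster service's actual algorithm, and the VM names are made up.

```python
from graphlib import TopologicalSorter

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def start_order(priority: dict, depends_on: dict) -> list:
    """Return VM names in start order: dependencies first, then by priority.

    priority:   VM name -> "high" | "medium" | "low" (must list every VM)
    depends_on: VM name -> set of VM names that must be running first
    """
    sorter = TopologicalSorter({vm: depends_on.get(vm, set()) for vm in priority})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda vm: (PRIORITY_RANK[priority[vm]], vm))
        order.extend(ready)
        sorter.done(*ready)
    return order


priority = {"dc1": "high", "sql1": "medium", "web1": "high", "web2": "low"}
depends_on = {"sql1": {"dc1"}, "web1": {"sql1"}, "web2": {"sql1"}}
order = start_order(priority, depends_on)  # dc1, sql1, web1, web2
```

Note that web1 is high priority but still starts after the medium-priority sql1: a dependency always wins over priority.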
Storage QoS
In the current version of Hyper-V, you can set a minimum IOPS value, a maximum, or both, for each virtual hard disk. This works fine as long as the back-end storage can actually deliver the combined IOPS required by all running VMs; if it can't, the individual hosts have no way to manage IOPS across the VMs that share it.
Windows Server 2016 brings a centralized storage IOPS "cop," sitting on the Scale Out File Server (SOFS) nodes. It's managed either through PowerShell or with Virtual Machine Manager (VMM), and provides a way to create policies that can be applied in aggregate across VMs or to individual VMs. It also monitors the IOPS actually used by each VM, giving you a more comprehensive view of the way your applications use storage.
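What a central "cop" adds is a view of total capacity. The Python sketch below is a hypothetical allocator for one storage back end, not Microsoft's algorithm: it honors per-VM minimums first (scaling them down proportionally if even those exceed capacity), then shares the remaining IOPS in equal slices, never exceeding a VM's demand or maximum. It models only per-VM limits, not policies applied in aggregate across several VMs, and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    vm: str
    demand: int       # IOPS the VM is currently asking for
    minimum: int = 0  # reserved IOPS (0 = none)
    maximum: int = 0  # cap (0 = unlimited)


def allocate(flows: list, capacity: int) -> dict:
    """Hypothetical central allocation of a back end's IOPS across VMs."""
    # Step 1: reservations first, scaled down if the back end can't honor them all.
    alloc = {f.vm: min(f.demand, f.minimum) for f in flows}
    reserved = sum(alloc.values())
    if reserved > capacity:
        return {vm: iops * capacity // reserved for vm, iops in alloc.items()}
    spare = capacity - reserved

    def ceiling(f: Flow) -> int:
        return min(f.demand, f.maximum) if f.maximum else f.demand

    # Step 2: hand out spare capacity in equal slices until it runs out.
    hungry = [f for f in flows if alloc[f.vm] < ceiling(f)]
    while spare > 0 and hungry:
        share = max(spare // len(hungry), 1)
        for f in list(hungry):
            grant = min(share, ceiling(f) - alloc[f.vm], spare)
            alloc[f.vm] += grant
            spare -= grant
            if alloc[f.vm] >= ceiling(f):
                hungry.remove(f)
            if spare == 0:
                break
    return alloc


flows = [
    Flow("sql", demand=8000, minimum=3000),
    Flow("web", demand=6000, maximum=2000),
    Flow("backup", demand=9000, minimum=1000),
]
result = allocate(flows, capacity=10000)  # sql 5000, web 2000, backup 3000
```

Per-host limits can't do step 1's scaling or step 2's sharing, because no single host knows what the other hosts' VMs are demanding from the same storage.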
Storage Spaces Direct
Microsoft's implementation of Software Defined Storage (SDS) took shape in Windows Server 2012. SOFS nodes act as the front end of a SAN (but simpler to set up and manage); SAS JBOD (just a bunch of disks) disk trays with HDDs and SSDs provide the data storage.
In Windows Server 2016, Microsoft takes the next logical step with Storage Spaces Direct (S2D), which pools the local storage (SAS, SATA and NVMe HDDs and SSDs) in each host and offers it up as VM storage. This can be deployed either disaggregated, with storage nodes in one cluster and Hyper-V nodes in another, or hyper-converged, where each host is both a storage node and a VM host.
New in TP 5 is support for clusters of fewer than four nodes, along with direct support for SATA disks (previous previews required SATA disks to be connected through a SAS adapter).
Shielded VMs
One basic problem with any hypervisor is that host and fabric administrators have to be as trusted as the highest-level administrators in an organization. If VMs are hosted elsewhere, in a public cloud, for example, you have to place a lot of trust in the provider: a rogue fabric administrator can inspect the memory of a running VM, or take an offline copy of its virtual disks, mount them and steal secrets such as passwords, or even launch an offline attack against an AD database.
There are a few building blocks for Shielded VMs. Generation 2 VMs now come with a virtual TPM chip, which enables encryption of the virtual hard disks with BitLocker in Windows VMs and dm-crypt in Linux VMs. Generation 2 VMs also provide Secure Boot for both Windows and Linux VMs, because they boot from virtual UEFI firmware.
A Host Guardian Service (Figure 2), running on separate physical servers that are part of an isolated administrative forest, attests to the health of Hyper-V hosts. There are two models for this: Administrator Attestation and Hardware Trusted Attestation. The first relies on trusted hosts being members of a particular AD group; the second uses new TPM 2.0 chips in each host to protect the hypervisor from tampering.
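The difference between the two attestation models can be shown with a hypothetical Python sketch. The AD group name, host names and boot components are made up; the hardware check is modeled loosely on how a TPM extends a platform configuration register (PCR) with a hash of each measured component, and the result is compared with a known-good baseline. It is not the Host Guardian Service's actual protocol.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

GUARDED_HOSTS_GROUP = "GuardedHosts"  # hypothetical AD group name


@dataclass(frozen=True)
class Host:
    name: str
    ad_groups: frozenset              # AD groups the host's computer account belongs to
    boot_log: Optional[tuple] = None  # measured components in boot order (None = no TPM 2.0)


def pcr_extend(components: tuple) -> str:
    """Fold measurements the way a TPM extends a PCR: pcr = SHA256(pcr || SHA256(m))."""
    pcr = bytes(32)
    for component in components:
        pcr = hashlib.sha256(pcr + hashlib.sha256(component.encode()).digest()).digest()
    return pcr.hex()


def attest(host: Host, mode: str, known_good: set) -> bool:
    """Hypothetical health check performed by the guardian service."""
    if mode == "admin":
        # Administrator Attestation: trust is based on AD group membership alone.
        return GUARDED_HOSTS_GROUP in host.ad_groups
    if mode == "hardware":
        # Hardware Trusted Attestation: the measured boot chain must match a baseline.
        return host.boot_log is not None and pcr_extend(host.boot_log) in known_good
    raise ValueError(f"unknown attestation mode: {mode}")


baseline = ("firmware", "bootmgr", "hypervisor")
known_good = {pcr_extend(baseline)}
trusted = Host("hv1", frozenset({GUARDED_HOSTS_GROUP}), baseline)
tampered = Host("hv2", frozenset({GUARDED_HOSTS_GROUP}), ("firmware", "bootmgr", "patched-hypervisor"))
```

The tampered host still passes Administrator Attestation, because group membership says nothing about what the host booted; only the hardware model notices the changed boot chain, which is why it offers the stronger guarantee.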
TP 5 adds a new mode called Encryption Supported, which provides vTPM, disk encryption and Live Migration encryption but offers less assurance than a true Shielded VM. In TP 5, you can also convert existing generation 2 VMs to Shielded VMs, and a new recovery environment allows for troubleshooting a Shielded VM.
The end result of a Shielded VM is that the fabric administrators have no access to the VM. They can turn it off, but they can't access its memory or connect to it with VM Connect; if they copy the virtual hard disks, they can't access them because they're encrypted.
Windows and Hyper-V Containers
Containers are all the rage in the IT press, and although I think it's going to take a lot longer than the pundits believe before we're all "containerized," Microsoft is now in the running with two flavors of containers. Either flavor can run Nano Server or Server Core (not the full GUI version); for developers, the two flavors are identical.
The difference comes in the deployment phase: in your own datacenter where you (probably) trust the code running in each container, you can rely on the weaker isolation of a Windows Container. But if you're deploying your code in a public cloud or at a hosting provider, the Hyper-V container gives you the same isolation the hypervisor provides (see Figure 3).
Nano Server and PowerShell Direct
The biggest change in Windows Server since NT was conceived is undoubtedly Nano Server: a GUI-less server with a minimal disk footprint, low resource use and no local logon, to which you add only the functionality you need. The only roles supported today are Hyper-V host, SOFS node and application platform for modern applications. The benefits are a very small attack surface, low overhead and less frequent reboots, because there are fewer patches to apply.
PowerShell Direct is a very useful feature in Windows 10 and Windows Server 2016. If you have the credentials, you can run cmdlets inside one or more VMs from the host without having to set up PowerShell remoting first.
New in TP 5 is the ability to run PowerShell directly on Nano Server, along with cmdlets for working with local users and groups.
Other Improvements
TP 5 brings quite a few new features to both Windows 10 and Windows Server 2016. For example, you can now use the Connected Standby power state even with the Hyper-V role installed.
The ability to connect a VM directly to a PCIe hardware device (Discrete Device Assignment) is interesting. It doesn't work for every device; more information is available here. If you want to try it out, see these instructions. At this stage, the main aim is connecting VMs directly to super-fast NVMe storage, but GPU support is also coming.
Host resource protection is a feature Microsoft added in response to vulnerabilities seen in Azure, where VMs running hostile code tried various attack methods to starve the host of resources. Host resource protection detects this kind of activity and limits the resources available to the offending VM.
Hyper-V Manager now lets you enter alternate credentials when connecting to remote Hyper-V hosts, and save those credentials. It can also manage Windows 10 and Windows Server 2016 hosts as well as Windows 8, Windows 8.1, Windows Server 2012 and Windows Server 2012 R2 hosts. The console now communicates using the WS-MAN protocol over port 80, which also makes it easier to enable a host for remote management.
Hyper-V Replica now supports Shielded VMs, provided the destination replica server is authorized to run the replicated VM(s). For containers, Hyper-V now offers nested virtualization, in which a VM acts as a Hyper-V host with its own VMs running inside it; several levels of nesting are possible.
The new version of Hyper-V brings many new features as well as important improvements to existing ones. Make sure to check out TP 5 yourself.