Hyper-V architecture

This is the second post in a series on Hyper-V performance. The series begins here.

Hyper-V installs a hypervisor that gains control of the hardware immediately following a boot, and works in tandem with additional components that are installed in the Root partition. Hyper-V even installs a few components into Windows guest machines (e.g., synthetic device drivers, enlightenments). This section looks at how all these architectural components work together to provide a complete set of virtualization services to guest machines.

Hyper-V uses a compact hypervisor component that installs at boot time. There is one version for Intel machines, Hvix64.exe, and a separate version for the slightly different AMD hardware, Hvax64.exe. The hypervisor is responsible for scheduling the processor hardware and apportioning physical memory to the various guest machines. It also provides a centralized timekeeping facility so that all guest machines can access a common clock. Except for the specific enlightenments built into Windows guest machines, the hypervisor functions in a manner that is transparent to a guest machine, which allows Hyper-V to host guest machines running an OS other than Windows.

After the hypervisor initializes, the Root partition is booted. The Root partition performs virtualization functions that do not require the higher privilege level associated with the hardware virtualization interface. The Root partition owns the file system, for instance; the hypervisor has no knowledge of the file system, let alone a way to access it, since it lacks all the system software needed to perform file IO. The Root creates and runs a VM worker process (vmwp.exe) for each guest machine. Each VM worker process keeps track of the current state of a child partition. The Root partition is also responsible for the operation of native IO devices, which include disks and network interfaces, along with the device driver software that is used to access those hardware peripherals. The Windows OS running in the Root partition also continues to handle some of the machine management functions, such as power management.

The distribution of machine management functions across the Hyper-V hypervisor and its Root partition makes for an interesting, hybrid architecture. A key advantage of this approach from a software development perspective is that Microsoft already had a server OS and saw no reason to reinvent the wheel when it developed Hyper-V. Because the virtualization Host uses the device driver software installed on the Windows OS running inside the Root partition to connect to peripheral devices for disk access and network communication, Hyper-V immediately gained support for an extensive set of devices without ever needing to coax 3rd party developers to support the platform. Hyper-V also benefits from all the other tools and utilities that run on Windows Server: PowerShell, network monitoring, and the complete suite of Administrative tools. The Root partition uses Hypercalls to pull Hyper-V events and logs them to its Event log.

Hyper-V performance monitoring also relies on cooperation between the hypervisor and the Root partition to work. The hypervisor maintains several sets of performance counters that keep track of guest machine dispatching, virtual processor scheduling, machine memory management, and other hypervisor facilities. A performance monitor running inside the Root partition can pull these Hyper-V performance counters by accessing a Hyper-V perflib that gathers counter values from the hypervisor using Hypercalls.
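A quick way to confirm from the Root partition which counter sets that perflib exposes is to enumerate them with PowerShell’s Get-Counter cmdlet. This is only a minimal sketch; the exact counter set names can vary slightly between Windows Server versions.

    # Run inside the Root partition, which is where the Hyper-V counter sets are exposed.
    # Enumerate the Hyper-V counter sets surfaced by the Hyper-V perflib.
    Get-Counter -ListSet "Hyper-V*" |
        Sort-Object CounterSetName |
        Format-Table CounterSetName, Description -AutoSize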

In Hyper-V, the Root partition is used to administer the Hyper-V environment, which includes defining the guest machines, as well as starting and stopping them. Microsoft recommends that you dedicate the Hyper-V Root partition to Hyper-V administration, and normally you would install Windows Server Core to run Hyper-V. Keeping the OS footprint small is especially important if you ever need to patch the Hyper-V Root and reboot it, an operation that would also force you either to shut down and restart all the resident guest machines, or migrate them, which is the preferred way to do reconfiguration and maintenance. While there are no actual restrictions that prevent you from running any other Windows software you like inside the Root partition, it is not a good practice. In Hyper-V, the Root partition provides essential runtime services for the guest machines, and running any other applications on the Root could conflict with the timely delivery of those services.

The Root partition is also known as the parent partition, because it serves as the administrative parent of any child guest machines that you define and run on the virtualization Host machine. A Hyper-V Host machine with a standard Windows Server OS Root partition and with a single Windows guest machine running in a child partition, along with the major Hyper-V components installed in each partition, is depicted in Figure 1.


Figure 1. Major components of the Hyper-V architecture.

Figure 1 shows the Hyper-V hypervisor installed and running at the very lowest (and highest priority) level of the machine. Its role is limited to the management of the guest machines, so, by design, it has a very limited range of functions and capabilities. It serves as the Scheduler for managing the machine’s logical processors, which includes dispatching virtual machines to execute on what are called virtual processors. It also manages the creation and deletion of the partitions where guest machines execute. The hypervisor also controls and manages machine memory (RAM). The Hypercall interface is used for communication with the Root partition and any active child partitions.

Processor scheduling

Virtual machines are scheduled to execute on virtual processors, which is the scheduling mechanism the hypervisor uses to run the guest machines while maintaining the integrity of the hosting environment. Each child partition can be assigned one or more virtual processors, up to the maximum number of logical processors (i.e., physical processors, counting each hardware thread when Intel’s HyperThreading is enabled) present in the hardware. Tallying up all the guest machines, more virtual processors are typically defined than the number of logical processors that are physically available in the hardware, so the hypervisor maintains a Ready Queue of dispatchable guest machine virtual processors, analogous to the Ready Queue maintained by the OS for threads that are ready to execute. If the guest OS running in the child partition supports the feature, Hyper-V can even add virtual processors to the child partition while the guest machine is running. (The Windows Server OS supports adding processors to the machine dynamically without having to reboot.)
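For example, a guest machine’s virtual processor count is part of its configuration and can be inspected or changed from the Root partition with the Hyper-V PowerShell module. This is a minimal sketch; the VM name "GuestVM1" is hypothetical, and on most guest OS versions changing the count requires the VM to be powered off first.

    # Inspect the current virtual processor configuration of a guest machine.
    Get-VMProcessor -VMName "GuestVM1" |
        Format-List Count, Reserve, Maximum, RelativeWeight

    # Assign four virtual processors to the guest (typically done while the VM is off).
    Set-VMProcessor -VMName "GuestVM1" -Count 4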

By default, virtual processors are scheduled to run on actual logical processors using a round-robin policy that balances the CPU load evenly across virtual machines. Processor scheduling decisions made by the hypervisor are visible to the Root partition via a Hypercall, which allows the Root to keep current the state machine it maintains for each individual guest.

Similar to the Windows OS Scheduler, the hypervisor Scheduler implements soft processor affinity, where an attempt is made to schedule a virtual processor on the same logical processor where it executed last. The hypervisor’s guest machine scheduling algorithm is also aware of the NUMA topology of the underlying hardware, and it attempts to assign all the virtual processors for a guest machine to execute on the same NUMA node, where possible. NUMA considerations loom large in virtualization technology because most data center hardware has NUMA performance characteristics. On such hardware, best practice is to (1) assign guest machines no more virtual processors than the number of logical processors that are available on a single NUMA node, and then (2) rely on the NUMA support in Hyper-V to keep that machine confined to one NUMA node at a time.
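One way to apply that sizing rule from the Root partition is sketched below. The logical processor count per NUMA node is treated as a value you have already determined for your hardware (for example, from Get-VMHostNumaNode or the vendor documentation); both that value and the VM name are assumptions made for the sake of the example.

    # Rule of thumb: give a guest no more virtual processors than one NUMA node
    # has logical processors, so Hyper-V can keep the guest on a single node.
    $lpPerNumaNode = 8      # assumed logical processor count per NUMA node on this host
    $requestedVPs  = 12     # what the application owner asked for

    $vpCount = [Math]::Min($requestedVPs, $lpPerNumaNode)
    Set-VMProcessor -VMName "GuestVM1" -Count $vpCount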

Because the underlying processor hardware from Intel and AMD supports various sets of specialized, extended instructions, virtual processors can be normalized in Hyper-V, a setting which allows for live migration across different Hyper-V server machines. Intel and AMD hardware are different enough, in fact, that you cannot live migrate guest machines running on one manufacturer’s equipment to the other manufacturer’s equipment. (See Ben Armstrong’s virtualization blog for details.) Normalization occurs when you select the guest machine’s compatibility mode setting on the virtual processor configuration panel. The effect of this normalization is that Hyper-V presents a virtual processor to the guest that may exclude some of the hardware-specific instruction sets that exist on the actual machine. If it turns out that the guest OS specifically relies on some of these hardware extensions – many have to do with improving the execution of super-scalar computation, audio and video streaming, or 3D rendering – then you are presented with an interesting choice between higher availability and higher performance.
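The same compatibility setting can be toggled from the Root partition with PowerShell; here is a minimal sketch with a hypothetical VM name (the VM must be off when the setting is changed).

    # Present a normalized virtual processor so the guest can be live migrated
    # between Hyper-V hosts with different processor feature sets (same vendor only).
    Set-VMProcessor -VMName "GuestVM1" -CompatibilityForMigrationEnabled $true

    # Verify the setting.
    Get-VMProcessor -VMName "GuestVM1" |
        Select-Object VMName, CompatibilityForMigrationEnabled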

Processor performance monitoring

When Hyper-V is running and controlling the use of the machine’s CPU resources, you must access the counters associated with the Hyper-V Hypervisor to monitor processor utilization by the hypervisor, the Root partition, and the guest machines running in child partitions. These counters can be gathered by running a performance monitoring application on the Root partition, and they need to be used instead of the usual Processor and Processor Information performance counters available at the OS level.

There are three sets of Hyper-V Hypervisor processor utilization counters, and the key to using them properly is to understand what entity each instance of the counter set represents. The three Hyper-V processor performance objects are the following:

  • Hyper-V Hypervisor Logical Processor

There is one instance of HVH Logical Processor counters available for each hardware logical processor that is present on the machine. The instances are identified as VP 0, VP 1, …, VP n-1, where n is the number of Logical Processors available at the hardware level. The counter set is similar to the Processor Information set available at the OS level (which also contains an instance for each Logical Processor). The main metrics of interest are % Total Run Time and % Guest Run Time. In addition, the % Hypervisor Run Time counter records the amount of CPU time, per Logical processor, consumed by the hypervisor.
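From the Root partition, these per-logical-processor counters can be sampled with Get-Counter; here is a minimal sketch using the counter names listed above (the counter set names can vary slightly between Windows Server versions).

    # Sample the Hyper-V Hypervisor Logical Processor counters for every
    # logical processor instance: five samples, two seconds apart.
    $paths = @(
        "\Hyper-V Hypervisor Logical Processor(*)\% Total Run Time",
        "\Hyper-V Hypervisor Logical Processor(*)\% Guest Run Time",
        "\Hyper-V Hypervisor Logical Processor(*)\% Hypervisor Run Time"
    )
    Get-Counter -Counter $paths -SampleInterval 2 -MaxSamples 5 |
        ForEach-Object { $_.CounterSamples } |
        Format-Table Path, CookedValue -AutoSize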

  • Hyper-V Hypervisor Virtual Processor

There is one instance of the HVH Virtual Processor counter set for each child partition Virtual Processor that is configured. The guest machine Virtual Processor is the abstraction used in Hyper-V dispatching. The Virtual Processor instances are identified using the format guestname: Hv VP 0, guestname: Hv VP 1, etc., up to the number of Virtual Processors defined for each partition. The % Total Run Time and the % Guest Run Time counters are the most important measurements available at the guest machine Virtual Processor level.

CPU Wait Time Per Dispatch is another potentially useful measurement, indicating the amount of time a guest machine Virtual Processor is delayed in the hypervisor Dispatching queue, which is comparable to the OS Scheduler’s Ready Queue. Unfortunately, it is not clear how to interpret this measurement. Not only are the units undefined – although 100 nanosecond timer units are plausible – but the counter reports values that are inexplicably discrete.

The counter set also includes a large number of counters that reflect the guest machine’s use of various Hyper-V virtualization services, including the rate that various intercepts, interrupts and Hypercalls are being processed for the guest machine. (These virtualization services are discussed in more detail in the next section.) These are extremely interesting counters, but be warned they are often of little use in diagnosing the bulk of capacity-related performance problems where either (1) a guest machine is under-provisioned with respect to access to the machine’s Logical Processors or (2) the Hyper-V Host processor capacity is severely over-committed. This set of counters is useful in the context of understanding what Hyper-V services the guest machine consumes, especially if over-use of these services leads to degraded performance. Another context where they are useful is understanding the virtualization impact on a guest machine that is not running any OS enlightenments.
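To watch a single guest’s virtual processors, you can filter the Hyper-V Hypervisor Virtual Processor instances on the guest machine’s name. The sketch below assumes a hypothetical guest named "GuestVM1" and uses the counter names discussed above and in Table 1.

    # Pull per-virtual-processor counters and keep only the instances that belong
    # to one guest machine (instance names take the form "guestname:Hv VP 0").
    $counterPaths = @(
        "\Hyper-V Hypervisor Virtual Processor(*)\% Total Run Time",
        "\Hyper-V Hypervisor Virtual Processor(*)\% Guest Run Time",
        "\Hyper-V Hypervisor Virtual Processor(*)\Hypercalls/sec",
        "\Hyper-V Hypervisor Virtual Processor(*)\Total Intercepts/sec"
    )
    (Get-Counter -Counter $counterPaths).CounterSamples |
        Where-Object { $_.InstanceName -like "guestvm1:*" } |
        Format-Table InstanceName, Path, CookedValue -AutoSize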

  • Hyper-V Hypervisor Root Virtual Processor

The HVH Root Virtual Processor counter set is identical to the metrics reported at the guest machine Virtual Processor level. Hyper-V automatically configures a Virtual Processor instance for each Logical Processor for use by the Root partition. The instances of these counters are identified as Root VP 0, Root VP 1, etc., up to Root VP n-1.

Table 1. Hyper-V Processor usage measurements are organized into distinct counter sets for child partitions, the Root partition, and the hypervisor.

Counter Group               | # of instances                      | Instance names              | Key counters
HVH Logical Processor       | # of hardware logical processors    | VP 0, VP 1, etc.            | % Total Run Time, % Guest Run Time
HVH Virtual Processor       | # of guest machines * virtual CPUs  | guestname: Hv VP 0, etc.    | % Total Run Time, % Guest Run Time, CPU Wait Time Per Dispatch, Hypercalls/sec, Total Intercepts/sec, Pending Interrupts/sec
HVH Root Virtual Processor  | # of hardware logical processors    | Root VP 0, Root VP 1, etc.  | % Total Run Time
Scheduling Priority.

The Hyper-V hypervisor, which is responsible for guest machine scheduling, by default implements a simple, round-robin policy in which any virtual processor that is ready to execute is equally likely to be dispatched. Defining and running more virtual processors than there are logical processors creates the possibility of dispatching queuing delays, but if the CPU capacity of the hardware is adequate these delays are normally minimal, because many virtual processors are in the idle state, so over-commitment of the physical processors often does not impact performance greatly. On the other hand, once these dispatching delays start to become significant, the priority scheduling options that Hyper-V provides can become very useful.

In addition to the basic round-robin scheme, the hypervisor also factors processor affinity and the NUMA topology into scheduling decisions, considerations that add asymmetric constraints to processor scheduling. Note that the guest machine’s current CPU requirements are not directly visible to the Hyper-V hypervisor, so its scheduling function does not know when there is a big backlog of ready threads delayed inside the guest. With the single exception of an OS enlightenment in Windows guests that notifies Hyper-V via a Hypercall that the guest is entering a long spinlock, Hyper-V has no knowledge of guest machine behavior that would let it make better informed scheduling decisions. Nor does the fact that a currently idle guest machine’s virtual processor has one or more interrupts pending improve its position in the dispatching queue. (In other words, Hyper-V’s scheduling algorithm does not attempt to implement any form of the Mean Time to Wait (MTTW) algorithm, which improves throughput in the general case by favoring the dispatching of guest machines that spend proportionally more time waiting on service from external devices. In contrast, the Windows OS Scheduler does implement a form of MTTW thread scheduling by temporarily boosting the priority of a thread that is awakened when a device interrupt it was waiting for occurs.) These are all factors that suggest it is not a good idea to try to push the CPU load to the limits of processor capacity under Hyper-V.

When enough guest machine virtual processors attempt to execute and the physical machine’s logical processors are sufficiently busy, performance under Hyper-V will begin to degrade. You can then influence Hyper-V guest machine scheduling using one of the available priority settings. The CPU priority settings are illustrated in Figure 2, which is a screenshot showing the Hyper-V Processor Resource control configuration options that are available. Hyper-V provides several tuning knobs to customize the Hyper-V virtual processor scheduler function. These include:

  • reserving CPU capacity in advance on behalf of a guest machine,
  • setting an upper limit to the CPU capacity a guest machine can use, and
  • setting a relative priority (or weight) for a virtual machine, to be applied whenever there is contention for the machine’s processors.

Let’s compare the processor reservation, capping and weighting options and discuss which make sense to use and when to use them.
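Before looking at each one, note that all three knobs map to parameters of the Set-VMProcessor cmdlet, so they can also be scripted from the Root partition. The values and the VM name below are hypothetical; this is just a sketch of where each setting lives.

    # Reserve 10% of a logical processor for each of the guest's virtual processors,
    # cap its consumption at 75%, and raise its relative weight from the default
    # of 100 to 200.
    Set-VMProcessor -VMName "GuestVM1" -Reserve 10 -Maximum 75 -RelativeWeight 200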


Figure 2. Processor resource control settings include reservations, upper limits on CPU consumption, and prioritized guest machine dispatching based on the relative weights of the VMs.

 Reservations.

The Hyper-V processor reservation setting is specified as the percentage of the capacity of a logical processor to be made available to the guest machine whenever it is scheduled to run. The reservation setting applies to each virtual processor that is configured for the guest. Setting a processor reservation value is mainly useful if you know that a guest machine requires a certain, minimal amount of processing power and you don’t want Hyper-V to schedule that VM to run unless that minimum amount of CPU capacity is available. When the virtual processor is dispatched, reservations guarantee that it will receive the minimum level of service specified.

Hyper-V reservations work differently from many other Quality of Service (QoS) implementations, in which capacity that is reserved but not used remains idle. In Hyper-V, when a guest machine does not consume all the capacity reserved for it, virtual processors from other guest machines that are waiting can be scheduled instead. Implementing reservations in this manner suggests the hypervisor makes processor scheduling decisions on a periodic basis, functionally equivalent to a time-slice, but whatever that time-slicing interval is, it is undocumented.

Capping.

Setting an upper limit using the capping option is mainly useful when you have a potentially unstable workload, for example, a volatile test machine, and you don’t want to allow that rogue guest to dominate the machine to the detriment of every other VM that is also trying to execute. Similar to reservations, capping is also specified as a percentage of logical processor capacity.

Weights.

Finally, setting a relative weight for the guest machine’s virtual processors is useful if you know the priority of a guest machine’s workload relative to the other guest machine workloads that are resident on the Hyper-V host. This is the most frequently used virtual processor priority setting. Unlike reservations, weights do not provide any guarantee that a guest machine’s virtual processor will be dispatched. Instead, they simply increase or decrease the likelihood that a guest machine’s virtual processor is dispatched. As implemented, weights are similar to reservations, except that with weights it is easier to become confused about what percentage of a logical processor you intend to allocate.

Here is how the weights work. Each virtual processor that a guest machine is eligible to use gets a default weight of 100. If you don’t adjust any of the weights, each virtual processor is equally likely to be dispatched as any other, so the default scheme is the balanced, round-robin approach. In default mode (i.e., round-robin), the probability that a virtual processor is selected for execution is precisely

1 / (number of eligible guest machines)

Notice that the number of virtual processors defined for a machine determines how many of its virtual processors are eligible to run, but is otherwise not a factor in calculating the dispatching probability for any one of them.

When you begin to adjust the weights of guest machines, scheduling decisions become based on the relative weighting factor you have chosen. To calculate the probability that a virtual processor is selected for execution when the weighting factors are non-uniform, calculate a base value that is the total for all the guest machine weighting factors. For instance, if you have three guest machines with weighting factors of 100, 150 and 250, the base value is 500, and the individual guest machine dispatching probabilities are calculated using the simple formula

weight(i) / SUM(weight(1):weight(n))

So, for that example, the individual dispatching probabilities are 20%, 30% and 50%, respectively.
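The same arithmetic is easy to script when there are more than a handful of guests; here is a minimal sketch with hypothetical guest names and the weights from the example above.

    # Dispatching probability per guest = weight(i) / SUM(weight(1):weight(n)).
    $weights = @{ "GuestA" = 100; "GuestB" = 150; "GuestC" = 250 }
    $total   = ($weights.Values | Measure-Object -Sum).Sum

    foreach ($guest in $weights.Keys) {
        $share = [Math]::Round(100 * $weights[$guest] / $total, 1)
        "{0,-8} weight {1,4}  ->  {2,5}% dispatching share" -f $guest, $weights[$guest], $share
    }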

Relative weights make it easier to mix production and test workloads on the same Hyper-V host. You could boost the relative weight of production guests to 200, for example, while lowering the weight of the guest machines used for testing to 50. If you have a production SQL Server guest machine that services the production servers, you would then want to set its weight higher than the other production guests. And, if you had a SQL Server guest test machine that services the other test guests, you could leave that machine running at the default weight of 100, higher than the other test machines, but still lower than any of the production guest machines.

CPU Weight example.

Consider a Hyper-V Host with four guest machines: two are production guests, while the remaining two are test machines running a discretionary workload. To simplify matters, let’s assume each guest machine is configured to run two virtual processors. The guest machine virtual processor weights are configured as shown in Table 2:

Table 2.

Guest Machine Workload | VP Weight
Production             | 200
Test                   | 50

Since Hyper-V guest machine scheduling decisions are based on relative weight, you need to sum the weights over all the guest machines and compute the total weighting factor per logical processor. Then Hyper-V calculates the relative weight per logical processor for each guest machine and attempts to allocate logical processor capacity to each guest machine proportionately.

Table 3. Calculating the guest machine relative weights per logical processor.

Workload    | # guests | VP Weight | Total Weight (Weight * guests) | Guest Relative Weight per LP
Production  | 2        | 200       | 400                            | 80%
Test        | 2        | 50        | 100                            | 20%
Totals      | 4        |           | 500                            |

Under those conditions, assuming 100% CPU utilization and an equivalent CPU load from each guest, you can expect to see the two production machines consuming about 80% of each logical processor, leaving about 20% of a logical processor for the test machines to share. In a later post, we will look at benchmark results using this CPU weighting scenario.
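A sketch of how this scenario might be configured with PowerShell follows; the guest machine names are hypothetical, and the 80/20 split only materializes when all four guests are demanding CPU at the same time.

    # Production guests get a relative weight of 200, test guests 50.
    $productionVMs = "ProdApp1", "ProdSQL1"
    $testVMs       = "TestApp1", "TestSQL1"

    $productionVMs | ForEach-Object { Set-VMProcessor -VMName $_ -RelativeWeight 200 }
    $testVMs       | ForEach-Object { Set-VMProcessor -VMName $_ -RelativeWeight 50 }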

If you are mixing production and test machines on the same Hyper-V host, you may also want to consider an extra bit of insurance and use capping to set a hard upper limit on the amount of CPU capacity that the test machines can ever use. Keep in mind that none of these resource controls has much effect unless there is significant contention for the processors, which is something to be avoided in general under any form of virtualization. The Hyper-V processor resource controls come in handy when you pursue an aggressive server consolidation program and want to add a little extra margin of safety to the effort.

Reservations and limits are commonly used in Quality of Service mechanisms, and weights are often used in task scheduling. However, it is a little unusual to see a scheduler that implements all three tuning knobs. System administrators can use a mix of all three settings across the guest machines, which is confusing enough. And, for maximum confusion, you can place settings on different machines that are basically incompatible with the settings of the other resident guest machines. So, it is wise to exercise caution when using these controls. Later in this series I will report on some benchmarks that I ran where I tried some of these resource control settings and analyzed the results. When the Hyper-V Host machine is running at or near its CPU capacity constraints, these guest machine dispatching priority settings do have a significant impact on performance.

 

The next post in this series continues this Hyper-V architecture discussion, describing its use of intercepts, interrupts and Hypercalls, the three interfaces that allow for interaction between the hypervisor, the root partition, and the guest partitions.
