Hyper-V Performance: Understanding guest machine performance, Part III

In this post, the baseline measurements discussed in the previous post are compared to results for an under-provisioned guest machine in order to characterize the performance delays guest machines encounter when workloads execute under virtualization. This post also reports benchmark results reflecting both an efficiently provisioned Hyper-V Host and an over-committed one.

I simulated an under-provisioned guest machine by executing the same benchmark on a guest Windows VM that had access to only two of the four available physical processors. Configured to use only two virtual processors, the benchmark program required 147 minutes to run to completion, compared to 105 minutes on the 4-way guest machine.

Obviously, in this scenario the performance of the benchmark workload executing on the 2-way guest machine suffered because it did not have access to an adequate number of virtual processors. It is easy to see that the guest machine is under-provisioned in an example like this one, where the conditions are tightly controlled. The key is being able to recognize under-provisioning when guest machines are executing an unknown workload. Look for the combination of the following:

  1. Each of the Hyper-V virtual processors allotted to the child partition shows % Run Time processor utilization measurements approaching 100% busy, and
  2. Internal guest machine System\Processor Queue Length measurements exceed 3X the number of virtual processors that are configured.

Together, these two measurements are a reliable indicator that the guest machine's CPU workload is constrained by access to too few virtual CPUs.
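For anyone who wants to automate this check, the following is a minimal sketch that samples the two indicators with the standard Windows typeperf command-line tool. The counter paths, instance handling, and thresholds shown are illustrative assumptions; confirm the exact names under the Hyper-V Hypervisor Virtual Processor object on the Host and the System object inside the guest before relying on them.

```python
# Minimal sketch of the two under-provisioning indicators described above.
# Counter paths and thresholds are illustrative assumptions; confirm the exact
# names in Performance Monitor before relying on them.
import subprocess

def sample_counter(counter_path, samples=5, interval=15):
    """Average a Windows performance counter over a few typeperf samples."""
    out = subprocess.run(
        ["typeperf", counter_path, "-si", str(interval), "-sc", str(samples)],
        capture_output=True, text=True, check=True).stdout
    values = []
    for line in out.splitlines():
        for field in [f.strip('"') for f in line.split('","')][1:]:  # drop timestamp
            try:
                values.append(float(field))
            except ValueError:
                pass  # header row, blank sample, or trailing status text
    return sum(values) / len(values) if values else 0.0

# Indicator 1 -- run on the Hyper-V Host. Averaging across all instances is a
# simplification: ideally each of the guest's virtual processors is checked
# individually, scoped to that guest's instances (e.g. "GuestName:Hv VP 0").
vp_run_time = sample_counter(
    r"\Hyper-V Hypervisor Virtual Processor(*)\% Total Run Time")

# Indicator 2 -- must be collected inside the guest machine itself.
queue_length = sample_counter(r"\System\Processor Queue Length")

virtual_cpus = 2  # virtual processors configured for the guest under study
if vp_run_time > 95 and queue_length > 3 * virtual_cpus:
    print("Both indicators present: the guest looks CPU under-provisioned.")
```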

Efficiently provisioned Hyper-V Host and its Guests

When the Hyper-V Host machine is efficiently provisioned, application responsiveness is still affected, but it becomes possible to scale an application up and out. By running the same benchmark program simultaneously on two 2-way guest machines, I was able to generate a simple example of this scale-out behavior. When run concurrently in separate two-processor virtual machines, each individual benchmark ran to completion in about 178 minutes, an execution-time stretch factor of almost 2 compared to the native execution baseline. But, interestingly, the overall throughput of the guest machines doubled, since two full complements of tasks ran to completion during that time period.

Over-committed Hyper-V Host

Having established that the benchmark workload will absorb all the CPU capacity available on the Hyper-V Host, it was easy to move from an efficiently provisioned Host machine to an over-committed one. This was accomplished by doubling the number of guest machines executing concurrently, compared to the previous benchmark configuration. With four 2-way guest machines executing concurrently, the Hyper-V Host is thoroughly out of CPU capacity. Yet Hyper-V still continues to execute the guest machine workloads efficiently. The execution time of a single benchmark job increases to 370 minutes, a stretch factor of about 4.1 compared to the native machine baseline. Throughput also increases proportionately – four times as many tasks were completed during that longer period.

The symptoms that the Hyper-V Host machine is out of CPU capacity are easy to spot. Figure 31 reports that each of the four guest machines consumes close to 100% of one of the available physical CPUs. Hyper-V utilization continues to hold steady at approximately 6% busy. There is no excess processor capacity.

[Figure 31: over-committed scenario, Host machine processor utilization]

Figure 31. Guest machines consume all available processor cycles when four 2-way guest machines were configured to run concurrently. Hypervisor CPU utilization continued to hold steady at around 6%.

If the physical CPUs are overloaded, you can then drill into the CPU usage by each of the virtual machines. Figure 32 shows the processor utilization distributed evenly across all the child partition virtual processors, which are weighted evenly in this example.


[Figure 32: over-committed scenario, guest machine virtual processor utilization]

Figure 32. Guest machine CPU usage is tracked by virtual processor. Here virtual processor usage is distributed evenly across all the child partitions, which are weighted evenly in this example.
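To reproduce this kind of per-virtual-processor breakdown yourself, you can pull a sample of the relevant counter for every virtual processor instance on the Host. The following is a sketch only: the counter path matches the object discussed in this post, but the exact counter and the "VM name:Hv VP n" instance naming should be confirmed in Performance Monitor.

```python
# Sketch: list per-virtual-processor utilization on the Hyper-V Host, similar
# to the breakdown in Figure 32. Confirm the counter and instance names
# ("<VM name>:Hv VP <n>") in Performance Monitor before relying on this.
import csv
import io
import subprocess

COUNTER = r"\Hyper-V Hypervisor Virtual Processor(*)\% Total Run Time"

# Collect two samples and keep the second, which covers a full sample interval.
out = subprocess.run(
    ["typeperf", COUNTER, "-si", "5", "-sc", "2"],
    capture_output=True, text=True, check=True).stdout

rows = [row for row in csv.reader(io.StringIO(out)) if len(row) > 1]
header, sample = rows[0], rows[-1]                    # counter paths, last sample
for instance, value in zip(header[1:], sample[1:]):   # skip the timestamp column
    if value.strip():
        print(f"{instance}  {float(value):6.1f} %")
```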

The results from timing the six benchmark runs discussed in this post and the previous one are summarized in Table 3, which also shows the virtualization “stretch factor,” calculated as the ratio of the elapsed execution time of each configuration to the native Windows baseline.

| Configuration | # of guest machines | CPUs per guest machine | Elapsed time (minutes) | Stretch factor | Throughput | Hyper-V % Run Time |
| --- | --- | --- | --- | --- | --- | --- |
| Native machine | 1 | 4 | 90 | 1 | 1 | n/a |
| Root Partition | 1 | 4 | 100 | 1.11 | 1 | 6% |
| Guest machine | 1 | 4 | 105 | 1.17 | 1 | 8% |
| Under-provisioned Guest machine | 1 | 2 | 147 | 1.63 | 1 | 4% |
| 2 Guest machines | 2 | 2 | 178 | 1.98 | 2 | 6% |
| 4 Guest machines | 4 | 2 | 370 | 4.08 | 4 | 6% |

Table 3. Benchmarking the performance of guest machines in various configurations, compared to running the same benchmark application standalone on native hardware.
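The stretch factors in Table 3 are simply each configuration's elapsed time divided by the 90-minute native baseline, as the short calculation below illustrates. (Computed from the rounded elapsed times shown in the table, the 4-guest row comes out at 4.11 rather than the published 4.08, presumably because the published figure was derived from an unrounded timing.)

```python
# Worked example: the "stretch factor" in Table 3 is elapsed time divided by
# the 90-minute native-hardware baseline.
native_minutes = 90
elapsed_minutes = {
    "Root Partition": 100,
    "Guest machine (4 VPs)": 105,
    "Under-provisioned guest (2 VPs)": 147,
    "2 guest machines": 178,
    "4 guest machines": 370,
}
for configuration, minutes in elapsed_minutes.items():
    print(f"{configuration:32} stretch factor = {minutes / native_minutes:.2f}")
```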

Discussion

Summarizing this set of benchmark results, we can see that it is reasonable to expect any timing test to execute about 15% longer when it is running on an adequately provisioned virtual machine, compared to running on native hardware. While I provided only a single, simple example, it is readily apparent that an under-provisioned guest machine pays a substantial performance penalty when its configuration settings restrict it from consuming the resources the workload demands. In that example, a known CPU-bound workload was configured with too few virtual CPUs. This under-provisioning caused the benchmark to execute 40% longer than an efficiently provisioned guest machine executing the same workload. If that guest machine were constrained even further – say, configured to access only one virtual CPU – the performance penalty would have been even more severe.

The fact that an efficiently provisioned Hyper-V guest machine can reach performance levels very similar to native hardware is encouraging, as is the evidence that virtualization technology can support applications that need to scale up and out by running multiple machine images in parallel. These are important capabilities, helping, for instance, in situations where resource demand is very elastic. One important caveat that emerges is that, in practice, efficiently provisioned guest machines are difficult to distinguish from over-provisioned ones. Making that distinction was possible in these benchmark runs only because I configured and controlled the workloads myself. The difficulty in identifying guest machines that are over-provisioned, of course, presents a serious capacity planning challenge.

The last column in Table 3 shows the CPU utilization directly attributed to the Hyper-V hypervisor, which ranged from 4 to 8%. The amount of hypervisor overhead is a function of the guest machine activity that generates interrupts, intercepts and hypercalls. Notice that the scenario with the least amount of hypervisor activity is the one with the guest machine that was under-provisioned with only two virtual processors defined. Not all the overhead associated with Hyper-V virtualization is captured by this performance counter, however, since there are also Hyper-V components that execute in the Root partition and in the child partitions. Hyper-V does provide a set of performance counters under the Hyper-V Hypervisor Logical Processor object that help you assess how much virtualization overhead is involved. Figure 33 is an example of these measurements, which break down the rate of interrupt processing by the hypervisor. Among the four categories of hypervisor interrupts, inter-processor interrupts predominate in this workload, which was running four guest machines concurrently. A smaller number of hypervisor Scheduler, Timer and hardware interrupts were also handled.

[Figure 33: over-committed scenario, hypervisor processor interrupts]

Figure 33. Hypervisor interrupt processing, broken down by the type of interrupt. Among the four categories of hypervisor interrupts that are counted, inter-processor signaling interrupts predominate in this workload, which was running four guest machines concurrently.
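If you want to capture the same interrupt breakdown on your own Host, a sketch like the following logs the interrupt-rate counters to a CSV file for later charting. The exact counter names are my assumption based on the four categories described above; confirm them under the Hyper-V Hypervisor Logical Processor object in Performance Monitor.

```python
# Sketch: log the hypervisor interrupt-rate breakdown shown in Figure 33 to a
# CSV file for later charting. The counter names below are an assumption based
# on the four interrupt categories described in the text; confirm them under
# the Hyper-V Hypervisor Logical Processor object in Performance Monitor.
import subprocess

INTERRUPT_COUNTERS = [
    r"\Hyper-V Hypervisor Logical Processor(_Total)\Inter-Processor Interrupts/sec",
    r"\Hyper-V Hypervisor Logical Processor(_Total)\Scheduler Interrupts/sec",
    r"\Hyper-V Hypervisor Logical Processor(_Total)\Timer Interrupts/sec",
    r"\Hyper-V Hypervisor Logical Processor(_Total)\Hardware Interrupts/sec",
]

# 60 samples at 15-second intervals, overwriting any existing output file (-y).
subprocess.run(
    ["typeperf", *INTERRUPT_COUNTERS, "-si", "15", "-sc", "60",
     "-f", "CSV", "-o", "hv_interrupts.csv", "-y"],
    check=True)
```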

The next post in this series looks at Hyper-V’s guest machine virtual processor priority scheduling options to determine how effective they are in insulating a preferred guest machine from the performance impact of running on an over-committed virtualization Host.