Hyper-V performance: CPU Priority scheduling options

Finally, let’s look at how effective the Hyper-V processor scheduling priority settings are at insulating preferred guest machines from the performance impact of an under-provisioned (or over-committed) Hyper-V Host machine. As discussed earlier, Hyper-V’s virtual processor scheduling options allow you to prioritize the workloads of guest machines that are resident on the same Hyper-V Host. To test the effectiveness of these priority scheduling options, I re-ran the under-provisioned 4 X 2-way guest machine scenario with two of the guest machines set to run at a higher priority, while the other two guests were set to run at a lower priority. I ran separate tests to evaluate the virtual processor Reservation settings in one scenario and the use of relative weights in another.

Configuration | # guest machines | CPUs per machine | Best case elapsed time (minutes) | Stretch factor
Native machine | – | 4 | 90 | 1.00
4 Guest machines (no priority) | 4 | 2 | 370 | 4.08
4 Guest machines with Relative Weights | 4 | 2 | 230 | 2.56
4 Guest machines with Reservations | 4 | 2 | 270 | 3.00

CPU Scheduling with Reservations.

For the Reservation scenario, the two high priority guest machines reserved 50% of the virtual processor capacity they were configured with. The two low priority guest machines reserved 0% of their virtual processor capacity. Figure 34 shows the Hyper-V Manager’s view of the situation – the higher priority machines 1 & 2 clearly have favored access to the Hyper-V logical processors. The two higher priority guests are responsible for 64% of the CPU usage, while the two low priority machines are consuming just 30% of the processor resources. The guest machines configured with high priority settings executed to completion in about 270 minutes (or 4 ½ hours). This was about 27% faster than the equally weighted guest machines in the baseline scenario where four guest machines executed the benchmark program without any priority settings in force.

[Figure 34: Hyper-V logical processor Reservation scenario screenshot]

Figure 34. The Hyper-V Manager’s view of overall CPU Usage during the Reservation scenario. Together, the higher priority machines 1 & 2 are responsible for 64% of the CPU usage, while the two low priority machines are consuming just 30% of the CPU capacity.
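As a quick sanity check on these settings, here is a minimal sketch, in plain C#, of the arithmetic the Reservation percentages imply. It is not Hyper-V code, and the class and variable names are mine; it simply restates the configuration above as numbers.

// Minimal sketch (not Hyper-V code): back-of-the-envelope arithmetic for the
// Reservation scenario described above. Values mirror the test configuration.
using System;

class ReservationMath
{
    static void Main()
    {
        int physicalCpus = 4;                 // Hyper-V Host logical processors
        int vcpusPerGuest = 2;                // each guest is a 2-way VM
        double highPriorityReserve = 0.50;    // 50% Reservation on guests 1 & 2
        double lowPriorityReserve = 0.00;     // 0% Reservation on guests 3 & 4

        // Capacity guaranteed to each guest: the Reservation percentage applied
        // to the virtual processors it is configured with.
        double guaranteedPerHighGuest = highPriorityReserve * vcpusPerGuest * 100;  // 100% of one CPU
        double guaranteedPerLowGuest = lowPriorityReserve * vcpusPerGuest * 100;    // 0%
        double totalGuaranteed = 2 * guaranteedPerHighGuest + 2 * guaranteedPerLowGuest;

        Console.WriteLine($"Host capacity: {physicalCpus * 100}%");
        Console.WriteLine($"Guaranteed to the high priority guests: {totalGuaranteed}%");
        Console.WriteLine($"Left to share on demand: {physicalCpus * 100 - totalGuaranteed}%");
        // Figure 35 shows the high priority guests actually consuming ~250%,
        // i.e. more than their 200% guarantee, because capacity not claimed by
        // the low priority guests remains available to them.
    }
}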

Figure 35 reports on the distribution of Virtual Processor utilization for the four guest machines executing in this Reservation scenario during a one-hour period. Guest machines 1 & 2 are running with the 50% Reservation setting, while machines 3 & 4 are running with the 0% Reservation setting. In contrast to Figure 32, where each guest machine had equal access to virtual processors, here the high priority guest machines clearly have favored access. Together, the 4 higher priority virtual processors consumed about 250% out of a total of 400% virtual processor capacity, almost twice the residual processor capacity available to the lower priority guest machines.

[Figure 35: Reservation scenario virtual processor utilization]

Figure 35. Virtual Processor utilization for the four guest machines executing in the Reservation scenario.

Hours later when the two high priority guest machines finished executing the benchmark workload, those guest machines went idle and the low priority guests were able to consume more virtual processor capacity. Figure 36 shows these higher priority guest machines executing the benchmark workload until about 10:50 pm, at which point the Test 1 & 2 machines go idle and machines 3 & 4 quickly expand their processor usage.

[Figure 36: Reservation scenario virtual processor utilization]

Figure 36. The higher priority Test 1 & 2 machines go idle at about 10:50 pm, at which point machines 3 & 4 quickly expand their processor usage.

As Figure 36 indicates, even after the high priority Test machines 1 & 2 go idle, their virtual processors still get scheduled to execute on the Hyper-V physical CPUs. When guest machines do not consume all of the virtual processor capacity requested by a Reservation setting, that excess capacity is available for lower priority guest machines to use.

Figures 37 and 38 show the view of processor utilization available from inside one of the high priority guest machines. Figure 37 shows the view of the virtual hardware that the Windows CPU accounting function provides, plus it shows the instantaneous Processor Ready Queue measurements. These internal measurements indicate that the virtual processors are utilized near 100% and there is a significant backlog of Ready worker threads from the benchmark workload queued for the two virtual CPUs.

[Figure 37: Reservation scenario, favored guest machine processor queuing]

Figure 37. Internal Windows performance counters indicate that the virtual processors are utilized near 100%, with a significant backlog of Ready worker threads from the benchmark workload queued for the two virtual CPUs.

Figure 37 shows the % Processor Time counter from the guest machine Processor object, while Figure 38 shows processor utilization for the top 5 most active processes, with the ThreadContentionGenerator.exe – the benchmark program – predominant.

[Figure 38: Reservation scenario, favored guest machine processor utilization per process]

Figure 38. The benchmark program ThreadContentionGenerator.exe consumes all the processor cycles available to the guest machine.

 

CPU Scheduling with Relative Weights.

A second test scenario used Relative Weights to prioritize the guest machines involved in the test, leading to results very similar to the Reservation scenario. Two guest machines were given high priority scheduling weights of 200, while the other two guest machines were given low priority scheduling weights of 50. This is the identical weighting scheme described in the earlier CPU weight example. Mathematically, the proportion of processor capacity allocated to the higher priority guest machines was 80%, with 20% of the processor capacity allocated to the lower priority guests. In actuality, Figure 39 reports each high priority virtual processor consuming about 75% of a physical CPU, while the four lower priority virtual processors each consumed slightly more than 20% of a physical CPU.
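To make the weight arithmetic explicit, here is a minimal C# sketch (the names are illustrative, not part of Hyper-V) that derives the expected shares from the weights used in this test.

// Minimal sketch (not Hyper-V code): how the expected CPU shares in the
// Relative Weights scenario are derived. Weight values are those used in the test.
using System;

class RelativeWeightMath
{
    static void Main()
    {
        int[] weights = { 200, 200, 50, 50 };   // guests 1-4
        double total = 0;
        foreach (int w in weights) total += w;  // 500

        for (int i = 0; i < weights.Length; i++)
        {
            // Under contention, each guest's share of the Host CPU capacity is
            // proportional to its weight: 200/500 = 40% each for guests 1 & 2,
            // 50/500 = 10% each for guests 3 & 4, i.e. an 80/20 split overall.
            Console.WriteLine($"Guest {i + 1}: {weights[i] / total:P0} of Host capacity");
        }
    }
}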

Since the higher priority guest machines were able to consume more processor time than in the Reservation scenario, the higher priority machines were able to complete the benchmark task in 230 minutes, faster than the best case in the Reservation scenario and about 38% faster than the baseline scenario where all four guests ran at the same Hyper-V scheduling priority.

[Figure 39: Weighted scheduling scenario virtual processor utilization]

Figure 39. In the Relative Weights scenario, each high priority virtual processor consumed about 75% of a physical CPU, while the four lower priority virtual processors consumed slightly more than 20% of a physical CPU.

As in the Reservation scenario, once the high priority guest machines completed their tasks and went idle, the lower priority guest machines gained greater access to the physical CPUs on the Hyper-V Host machine. This shift is highlighted in Figure 40, which shows the higher priority virtual processors for guest machines 1 & 2 tailing off at around 1:40 pm, which allows the processor usage from the lower priority virtual processors to take off at that point. The CPU usage pattern in Figure 40 showing this shift taking place during the Relative Weights scenario is very similar to the Reservation scenario shown in Figure 36.

[Figure 40: Weighted scheduling scenario virtual processor utilization]

Figure 40. When the higher priority virtual processors for guest machines 1 & 2 finish processing at about 1:40 pm, the processor usage by the lower priority virtual processors accelerates.

 

Hyper-V Performance: Understanding guest machine performance, Part III

In this post, the baseline measurements discussed in the previous post are compared to results for an under-provisioned guest machine in order to characterize the performance delays guest machines encounter when workloads execute under virtualization. This post also reports benchmark results reflecting both an efficiently provisioned Hyper-V Host and an over-committed one.

I simulated an under-provisioned guest machine by executing the same benchmark on a guest Windows VM that had access to only two of the 4 available physical processors. Configured to use only two virtual processors, the benchmark program required 147 minutes to run to completion, compared to 105 minutes on the 4-way guest machine.

Obviously, in this scenario the performance of the benchmark workload being executed on the 2-way guest machine suffered because it did not have access to an adequate number of virtual processors. It is easy to see that this guest machine is under-provisioned in this example where the conditions are tightly controlled. The key is being able to recognize when guest machines that are executing an unknown workload are under-provisioned. Look for the combination of the following:

  1. Each of the Hyper-V virtual processors allotted to the child partition show % Run Time processor utilization measurements approaching 100% busy, and
  2. Internal guest machine System\Processor Queue Length measurements exceed 3X the number of virtual processors that are configured.

Together, these internal measurements are reliable indicators that the guest machine CPU workload is constrained by access to too few virtual CPUs.
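As an illustration of how these two indicators might be checked programmatically from inside a guest, here is a minimal C# sketch using the standard .NET PerformanceCounter API. The 3X queue length threshold is the rule of thumb stated above, not a hard limit; on .NET Core the System.Diagnostics.PerformanceCounter package is required.

// A minimal sketch of checking the two guest-internal indicators listed above
// from inside a Windows VM. Counter names are the standard Windows ones; the
// 90% / 3x thresholds are the rules of thumb from the text, not hard limits.
using System;
using System.Diagnostics;
using System.Threading;

class UnderProvisioningCheck
{
    static void Main()
    {
        var cpuTotal = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var readyQueue = new PerformanceCounter("System", "Processor Queue Length");
        int virtualCpus = Environment.ProcessorCount;   // vCPUs visible to the guest

        cpuTotal.NextValue();          // the first sample of a rate counter is always 0
        Thread.Sleep(1000);

        float cpuBusy = cpuTotal.NextValue();
        float queueLength = readyQueue.NextValue();     // instantaneous value

        bool nearlySaturated = cpuBusy > 90f;
        bool deepReadyQueue = queueLength > 3 * virtualCpus;

        Console.WriteLine($"CPU busy: {cpuBusy:F0}%  Ready queue: {queueLength}  vCPUs: {virtualCpus}");
        if (nearlySaturated && deepReadyQueue)
            Console.WriteLine("Both indicators present: the guest looks under-provisioned for CPU.");
    }
}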

Efficiently-provisioned Hyper-V Host and its Guests

When the Hyper-V Host machine is efficiently provisioned, application responsiveness is still affected, but it becomes possible to scale up and scale out an application. By running the same benchmark program simultaneously on two 2-way guest machines, I was able to generate a simple example of this scaling-out behavior. When run concurrently in separate two-processor virtual machines, each individual benchmark ran to completion in about 178 minutes, an execution time stretch factor of almost 2 compared to the native execution baseline. But, interestingly, the overall throughput of the guest machines doubled, since two full complements of tasks ran to completion during that time period.

Over-committed Hyper-V Host

Having established that the benchmark workload will absorb all the CPU capacity that is available on the Hyper-V Host, it is easy to move from an efficiently provisioned Host machine to an over-committed one. This was accomplished by doubling the number of guest machines executing concurrently, compared to the previous benchmarking configuration. With four 2-way guest machines executing concurrently, the Hyper-V Host is thoroughly out of CPU capacity. Yet Hyper-V still continues to execute the guest machine workloads efficiently. The execution time of a single benchmark job increases to 370 minutes, a stretch factor of almost 4.1 relative to the native machine baseline. Throughput also increases proportionately – four times as many tasks were completed during that longer period.

The symptoms that the Hyper-V Host machine is out of CPU capacity are easy to spot. Figure 31 reports that each of the four guest machines consumes close to 100% of one of the available physical CPUs. Hyper-V utilization continues to hold steady at approximately 6% busy. There is no excess processor capacity.

[Figure 31: Over-committed scenario, Host machine processor utilization]

Figure 31. Guest machines consume all available processor cycles when four 2-way guest machines were configured to run concurrently. Hypervisor CPU utilization continued to hold steady at around 6%.

If the physical CPUs are overloaded, you can then drill into the CPU usage by each of the virtual machines. Figure 32 shows the processor utilization distributed evenly across all the child partition virtual processors, which are weighted evenly in this example.

 

[Figure 32: Over-committed scenario, guest machine virtual processor utilization]

Figure 32. Guest machine CPU usage is tracked by virtual processor. Here virtual processor usage is distributed evenly across all the child partitions, which are weighted evenly in this example.

The results from timing the six benchmark runs that were discussed in this post and the previous post are summarized in Table 3, which also shows the virtualization “stretch factor” calculated from the ratio of the elapsed execution time of the guest machine configuration compared to native Windows performance.

Configuration | # of guest machines | CPUs per guest machine | Elapsed time (minutes) | Stretch factor | Throughput | Hyper-V % Run Time
Native machine | 1 | 4 | 90 | 1.00 | – | –
Root Partition | 1 | 4 | 100 | 1.11 | 1 | 6%
Guest machine | 1 | 4 | 105 | 1.17 | 1 | 8%
Under-provisioned Guest machine | 1 | 2 | 147 | 1.63 | 1 | 4%
2 Guest machines | 2 | 2 | 178 | 1.98 | 2 | 6%
4 Guest machines | 4 | 2 | 370 | 4.08 | 4 | 6%

Table 3. Benchmarking the performance of Guest machines in various configurations compared to running the same benchmark application standalone on native hardware.
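For clarity, the stretch factor column in Table 3 is simply the ratio of each configuration’s elapsed time to the 90-minute native baseline. The short C# sketch below (names illustrative) reproduces that calculation; small differences from the table values reflect rounding of the reported elapsed minutes.

// A minimal sketch showing how the "stretch factor" column in Table 3 is derived:
// the ratio of each configuration's elapsed time to the native-hardware baseline.
using System;

class StretchFactor
{
    static void Main()
    {
        double nativeMinutes = 90;   // native machine baseline from Table 3

        var runs = new (string Config, double Minutes)[]
        {
            ("Root Partition", 100),
            ("Guest machine", 105),
            ("Under-provisioned Guest machine", 147),
            ("2 Guest machines", 178),
            ("4 Guest machines", 370),
        };

        foreach (var run in runs)
            Console.WriteLine($"{run.Config}: stretch factor {run.Minutes / nativeMinutes:F2}");
        // e.g. 370 / 90 = 4.11, which the text rounds to "almost 4.1"; the small
        // difference from the 4.08 shown in the table reflects rounding of the minutes.
    }
}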

Discussion.

Summarizing this set of benchmark results, we can see that it is reasonable to expect any timing test to execute about 15% longer when it is running on an adequately provisioned virtual machine, compared to running on native hardware. While I provided only a single, simple example, it is readily apparent that an under-provisioned guest machine pays a substantial performance penalty when its configuration settings restrict it from consuming the resources the workload demands. In that example, a known CPU-bound workload was configured with too few virtual CPUs. This under-provisioning caused the benchmark to execute 40% longer than an efficiently provisioned guest machine executing the same workload. If that guest machine were constrained even further – say, configured to access only one virtual CPU – the performance penalty would have been even more severe.

The fact that an efficiently provisioned Hyper-V guest machine can reach performance levels very similar to native hardware is encouraging, as is the evidence for the ability of virtualization technology to support applications that need to scale up and out by running multiple machine images in parallel. These are important capabilities, helping in situations, for instance, where the resource demand is very elastic. One important caveat that emerges is that, in practice, efficiently provisioned guest machines are difficult to distinguish from over-provisioned ones. Distinguishing them was possible in these benchmark runs only because I configured and controlled the workloads themselves. The difficulty in identifying guest machines that are over-provisioned, of course, presents a serious capacity planning challenge.

The last column in Table 3 shows the CPU utilization directly attributed to the Hyper-V hypervisor, which ranged from 4 to 8%. The amount of hypervisor overhead is a function of the guest machine activity that generates interrupts, intercepts and Hypercalls. Notice that the scenario with the least amount of hypervisor activity is the one with the guest machine that was under-provisioned with only two logical processors defined. Not all the overhead associated with Hyper-V virtualization is captured by this performance counter, however, since there are also Hyper-V components that execute in the Root partition and in the child partitions. Hyper-V does provide a set of performance counters under the Hyper-V Logical Processor object that help you to assess how much virtualization overhead is involved. Figure 33 is an example of these measurements that break down the rate of interrupt processing by the hypervisor. Among the four categories of hypervisor interrupts, inter-processor interrupts predominate in this workload, which was running four guest machines concurrently. A smaller number of hypervisor Scheduler, Timer and hardware interrupts were also handled.

[Figure 33: Over-committed scenario, hypervisor processor interrupts]

Figure 33. Hypervisor interrupt processing, broken down by the type of interrupt. Among the four categories of hypervisor interrupts that are counted, inter-processor signaling interrupts predominate in this workload, which was running four guest machines concurrently.

The next post in this series looks at Hyper-V’s guest machine virtual processor priority scheduling options to determine how effective they are in insulating a preferred guest machine from the performance impact of running on an over-committed virtualization Host.

Over-provisioned Hyper-V Hosts: Understanding Guest Machine Performance, Part II

This long post discusses how guest machine performance looks under Hyper-V when the Hyper-V Host is over-provisioned. When the Hyper-V Host is over-provisioned, guest machine workloads are subject to a minimal performance penalty, which I will attempt to quantify. This is the eighth post in a series on Hyper-V performance. The series began here.

It is easy to recognize a generously over-provisioned Hyper-V Host machine – its processors are underutilized and machine memory is not fully allocated. When the machine’s logical CPUs are seldom observed running in excess of 25-40% busy, there is ample CPU capacity for all its resident guest machines, especially considering that most Hyper-V Host machines are multiprocessors. Memory can safely be regarded as underutilized when more than 40% of it is Available for allocation by the hypervisor and no guest machine is running at its maximum Dynamic Memory setting.

Note: The dispatching of a guest machine virtual processor is delayed when all CPUs are busy, so it is forced to wait. In a symmetric multiprocessor, assuming the processors are utilized independently, the probability that all CPUs are busy simultaneously is the product of the individual processor utilizations. For example, if there are four CPUs and each CPU is busy 25% of the time, the joint probability of all the CPUs being busy simultaneously is 0.25 * 0.25 * 0.25 * 0.25, or 0.004. The probability that all CPUs are simultaneously busy is only ~0.4%.
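The same reasoning generalizes: with n CPUs each busy with probability u, and assuming independence, the chance that a ready virtual processor finds every CPU busy is u raised to the nth power. A minimal C# sketch of that calculation:

// A minimal sketch of the note's arithmetic, generalized: if each of n CPUs is
// independently busy with probability u, the chance that all of them are busy
// at the same instant is u^n.
using System;

class AllBusyProbability
{
    static void Main()
    {
        foreach (double u in new[] { 0.25, 0.40, 0.60, 0.80 })
        {
            double pAllBusy = Math.Pow(u, 4);   // 4 CPUs, as in the example above
            Console.WriteLine($"Per-CPU utilization {u:P0}: all four busy {pAllBusy:P1}");
        }
        // At 25% per CPU this is 0.25^4 = 0.0039, the ~0.4% quoted in the note.
    }
}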

When Hyper-V Host machines are over-provisioned, the performance of guest machine applications approaches the level of native hardware. The problem with over-provisioned VM Host machines is that they are not economical. Over-provisioning on a wide enough scale often leads to an initiative to increase the degree of server consolidation by trying to pack more guest machines into the existing virtualization infrastructure.

Discussing over-committed Hyper-V Host machines and under-provisioned guest machines is much more interesting because that is when serious performance problems can occur. The first set of benchmarking runs reported below was directed at showing how these two conditions can be characterized based on various Hyper-V and Windows OS performance measurements.

Several of the benchmarking runs discussed in this section deliberately overload the Hyper-V Host processors or the Host machine’s memory footprint and then look at the Hyper-V and guest machine performance counters that characterize those overloaded conditions. As we have seen, you can implement virtual processor and dynamic memory priority settings to try to protect higher priority workloads from this degradation. Additional benchmarking runs involving physical resources being over-committed reveal how effective these Hyper-V virtual processor and machine memory priority settings are in shielding higher priority workloads from the performance impact when the Hyper-V Host is overloaded.

Scaling up and scaling out

While improving application performance is not among them, virtualization hardware and software technology has evolved to the point where it currently provides compelling benefits in modern data center operations. For example, one of the clear trends in hardware manufacturing that favors virtualization is building more powerful multiprocessor cores, not faster individual processors. CPU manufacturers have resorted to packaging more processors on a chip rather than making processors faster because increases in clock speeds lead to disproportionate increases in power consumption, which also ramps up the amount of heat that has to be dissipated. In semiconductor fabrication, the manufacturers have encountered a “power wall” that resists other engineering solutions.

A second factor that promotes virtualization is industry Best Practices that lead to building and deploying Windows machines that are dedicated to performing a single role, whether they are explicit Server roles or more general purpose desktop and portable workstations handling diverse personal computing tasks. A related practice is the Technical Support group within the IT organization building and then certifying for distribution one or more stable images of the operating system and the application software installed on top of it after a lengthy period of comprehensive Acceptance Testing. This stable image is then cloned each time there is an organizational need to support another copy of this application. Virtualization software that can deploy new copies of these system images through rapid cloning of virtual machines – a process that can also be automated – adds valuable flexibility to data center operations.

Most of the virtual machines configured to handle a single server role are clearly not well matched against the powerful capabilities of the data center machines they would be deployed to. Without virtualization, these data center machines would often be massively over-provisioned if they were only capable of running an individual Windows Server workload. Virtualization technology offers relief from this conundrum, a convenient way to consolidate many of these individual workloads on a single piece of equipment. Essentially, virtualization technology provides a flexible, software-based mechanism that allows system administrators to utilize current hardware more effectively while retaining all the administrative advantages of isolating workloads on dedicated servers.

Still, spinning up a new guest machine from the standard server or workstation Build is not the only possible response to each new request for IT services. There are viable alternatives, including allowing a single instance of IIS, for example, to host multiple application web sites or installing multiple instances of SQL Server on a production or test machine. IT professionals are sometimes reluctant to choose these configuration alternatives because they are concerned about the performance risks associated with multiple web servers sharing a single machine image, for example. Of course, these performance risks do not magically disappear when multiple guest machines are provisioned instead. The problem of over-committing shared computer resources is merely elevated to the level associated with Hyper-V administration.

Finally, having firmly established itself as an integral part of large scale data center operations, virtualization technology continues to evolve other virtual machine management capabilities, including replication, live migration, dynamic load balancing, automatic failover and recovery. The flexibility that virtualization solutions also provide in being able to provision a new machine quickly can benefit the performance of workloads that are running up against capacity limits in their current configuration and need to scale out across multiple machines in an application cluster to achieve higher levels of throughput.

Benchmark results

To gain some additional perspective on the performance impact of virtualization, we will look first at some benchmarking results showing the performance of virtual machines in various simple configurations, which we will also compare to native performance where Windows is installed directly on top of the hardware. For these performance tests, I used a benchmarking program that simulates the multi-threaded CPU and memory load of an active ASP.NET web application, but without issuing disk or network requests so that those limited resources on the target machine are not overwhelmed in the course of executing the benchmark program.

The benchmark program I used for stress testing Hyper-V guest machines is a Load Generator application I wrote that is parameter-driven to generate a wide variety of “challenging” workloads. The current version is a 64-bit .NET program written in C# called the ThreadContentionGenerator. It has a main dispatcher thread and a variable number of worker threads, similar to ASP.NET. You set it to execute a fixed number of concurrent tasks and perform a specific number of iterations of each task. Each task allocates a large .NET collection object that it then fills with random data. It then searches the collection repeatedly, and finally deletes all the data. In this fashion, the program stresses both the processor and virtual memory. Periodically, each active thread simulates an IO wait by sleeping, where the simulated IO rate and the IO duration are also subject to some degree of realistic variation.

The benchmark program is a very flexible beast that can be adjusted to stress the machine’s CPUs, memory or both. You can execute it in a shared nothing environment where the threads execute independent of each other. Alternatively, you can set a parameter that adds an element of resource sharing to the running process so that the threads face lock contention. In contention mode, the main thread sets up some shared data structures that the worker threads access serially to generate a degree of realistic lock contention that can be dialed either up or down by increasing or decreasing the amount of processing spent in the critical section.
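The ThreadContentionGenerator source itself is not reproduced here, but the following minimal C# sketch illustrates the structure just described: a dispatcher that launches a set of concurrent worker tasks, each of which allocates a collection, fills it with random data, searches it repeatedly, sleeps to simulate IO, and optionally serializes on a shared lock. All names and parameter values in the sketch are illustrative; this is not the actual program.

// Illustrative sketch only - not the actual ThreadContentionGenerator source.
// It mimics the structure described above: a dispatcher runs N concurrent tasks,
// each of which allocates a collection, fills it with random data, searches it
// repeatedly, sleeps to simulate IO, and optionally contends on a shared lock.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class LoadGeneratorSketch
{
    static readonly object sharedLock = new object();
    static long sharedCounter = 0;                      // shared state for contention mode

    static void Main()
    {
        int tasks = 32, iterations = 90;                // mirrors the test settings below
        bool contentionMode = false;                    // illustrative switch

        var work = new List<Task>();
        for (int t = 0; t < tasks; t++)
        {
            int taskId = t;
            work.Add(Task.Run(() => RunTask(taskId, iterations, contentionMode)));
        }
        Task.WaitAll(work.ToArray());
    }

    static void RunTask(int id, int iterations, bool contention)
    {
        var rand = new Random(id);
        for (int i = 0; i < iterations; i++)
        {
            // Allocate a large collection and fill it with random data.
            var data = new List<int>(100000);
            for (int j = 0; j < 100000; j++) data.Add(rand.Next());

            // Search the collection repeatedly to burn CPU.
            for (int pass = 0; pass < 50; pass++) data.Contains(rand.Next());

            if (contention)
                lock (sharedLock) { sharedCounter++; }  // serialize briefly on shared state

            data.Clear();                               // delete the data

            Thread.Sleep(rand.Next(5, 50));             // simulated IO wait with some variation
        }
    }
}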

For this first set of Hyper-V guest machine performance experiments, I set the number of concurrent worker tasks to 32 and the number of iterations to 90:

ThreadContentionGenerator.exe –tasks 32 –iterations 90

 

There are additional parameters to vary the virtual memory footprint of the program, the duration of IO waits and the rate of lock contention, but for this set of tests I let the program run with default values for those three parameters. With these settings, the program generates a load that is similar in many respects to a busy ASP.NET web application, one that is compute-bound, with requests that can be processed largely independent of each other. Note that the intent was to stress the Hyper-V environment, beginning by stressing the machine’s CPU capacity, without attempting a realistic simulation of a representative or a particular ASP.NET workload.

The hardware was an Intel i7 single socket machine with four physical CPUs (Intel Hyper-Threading disabled) and 12 GB of RAM. The OS was Windows Server 2012 R2.

  • Native performance baseline

Running first on the native machine – after re-booting with Hyper-V disabled – the benchmark program ran to completion in about 90 minutes, the baseline execution time we will use to compare the various virtualization configurations that were tested. The only other active process running on the native Windows machine was Demand Technology’s Performance Sentry performance monitor, DmPerfss.exe, gathering performance counters once per minute.

At this stage, the only aspect of the benchmark program’s resource usage profile that is relevant is its CPU utilization. Because each task being processed goes to sleep periodically to simulate I/O, individual worker threads are not CPU-bound. However, since there are 32 worker threads executing concurrently and only four physical CPUs available, the overall workload is CPU-bound, as evidenced in Figure 25, which reports processor utilization by the top 5 consumers of CPU time during a one hour slice when the ThreadContentionGenerator program was active on the native machine.

Figure 25. Native execution of the benchmark program shows CPU utilization near 400% on a single socket machine with 4 physical CPUs. Instantaneous measurements of the System\Processor Queue Length counter, represented by a dotted line chart plotted against the right-hand y-axis, indicate a significant amount of processor queuing.

You can see in Figure 25 that overall processor utilization approaches the capacity of the machine at close to 400% utilization. The dotted line graph in Figure 25 also shows the instantaneous values obtained from the Processor Queue Length counter. The number of threads waiting in the Windows Scheduler Ready Queue exceeds fifteen for some of the observations. We can readily see that not only are the four physical CPUs on the machine quite busy, at many intervals there are a large number of ready threads waiting for service. Figure 26 confirms that the threads waiting in the Ready Queue are predominately from the ThreadContentionGenerator process (shown in blue), which is the behavior I expected, by the way.

Figure 26. This chart shows threads with a Wait State Reason indicating they are waiting in the OS Scheduler Ready Queue. As expected, most of the ready threads in the Ready Queue are from the benchmark program, the ThreadContentionGenerator process.

  • Standalone in the Root partition

In the next scenario, running standalone on the Root partition under Hyper-V with no child partitions active, the same benchmark executed for approximately 100 minutes, about 11% longer than the native execution baseline. In many scenarios a 10% performance penalty is a small price to pay for the other operational benefits virtualization provides, but it is important to keep in mind that there is always some performance penalty that is due whenever you are running an application in a virtualized environment.

Applications take longer to run inside a virtual machine compared to running native because of a variety of virtualization costs that are not encountered on a native machine. These include performance costs associated with Hyper-V intercepts and Hypercalls, plus the additional path length associated with synthetic interrupt processing. As mentioned above, the benchmark program simulates IO by issuing Timer Waits. These require the timer services of the hypervisor, which are less costly than the synthetic interrupt processing associated with disk and network IO. So, the 10% increase in execution time is very likely a best case of the performance degradation to expect.

Those costs of virtualization are minor irritants so long as the Hyper-V Host machine can supply ample resources to the guest machine. The performance costs of virtualization do increase substantially, however, when guest machines start to contend for shared resources on the Host machine.

Since processor scheduling is under the control of the hypervisor in the second benchmark run, for reliable processor measurements, it is necessary to turn to the Hyper-V Logical Processor counters, as shown in Figure 27. For a one-hour period while the benchmark program was active, overall processor utilization is reported approaching 400%, but you will notice it is slightly lower than the levels reported for the native machine in Figure 25. Figure 27 also shows an overlay line graphing hypervisor processor utilization against the right-hand y-axis, which accounts for some of the difference. The hypervisor consumes about 6% of one processor over the same measurement interval. The amount of CPU time consumed directly by the Hyper-V hypervisor is one readily quantifiable source of virtualization overhead that causes performance of the benchmark application to degrade by 10% or so.

[Figure 27: Standalone (Root partition) virtual processor utilization]

Figure 27. Running the benchmark workload standalone on the Root partition, the hypervisor consumes about 6% of one processor. Overall CPU utilization approaches 400% busy, slightly less busy than the native configuration shown in Figure 25.
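For reference, the Hyper-V Logical Processor counters charted here can be sampled from the Root partition with the standard .NET PerformanceCounter API. The sketch below assumes the counter set is named “Hyper-V Hypervisor Logical Processor” and that a “_Total” instance is present; adjust the names to match what Performance Monitor shows on your Host.

// A minimal sketch of sampling the host-side Hyper-V Logical Processor counters
// referred to above. The category, counter, and instance names are assumptions
// based on what Performance Monitor typically exposes on a Hyper-V Host.
using System;
using System.Diagnostics;
using System.Threading;

class HyperVLogicalProcessorSample
{
    static void Main()
    {
        const string category = "Hyper-V Hypervisor Logical Processor";
        var totalRunTime = new PerformanceCounter(category, "% Total Run Time", "_Total");
        var hypervisorRunTime = new PerformanceCounter(category, "% Hypervisor Run Time", "_Total");

        totalRunTime.NextValue();          // prime the rate counters
        hypervisorRunTime.NextValue();
        Thread.Sleep(1000);

        Console.WriteLine($"Logical processor total run time: {totalRunTime.NextValue():F1}%");
        Console.WriteLine($"Of which hypervisor run time:     {hypervisorRunTime.NextValue():F1}%");
    }
}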

Reviewing the Hyper-V counter measurement data, we can see that thread execution inside the Root Partition executes on a virtual processor, subject to the hypervisor Scheduler, the same as the virtual processor scheduling performed for any guest machine child partition. When the Windows OS inside the Root Partition executes a thread context switch, the Hyper-V performance counters graphed in Figure 28 show that there is a corresponding hypervisor context switch. For child partitions, there is an additional Hyper-V Scheduler interrupt that requires processing on a context switch, so there is slightly more virtualization overhead whenever child partitions are involved.

[Figure 28: Standalone (Root partition) logical processor context switches]

Figure 28. Each time the Windows OS inside the Root Partition executes a thread context switch, there is a corresponding hypervisor context switch.

The Hyper-V Logical Processor utilization measurements do include a metric, CPU Wait Time per Dispatch, available at the virtual processor level, that should be directly comparable to the System\Processor Queue Length measurement shown in Figure 25. Unfortunately, this performance counter is not helpful. It is not clear what units of Wait Time are reported, although an educated guess is that they are the standard Windows 100-nanosecond timer units. It also reports Wait Time in very discrete, discontinuous measurements, which is strange. Together, these two issues make for problems of interpretation. Fortunately, the System\Processor Queue Length counter is an instantaneous measurement that remains serviceable under Hyper-V. Figure 29 shows the same set of Process(*)\% Processor Time counters and a Processor Queue Length overlay line as Figure 25. The length of the processor Ready Queue for the Root partition is comparable to the native benchmark run, with even some evidence that the Ready Queue delays are slightly longer in the configuration where virtualization was enabled.

[Figure 29: Standalone guest machine benchmark process utilization]

Microsoft strongly suggests that you do not use the Root partition to execute any work other than what is necessary to administer the VM Host machine. There is no technical obstacle that prevents you from executing application programs on the Root partition, as I did with the benchmark program, but it is not a recommended practice. The Root partition provides a number of high priority virtualization services, like the handling of synthetic disk and network IO requests, which you should take pains not to impact by running other applications in the Root.

  • Standalone in a single child partition

Given the prohibition against running applications in the Root, the more useful comparison quantifying the minimum overhead of virtualization would be to compare performance of a guest machine in a child partition with performance on native hardware. So, on the same physical machine, I then created a Windows 8.1 virtual machine and configured it to run with 4 virtual processors. Making sure that nothing else was running on the Hyper-V server, I then ran the same benchmark on the 4-way guest machine. This time the benchmark ran to completion in 105 minutes.

Notice that on the child partition the benchmark run took about 5% longer when a single 4X Guest machine was configured. This virtual machine had access to all the physical CPUs that were available on the physical machine and executed in a standalone environment where it did not have to contend with any other guest VMs for processor resources. 105 minutes in execution time is about 17% longer than it took the same benchmark program to execute in native mode. Figure 30, which shows the rate that the Hyper-V hypervisor processed several types of virtualization-related interrupts, provides some insight into why execution time elongates under virtualization. Notice that hypervisor Scheduler interrupts occur when child partitions are executing – these Scheduler interrupts do not occur when threads are executing inside the Root partition, as illustrated back in Figure 28.

[Figure 30: Logical processor interrupts for standalone child partition]

Figure 30. Interrupt processing rates reported for the hypervisor when a child partition is active.

This configuration was also noteworthy because the hypervisor CPU consumption was reported as about 8%, a slightly higher utilization level (+25%) than any of the other configurations evaluated.

Today, performance testing is often performed on virtual machines because test machines are only intermittently active and because of the ease with which you can spin them up and tear them down again. In my experience it is reasonable to expect the same workload to take about 10% longer to execute if you run it inside a VM under ideal circumstances, which implies the VM has access to all the resources it needs on the machine and there is no or minimal contention for those resources from other resident guest machines. This first set of benchmark tests shows that the performance degradation to expect when a guest machine executes on an efficiently-provisioned VM Host is for tasks to run approximately 10% slower. Consider this a minimum stretch factor that elongates execution time due to various virtualization overheads. Furthermore, it is reasonable to expect this stretch factor to increase whenever the guest machine is under-provisioned or the Hyper-V machine is over-committed.

In the next post, this baseline measurement is compared to the other possible VM configurations: an efficiently-provisioned Host machine, an over-committed VM Host machine, and, finally, an under-provisioned guest machine. In the case of an efficiently-provisioned VM Host machine, we can expect a stretch factor comparable to the minimum stretch factor reported here. However, as we will see, when the VM Host machine is significantly over-committed or the guest machine is significantly under-provisioned, guest machine workloads can experience a severe performance penalty.

 

 

 
