Using xperf to analyze CSwitch events

Continuing the discussion from the previous blog entry on event-driven approaches to measuring CPU utilization in Windows

Last time around I discussed the same CPU busy calculations that the Resource Manager in Windows makes. This same calculation can also be performed after the fact using the event data from ETW. This is the technique used in the Windows Performance Toolkit (WPT, but which is better known around Microsoft as xperf), for example, to calculate CPU usage metrics.

Once you have downloaded and installed the Windows Performance Toolkit, you can launch a basic ETW collection session using the following xperf command:

xperf -on DiagEasy

Then, after you have accumulated enough data, issue another command to stop tracing and capture the event stream to a file:

xperf -d cputrace.etl

Next, process the cputrace.etl file using the xperfview app. After the trace file is loaded, xperfview provides visualizations that are very similar to ResMon. See Figure 6 for an example.


Figure 6. CPU utilization graphs in xperfview, based on ETW context switch event data gathered with the xperf utility. Additional ISR and DPC events are used to calculate the amount of time device drivers spend processing interrupts.

With xperfview, you can also request a Summary Table which displays CPU usage by process (and thread) by right-clicking on the CPU Usage graph and accessing the pop-up context menu. An example of the CPU Scheduling Aggregate Summary Table is illustrated in Figure 6. It is similar to the one ResMon produces. The data here was gathered while running a multi-threaded CPU soaker program called ThreadContentionGenerator while ResMon was also active. You can see that the calculation in Figure 7 roughly mirrors the results shown in Figure 5 for ResMon, allowing for some variation that is to be expected since the intervals themselves are not identical. The xperf interval shown in Figure 6 is approximately 500 seconds long, while ResMon maintains a rolling window that is only 60 seconds in duration. The ResMon screen shot was taken somewhere in the middle of the longer xperf tracing session.

For some perspective on the volume of trace events that can be generated, the binary .etl trace file produced in this example was approximately 325 MB on disk for a DiagEasy trace session that ran for more than ten minutes. Running with the xperf defaults, I received a notification when the trace session closed that three ETW 64K buffers of data were dropped during the trace because xperf was unable to keep pace with processing the event stream in real-time.

The Context Switch event also provides the old thread’s Wait Reason code, which helps you to understand why the sequence of thread scheduling events occurred. For reference, a Windows context switch is defined here, while the contents of the ETW (Event Tracing for Windows) context switch event record are defined here, including a list of the current thread WaitReason codes.

Note that you can measure CPU queue time accurately from the ETW events, an important indicator of processor contention when the processor becomes saturated. As illustrated in Figure 4, the transition from the Wait state to the Ready state is marked by a ReadyThread event record. The time between the ReadyThread event and a subsequent CSwitch event marking its transition to Running is one form of CPU Queue time. A second form of CPU queue time is the time between a CSwitch(…,out,WaitReason,…) where the WaitReason is either a Preempt or time-slice quantum expiration and a subsequent re-dispatch. Both forms of CPU queue time can be measured accurately using ETW.

When precision in understanding patterns of CPU consumption is required, post-processing the ETW context switching event stream is a much better way to proceed than attempting to use the Windows % Processor Time performance counters. Measuring CPU consumption from the context switch events is considerably more precise than, for example, the Windows performance counter data available in Perfmon that report processor utilization at the system and process level based on processor state sampling. Such high precision measurements are not always required, of course, and processing the ETW context switching event stream is relatively expensive due to the extremely high volume of trace data that you must deal with.

Figure 6 illustrates several of the CPU utilization graphs that xperfview creates from the context switch event stream. To help make the graph more readable, I filtered out Idle time calculation for all but two of the logical processors on this machine. (The machine illustrated has 8 logical CPUs.) To gain insight into what high priority Interrupt Service Routines (ISRs) and DPCs are running, ISR and DPC events should also be gathered, which the DiagEasy event profile in xperf does automatically. (Windows device driver developers are very interested in them. For an example, see this blog posting discussing using the xperf ETW utility to capture CPU consumption by the TCP/IP network driver stack.)


Figure 7. The CPU Scheduling Aggregate Summary Table calculated by xperfview. The results of these calculations closely resembles the rolling one-minute calculation reported by the Resource Monitor in Figure 5. 




Tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *