Is there an ARM-based PC in your future?

In the previous blog post in this series on Windows 8, I explained that Windows RT is a new application run-time layer in Windows that was built when the Windows OS was ported to the ARM architecture. ARM is the dominant processor architecture used in current smartphones and tablets, including the Apple iPhone and iPad. So, the short answer to the question posed by the title is, “You already do run an ARM-based computer, and it is the smartphone in your pocket.” The problem for Microsoft is that this ARM computer is probably not running an OS based on Windows.

Microsoft’s new Surface tablet, designed to showcase the capabilities of Windows 8, uses an ARM processor. On a Surface, you can only run applications known as Windows Store apps that are specifically built to run on top of Windows RT. You can also install and run Windows 8 on any Intel-compatible 32 or 64-bit compatible processor. The Intel version of Windows 8 is called Windows 8 Pro. Windows 8 Pro includes the new Windows RT application run-time, so Windows 8 App Store apps will run on Windows 8 Pro machines. Windows 8 Pro also includes all the older parts of Windows 7, so it is also capable of running “legacy” Windows desktop applications.

If you are a software developer trying to build one of the new Windows Store apps, first you have to install the latest version of Visual Studio and then create a project that targets Windows 8 App Store apps. To maintain compatibility across hardware platforms, a Windows Store app can only access functions in the new Windows Runtime, with the exception of some essential Win32 functions that were converted to run on an ARM processor, but were not included in the new Runtime. For security reasons, Windows Store Apps are run in a silo that limits their ability to interact with the underlying operating system or access any other running process. Microsoft has a certification process that guarantees that the app conforms to these requirements before it is made available on the Windows Store web site. (Apple has a similar policy with its App store.)

The new Runtime layer is quite extensive: see http://msdn.microsoft.com/en-us/library/windows/apps/br211377to get a sense of its scope. Applications written in either C++ or Javascript can call into the Runtime directly. The .NET Framework version 4.5 contains some glue to allow Windows Store apps to be written in C#, Visual Basic.NET or any of the other .NET-compatible languages.

Among the essential Win32 functions that are available under ARM are those associated with COM, a key technology used for years and years in Windows to package code into run-time components. In Windows programming, COM interfaces are frequently used to communicate between threads and processes. Many, many Win32 functions rely on COM interfaces. Win32 functions that were migrated to the new Runtime required a wrapper to hide the COM interface from the Windows 8 App Store app, but COM is still there under the covers. The complete COM infrastructure was ported to ARM for Windows 8, but the interfaces themselves were not re-written. If you access the “Win32 and COM for Windows Store apps” Help topic for Windows 8 developers at http://msdn.microsoft.com/en-us/library/windows/apps/br205762.aspx, you can see that COM is included in the Win32 subset that was ported to ARM. Drilling a little deeper, you can see that, for example, your Windows Store app can still call CoInitializeEx() to initialize a COM component, just like in the days of old.

So, while Windows 8 apps can call directly into the full set of Win32-based COM APIs, there are some very interesting omissions in the RT API surface. Performance monitoring is one of those omissions. Because the Win32-based performance monitoring interfaces were not ported to ARM, a Windows 8 app cannot access the performance counters associated with CPU accounting, for example, and determine how much CPU time it is consuming.

(Note: there is a workaround available. Any app on RT can still make a call directly into kernel32.dll and pull CPU consumption at the process level from it. You can use this hack while you are developing the app, but you must remove that PInvoke from the finished app before you submit it for certification, according to this Q&A article posted at http://stackoverflow.com/questions/12338953/is-there-any-way-for-a-winrt-app-to-measure-its-own-cpu-usage?lq=1. Microsoft won’t permit a retail version of a Windows Store app to call into kernel32.dll directly.)

Instead of relying on performance counters, however, your Windows Store app can utilize ETW. The Win32 APIs that are used to generate ETW trace events or Listen to them from inside your app are part of the Win32 subset that Windows Store apps can call into on ARM. (See http://msdn.microsoft.com/en-us/library/windows/apps/br205755.aspx.) The fact that ETW is fully supported on ARM while performance counters are not, by the way, is, further evidence that the counter technology is on the wane and tracing is ascendant in Windows.

It is not entirely clear why performance monitoring was omitted from Windows RT. One possibility is that the application model for Windows Store apps is very different. When you run one of these Apps, it takes over the entire display. When you switch to a different app, the first app is suspended, and it is supposed to dispose of any objects it is currently holding.

Still, if you are a game designer trying to support this new class of Windows devices, leaving out the performance monitoring capabilities is worrisome. Physical memory on the Surface is limited to 2 GB, so RAM is decidedly a constraint. The Surface uses a 4-way ARM multiprocessor, running at 1.3 GHz, so RT does support multi-threading. In fact, the RT support for multi-threading is modeled on the task.await()pattern of asynchronous programming introduced recently in the .NET Framework.

To reiterate, the key deliverable in the new OS is the port to the ARM platform. I will drill into ARM and its implications for the OS in the next post in this series..

Is there an ARM server in your future?

I cannot resist adding to some of the industry buzz about the recent HP Project Moonshot announcement and what this potentially means for folks that run Windows. As part of Project Moonshot, HP is planning to release something called the Redstone Server Development Platform in 1H12, an ARM-based, massively parallel supercomputer that uses 90% less energy than comparable Intel microprocessors.

Coupled with the big.Little hardware announcement from the ARM Consortium about two weeks ago, I think this is big news that could shake up the foundations of Windows computing. ARM-based Windows Server machines could well be in all our futures.

Let’s start with ARM and the big.Little architecture announcement, which is pretty interesting in itself. Details are from this white paper. big.Little is an explicit multi-core architecture where a simple, low power version of the processor is packaged together with a significantly more powerful (~ 2x) version of the processor that also uses about 3 times the energy consumption. The Little CPU core in the architecture is “an in-order, non-symmetric dual-issue processor with a pipeline length of between 8-stages and 10-stages.” The big guy is “an out-of-order sustained triple-issue processor with a pipeline length of between 15-stages and 24-stages,” significantly more powerful and more complex.

The idea of pairing them is that a task that needs access to more CPU power could migrate from the Little guy to the big guy in about 20K cycles, about 20 m-seconds of delay. The idea is that when you want to playback something with HD graphics on your phone, the architecture can power up the big guy and migrate the CPU-bound graphics-rendering thread so fast that the user experience is like a turbocharger kicking in. Whoosh. Meanwhile, the device – a phone or a tablet, probably – can stay powered up and ready for action for an extended period of time with just the low power CPU. Which, at 1 GHz, is still a potent microprocessor.

Apple, Android, Blackberry and Windows smartphones today all run on ARM. ARM is a reduced instruction set computer (RISC) that has significantly lower energy costs, which is why it is so popular on smartphones. The Apple iPad tablet uses ARM, and so do most iPad competitors today from folks like Samsung and HTC. Some of these tablets are dual core ARM machines today. The tablet computer running an early build of Windows 8 that Microsoft gave away at the recent PDC also uses an ARM chip. When Windows 8 does ship next year sometime, we can expect to see a flurry of announcements for Windows 8 tablets based on ARM chips. Portable devices rely on ARM chips because they consume relatively little power, and power is the most significant performance constraint in portable computing of all types.

To be fair, we will probably see some Intel Atom-based tablets, too, running Win8 being released at the same time. (The Atom is a simplified, in-order processor that supports the full Intel x64 instruction set.) But I think ARM will continue to dominate the market for smartphones and tablets in the coming years. And an ARM-compatible version of Windows is quite likely to add to that momentum – most industry observers are guessing that plenty of Windows 8 tablets are going to be sold.

Another piece of the puzzle: the 64-bit ARM extensions were just formally announced this week.

So, ARM is happening in a big way, and it is time to take notice. Plus, Windows is jumping into ARM-based computing in its next release.

BTW, I am not counting Intel out. Intel has the semiconductor industry’s foremost fabrication facilities; it is still the industry leader in manufacturing. It is the low cost, high volume manufacturer. One of the keys to its manufacturing prowess has been high volume, which is necessary to recoup the enormous capital costs associated with constructing a new, leading edge fab plant.

However, the ARM architecture is the first viable, high volume challenger to Intel’s x86/x64 products to emerge in years. In the most recent quarter, about 400 million smartphones were shipped, easily 10 times the number of Intel-compatible PCs that were sold in the same time frame. That kind of high volume success breeds more success in semiconductor manufacturing because it drives down the manufacturing cost per unit, a key component of Moore’s law.

The HP Moonshot Project announcement suggests none of this is lost on the data center where energy consumption is a major cost factor. The goal is to fill that 1-x rack space with an array of ARM computers installed on a compact motherboard with just 10% of the energy consumption of the comparable Intel computing footprint. The projected cost savings over three years are something like 60%. So, at the higher end of HPC, this is starting to happen. The Windows 8 port to ARM then leaves just one small step to get Windows Server running on these same ARM-based servers.

It will be very interesting to see how this all plays out..