Inside the Windows 8 Runtime, Part 1

This is the next installment in a series of blog posts on the recent Windows 8 release that I began a few months back. In the last entry I expressed some reservations about the architectural decisions associated with the new Windows Runtime API layer. In this post and the ones that follow, I will provide more detail about my concerns as we look inside the new Windows Runtime layer. But, first, we will need some background on the native C language Win32 API, COM, and the Common Language Runtime (CLR) used in the .NET Framework. Collectively, these three facilities represent the run-time services available to Windows applications prior to Windows 8. As I mentioned in the earlier posts, the new Windows Runtime layer in Windows 8 is a port of a subset of the existing Windows Win32 run-time to run on the ARM hardware platform.

Windows Run-time Libraries

Run-time libraries in Windows provide an API layer that applications running in User mode can call to request services from the OS. For historical reasons, this runtime layer is known as Win32, even when a Win32 service is called on a 64-bit OS. A good example of a Win32 runtime service is any operation that involves opening and accessing a file stored somewhere in the file system (or the network, or the cloud). Application programs require a variety of OS services in order to access a file, including authentication and serialization, for example.

Today, the Win32 API layer spans hundreds of DLLs and contains hundreds of thousands of methods that Windows applications can call. One noteworthy aspect of the Win32 runtime libraries is how long they need to persist, due to the large number of Windows applications that depend on them. Software endures. The extremely broad Win32 surface area creates a continuing obligation to support those interfaces, or else risk introducing OS upgrades that are not upwardly compatible with earlier versions of the OS.

Historically in Windows, language-specific run-time libraries were provided to allow applications to interact with the OS to perform basic functions like accessing the keyboard, the display, the mouse, and the file system. By design, the standardized “C” language was kept compact to make it more portable, but was intended to be augmented by C runtime libraries for basic functions like string-handling. To support development in C++, for example, Microsoft provided the Microsoft Foundation Class library, or MFC, which included a set of object-oriented C++ wrappers around Win32 APIs, originally developed using the C language. When you consider that Win32 APIs provide common services associated with Windows user interface GUI elements like windows, menus, buttons and controls, dialog boxes, etc., you can imagine that the scope of MFC was and is quite broad.

MFC also incorporated many classes that were not that closely associated with specific OS services, but certainly made it easier to develop applications for Windows. Good examples of these ancillary classes include MFC classes for string handling, date handling, and data structures such as Arrays and Lists. Having access to generic C++ objects that reliably handle dates and other calendar functions or implement useful data structures, such as lists of (variable length) strings, greatly simplifies application development for Windows.

COM Objects

The set of technologies associated with the Microsoft Component Object Model (COM) brought an object-oriented way to build Windows applications. COM originally arose out of a need to support inter-process communication between Windows applications of the type where, for example, drag-and-drop is used to pull a file object from the Windows Explorer app and plop it into the active window of a desktop application. The most intriguing aspect of COM is that the programming model is designed explicitly to support late binding to objects during run time, something that is essential for a feature like drag-and-drop to work across unrelated processes. Late binding to COM objects allowed for the construction of components whose behavior was discoverable during run-time, something which was very innovative.

At the heart of COM is the IUnknown interface, which all COM classes must support. The IUnknown interface has three methods: QueryInterface, AddRef and Release. The AddRef and Release methods are used to manage the object’s lifetime. QueryInterface is used by the calling program to discover whether the COM object it is communicating with supports a contract that the caller understands. (See, for example, this documentation for a discussion of the QueryInterface mechanism for late binding: http://msdn.microsoft.com/en-us/library/ms687230(v=vs.85).aspx.)
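To make the three IUnknown methods concrete, here is a minimal sketch of the contract in Python. It is only a model: real COM interfaces are C/C++ vtables identified by GUIDs, and the class name, the string interface names, and the destroyed flag below are all my own inventions for illustration.

```python
class ComObject:
    """Toy stand-in for a COM class (hypothetical; not a real COM binding)."""

    def __init__(self, interfaces):
        self._interfaces = dict(interfaces)  # interface name -> implementation
        self._ref_count = 1                  # the creator holds the first reference
        self.destroyed = False

    def add_ref(self):
        """Record one more outstanding reference and return the new count."""
        self._ref_count += 1
        return self._ref_count

    def release(self):
        """Drop one reference; the object goes away when the count reaches zero."""
        self._ref_count -= 1
        if self._ref_count == 0:
            self.destroyed = True            # real COM would free the object here
        return self._ref_count

    def query_interface(self, name):
        """Ask at run time whether the object supports a given contract."""
        impl = self._interfaces.get(name)
        if impl is None:
            return None                      # real COM reports E_NOINTERFACE
        self.add_ref()                       # a successful query hands out a new reference
        return impl
```

A caller that asks for an interface the object does not support simply gets nothing back and can fall back to another contract. A caller that does receive an interface owns one more reference, and must eventually call release to give it up.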

Because COM objects are discoverable during run-time, COM development also enabled third party developers to add component libraries of their own design. Component libraries are packaged as DLLs and installed onto the Windows machine. Originally, COM components also required an entry in the Windows registry to store a CLSID, a globally unique identifier (GUID) that is used in calls to instantiate the COM component during runtime. If you use the Registry Editor to look for CLSIDs stored under the HKEY_CLASSES_ROOT key, you will typically see thousands of COM objects that are installed on your system, as illustrated in Figure 1. (Typically, you will also see several versions of the same, or similar, COM objects installed. This is the only good way to deal with versioning in COM.)
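The role the CLSID plays in activation can be sketched with an in-memory table. This is an assumption-laden model, not the actual mechanism: in real COM the registry entry points to a DLL or EXE server that supplies a class factory, whereas here a plain dictionary and the made-up names register_class, create_instance and ChartControl stand in for all of that.

```python
import uuid

# Hypothetical in-memory stand-in for the CLSID entries kept in the registry.
_class_registry = {}


def register_class(clsid, factory):
    """Associate a CLSID with a callable that creates the component."""
    _class_registry[clsid] = factory


def create_instance(clsid):
    """Instantiate a component by CLSID, failing if the class was never registered."""
    factory = _class_registry.get(clsid)
    if factory is None:
        # Real COM reports this case as REGDB_E_CLASSNOTREG.
        raise LookupError(f"class not registered: {{{clsid}}}")
    return factory()


class ChartControl:
    """Made-up component used only for this example."""


# A freshly generated GUID; a real CLSID is fixed once by the component vendor.
CLSID_CHART = uuid.uuid4()
register_class(CLSID_CHART, ChartControl)
```

The caller needs to know nothing about ChartControl except its identifier, which is what lets a third party install a new component without the calling application being rebuilt.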

Note that beginning in Windows XP, the requirement for COM objects to use the registry was relaxed somewhat; the CLSID and other activation metadata can now also be stored in an XML-based assembly manifest instead, but only in certain cases.

Figure 1. The CLSIDs of installed COM objects that are available at runtime are registered under the HKEY_CLASSES_ROOT key in the Windows Registry.

After years of object-oriented programming (OOP) proponents advocating the use of standardized components in application development, COM technology proved beyond a doubt the efficacy of this approach. COM changed the face of software development on Windows, leading to the development of a wide range of third party component libraries that extended the MFC classes and opening up Windows software development significantly. COM components packaged as ActiveX controls could easily be added to any application development project. For example, it became commonplace for Windows developers to use third party ActiveX controls to give their application a similar look-and-feel to the latest Microsoft version of Office. In my case, instead of trying to develop a charting tool for a performance data visualization app from the ground up, I licensed a very professional Chart control from a third party component library developer and plugged that into my application.

Limitations of the COM programming model

As innovative as the COM programming model was, and as useful as the technology proved to be in extending the Windows development platform, aspects of the late-binding approach used in COM came to be seen as having some decidedly less than desirable qualities. I will mention just three issues here:

  • complex performance considerations,
  • memory leaks, and
  • the reality of dynamically linking to a complex object-oriented interface.

Developing well-behaved COM objects turns out to be quite difficult, forcing developers to deal with potentially complex performance considerations such as threading, concurrency, and serialization alternatives. COM objects can live either in-process or out-of-process in separate COM Server address spaces. (In COM+, COM objects can even be distributed across the network.) The runtime infrastructure to support late binding to all types of COM objects at runtime is quite complex, but at this point, of course, it is deeply embedded into the Windows OS.

Software developers also discovered that applications built using persistent COM objects were prone to memory leaks in often devious, difficult-to-diagnose, non-obvious ways. Since COM objects could be shared across multiple processes, the COM object itself had responsibility for managing object lifetimes using reference counting, implemented using the AddRef and Release methods of the IUnknown interface. Keeping track of which ActiveX objects are active inside your program and making sure inactive ones are de-referenced in a timely fashion can be complicated enough. Nesting one ActiveX control inside another ActiveX control, for instance, can create a circular chain of references to those objects that defeats reference counting. When that happens, the objects in the chain can never be destroyed, and they can accumulate inside the process address space until virtual memory is exhausted.
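The way a circular chain defeats reference counting is easier to see with the counts written out. The following is a minimal sketch with a hand-rolled counter; the Control class and its hold method are hypothetical and only mimic the AddRef/Release bookkeeping, they are not an ActiveX API.

```python
class Control:
    """Toy reference-counted object (hypothetical; mimics AddRef/Release only)."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 1       # reference held by the code that created it
        self.child = None
        self.destroyed = False

    def add_ref(self):
        self.ref_count += 1

    def release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.destroyed = True
            if self.child is not None:
                self.child.release()   # give up the reference this object held
                self.child = None

    def hold(self, other):
        """Keep a counted reference to another control."""
        other.add_ref()
        self.child = other


def build_chain():
    """outer -> inner, no cycle: releasing both frees both."""
    outer, inner = Control("outer"), Control("inner")
    outer.hold(inner)
    inner.release()    # inner stays alive because outer still holds it
    outer.release()    # outer reaches zero and releases inner in turn
    return outer, inner


def build_cycle():
    """outer -> inner -> outer: the application lets go, but neither count reaches zero."""
    outer, inner = Control("outer"), Control("inner")
    outer.hold(inner)
    inner.hold(outer)  # closes the cycle
    outer.release()
    inner.release()
    return outer, inner
```

In the chain, both objects end up destroyed. In the cycle, each object is left with a count of one that only the other object could release, so neither is ever freed even though the application no longer holds a reference to either.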

Finally, the idea that all the property settings and methods available for an object embedded in your application should be discoverable during runtime, which sounded good on paper, turned out not to be that useful a construct for dealing with a complex interface. As a practical matter, crafting code to deal with a complicated interface to a COM object that you bind to dynamically creates just as many dependencies as linking statically to that object. Another very pointed objection to late binding is that it can lead to a variety of logic errors that are only discoverable during runtime testing. With static binding to strongly typed objects, by contrast, many of these same type-mismatch (or invalid cast) errors can readily be detected at compile time.
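A small sketch shows why late-bound errors surface only in testing. Python resolves every method by name at run time, so it serves as a rough analogue of a late-bound COM client; the ChartControl class and the late_bound_call helper are hypothetical names of my own.

```python
class ChartControl:
    """Made-up component with a single method."""

    def set_title(self, text):
        self.title = text
        return self.title


def late_bound_call(obj, method_name, *args):
    """Look the method up by name at run time, as a late-bound client would."""
    method = getattr(obj, method_name, None)
    if method is None:
        raise AttributeError(
            f"{type(obj).__name__} has no method {method_name!r}"
        )
    return method(*args)
```

A call such as late_bound_call(chart, "set_titel", "CPU") is perfectly legal code; the misspelled name fails only when that line actually executes. A compiler working against a strongly typed declaration of the same class would reject the equivalent mistake before the program ever ran.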

Building components in the .NET Framework

This criticism of the COM programming model was taken to heart by the architects of the .NET Framework. The Framework languages (mainly C#) were designed from the ground up using OOP principles like inheritance and polymorphism, and adopted the best practices associated with software engineering quality initiatives like the Ada programming language. Unlike the C++ language, which was grafted on top of the C programming language, the Framework languages dispense with the use of address pointers entirely. Pointers to object instances are maintained internally, of course, but cannot be referenced directly by user code, except for the express purpose of interoperability with COM objects and Win32 API calls.

Object lifetime management is internalized as well in the Framework languages. Rather than counting references, the .NET Common Language Runtime (CLR) periodically determines which objects can no longer be reached from running code and reclaims them automatically using garbage collection. Automatic garbage collection is one of the key features of the .NET managed code environment. Memory leaks of the type associated with the kinds of housekeeping logic errors that tend to plague programs written utilizing COM objects are eliminated in one fell swoop. (To be sure, memory leaks of other types can still occur, and dealing with an automatic garbage collection procedure can sometimes be tricky. See, for example, this MSDN article “Investigating Memory Issues” from CLR developer Maoni Stephens, or some of the online documentation that I wrote here that discusses typical memory management problems that .NET developers can encounter with suggestions on how to deal with them effectively.)
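The difference a garbage collector makes for circular references can be observed directly in any runtime that has one. The sketch below uses CPython purely as an analogy for the CLR: CPython combines reference counting with a separate cycle collector, which is not how the CLR is built, but switching that collector off and on shows the same before-and-after effect. The Node class and the function name are my own.

```python
import gc
import weakref


class Node:
    """Plain object that can point at a partner."""

    def __init__(self):
        self.partner = None


def cycle_survives_until_collected():
    """Return (alive before collection, alive after collection) for a cyclic pair."""
    was_enabled = gc.isenabled()
    gc.disable()                      # rely on reference counting alone for a moment
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a   # circular reference
        probe = weakref.ref(a)        # observes `a` without keeping it alive
        del a, b                      # the program no longer references either node
        alive_before = probe() is not None
        gc.collect()                  # the collector finds the unreachable cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

With counting alone the abandoned pair stays in memory, exactly as nested COM objects would. Once the collector runs, it recognizes that nothing in the program can reach the pair and frees both objects.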

“Strongly-typed” objects

Furthermore, the architects of the .NET Framework adopted an approach diametrically opposed to COM for building component libraries. The .NET Framework relies on static binding to strongly typed .NET components. “Strongly typed” in this context means that the code references an explicit class, i.e., one derived from System.Object, which permits the compiler to detect type mismatches. .NET is also strict about implicit type conversions: narrowing conversions, which can lose data, are never performed implicitly. Your code must use explicit casts or reference one of the Convert class library methods; .NET provides something to convert from any base type to any other base type.

To be sure, the .NET Framework does support dynamic binding to objects during runtime under specific circumstances. These include using Reflection and the is and as keywords. It is also often necessary for .NET applications to communicate with unmanaged code, which includes programs written in languages such as JavaScript that rely on dynamic binding. (The 4.0 version of the Framework added the dynamic keyword that instructs the compiler to bypass compile-time type-checking to help with JavaScript interop issues.)

Meanwhile, Windows itself and most Office apps still rely heavily on COM. The primary way the .NET Framework deals with interoperability with Win32 APIs and COM objects that pass pointers around is through wrappers: .NET classes built around these APIs and COM objects. For example, instead of calling the QueryPerformanceCounter Win32 API to gather high-precision clock values in Windows, C# developers instantiate a Stopwatch object and call its methods. Structs are still permitted in C#, and they are unavoidable when you are dealing with interop for Win32 APIs that don’t already have wrapper classes, which is done through a .NET feature known as platform invoke, or PInvoke for short. If you need to call a Win32 API that is adorned with address pointers, you can often get help from the pinvoke.net wiki, but, more often than not, as in the case of the Stopwatch class, there is likely to be a .NET wrapper already available.
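The wrapper idea itself is simple enough to sketch. The class below is a hypothetical Python analogue of a stopwatch-style wrapper, not the .NET Stopwatch class or its API. It hides a raw high-resolution counter, here time.perf_counter_ns (which, as I understand it, CPython builds on QueryPerformanceCounter when running on Windows), behind a few methods so that callers never touch the counter values directly.

```python
import time


class Stopwatch:
    """Hypothetical wrapper around a raw high-resolution counter."""

    def __init__(self):
        self._started_at = None   # raw counter value when timing began
        self._elapsed_ns = 0      # time accumulated over earlier start/stop pairs

    def start(self):
        """Begin (or resume) timing; has no effect if already running."""
        if self._started_at is None:
            self._started_at = time.perf_counter_ns()

    def stop(self):
        """Pause timing and fold the current interval into the total."""
        if self._started_at is not None:
            self._elapsed_ns += time.perf_counter_ns() - self._started_at
            self._started_at = None

    @property
    def elapsed_ms(self):
        """Total measured time in milliseconds, including any running interval."""
        running = 0
        if self._started_at is not None:
            running = time.perf_counter_ns() - self._started_at
        return (self._elapsed_ns + running) / 1_000_000
```

The caller writes start, stop, and reads elapsed_ms; the counter's units, its starting point, and the arithmetic needed to turn two raw readings into a duration all stay inside the wrapper.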

Memory management in a C# program that does a fair amount of interop with native code and COM objects is also complicated, since the CLR garbage collector cannot automatically free up memory associated with COM objects that have been abandoned the way it can for object instances built using managed code. .NET applications that need to interact frequently with COM objects are prone to leak memory in often subtle, non-obvious ways. For instance, the CLR garbage collector cannot reclaim an instance of a .NET class that references a persistent COM object so long as the COM object itself remains referenced.

In summary, the .NET Framework addresses the key limitations associated with COM development, adopting a programming model that relies on strongly-typed objects. This approach was diametrically opposed to the one promulgated in COM, one based on binding to objects dynamically during run-time. Application programs written in one of the .NET Framework languages like C# or VB.NET, of course, still need to call Win32 services that pass pointers around and use COM objects that need reference counting. .NET classes that wrap frequently accessed COM objects or Win32 methods are very effective in hiding the fact that, under the covers, the Windows run-time still relies heavily on COM.