Presentations for the upcoming CMG conference are available on

I am presenting two topics at the Computer Measurement Group (CMG) annual conference this week, and I have just posted the slide decks I will be using on

The latest slide deck for Monitoring Web Application Response Times in Windows is available here.

And the HTTP/2: Recent protocol changes and their impact on web application performance one is available here.

If you are a regular reader of this blog, most of this material will look familiar. Enjoy!


Why is my web app running slowly? — Part 2

This is a continuation of a series of blog entries on this topic. The series starts here.

In this blog entry, I start to dive a little deeper into the model of web page response time that is implicit in the YSlow approach to web application performance.


Figure 3. A GET Request issued from the web browser triggers a Response message from the server, which is received by the web client where it is rendered into a web page.

The simple picture in Figure 3 leaves out many of the important details of the web protocols, including the manner in which the web server that can respond to the GET Request is located using DNS, the Internet’s Domain Name System. But it will suffice to frame a definition of web application response time, which is measured from the time the GET Request is issued by the browser and includes (1) the time it takes to locate the web server, (2) the time it takes the web server to create the Response message in reply, plus the network transmission time to send these messages back and forth, and, finally, (3) the time it takes the web browser to render the Response message appropriately on the display. The response time for this Request is measured from the initial GET Request to the point at which the browser’s display of the Response is complete, such that the customer can then interact with any of the controls (buttons, menus, hyperlinks, etc.) that are rendered on the page. This response time measurement is also known as Page Load time. The YSlow tool incorporates a set of rule-based calculations that are intended to assist the web developer in reducing page load time.

Elsewhere on this blog, I have expounded at some length on the limitations of the ask-the-expert, rule-based approach, but there is no doubting its appeal. It makes perfect sense that someone who encounters a performance problem and is a relative newcomer to web application development would seek advice from someone with much more experience. However, when this expertise is encapsulated in static rules, and these rules are applied mechanically, there is often too much opportunity for nuance in the expert’s application of the rule to be missed. The point I labored to make in that earlier series of blog posts was that, too often, the mechanical application of an expert-based rule does not capture some aspect of its context that is crucial to its application. This is why in the field of Artificial Intelligence the mechanical application of rules in so-called expert systems gave way to the current machine learning approach that trains the decision-making apparatus based on actual examples[1]. On the other hand, understanding how and why an expert formulated a particularly handy performance rule is often quite helpful. That is how human experts train other humans to become expert practitioners themselves.

In essence, as Figure 3 illustrates, HTTP is a means to locate and send files around the Internet. These files contain static content, structured using the HTML markup language so that they contain the instructions the web client needs to compose and render them. However, many web applications generate Response messages dynamically, which is the case with the specific charting application we are discussing here in the case study. In that example, the Response messages are generated dynamically by an ASP.NET server-side application based on the machine, date, and chart template selected, which are all passed as parameters appended to the original GET Request message sent to the web server to request a specific database query to be executed.

As depicted in Figure 4, the HTTP protocol is layered on top of TCP, which requires a session-oriented connection, but HTTP itself is a sessionless protocol. Being sessionless means that web servers process each GET Request independently, without regard to history. In practice, however, more complex web applications, especially ones that generate dynamic HTML, are very often session-oriented, utilizing parameters appended to the GET Request message, cookies, hidden fields, and other techniques to encapsulate session state that reflects past interactions with a customer and links current Requests to that customer’s history. A canonical example is session-oriented data that associates a current GET Request from a customer with that customer’s shopping cart, filled with items to purchase from the current session or retained from a previous one.
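As a sketch of the first of those techniques, here is a hand-assembled HTTP/1.1 GET Request that carries a session identifier in a cookie. The host name, path, and session id are made up for illustration; real browsers build this text for you:

```javascript
// Sketch of how session state rides along on a sessionless protocol: a
// hand-assembled HTTP/1.1 GET Request carrying a session id in a cookie.
// Host name, path, and session id are made up for illustration.
function buildGetRequest(host, path, sessionId) {
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    `Cookie: SESSIONID=${sessionId}`,  // links this Request to prior history
    ``,                                // blank line ends the headers
    ``,
  ].join("\r\n");
}

console.log(buildGetRequest("shop.example.com", "/cart", "abc123"));
```

Every subsequent Request from the same browser carries the same cookie, which is how the server reconnects an otherwise history-free GET with that customer’s shopping cart.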


Figure 4. The networking protocol stack.

Working our way down the networking stack, the TCP protocol sits atop IP and then (usually) Ethernet at the hardware-oriented, Media Access level. These lower-level networking protocols transform network requests into datagrams and from there into packets to build the streams of bits that are actually transmitted by the networking hardware. For mainly historical reasons, the Ethernet protocol supports a maximum transmission unit (MTU) of 1,500 bytes. Note that each protocol layer in the networking stack inserts its addressing and control metadata into a succession of headers that are prepended to the front of the data packet. After accounting for the protocol headers, the maximum capacity of the data payload is closer to 1,460 bytes. The Ethernet MTU requires that HTTP Request or Response messages larger than 1,460 bytes be broken into multiple packets by the IP layer at the sender. IP is also responsible for reassembling the packets at the receiver. These details of the networking protocol stack are the basis for the YSlow performance rules that analyze the size of the Request and Response messages used at the level of the HTTP protocol to compose the page.
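The arithmetic behind those rules is easy to sketch. Assuming the standard 1,500-byte Ethernet MTU and roughly 40 bytes of combined IP and TCP headers, the number of packets a message requires is just a ceiling division:

```javascript
// Estimate how many packets an HTTP message of a given size requires,
// assuming a 1,500-byte Ethernet MTU and ~40 bytes of IP + TCP headers.
const MTU_BYTES = 1500;
const HEADER_OVERHEAD_BYTES = 40;                        // 20 bytes IP + 20 bytes TCP
const PAYLOAD_BYTES = MTU_BYTES - HEADER_OVERHEAD_BYTES; // 1,460 bytes per packet

function packetsNeeded(messageBytes) {
  if (messageBytes <= 0) return 0;
  return Math.ceil(messageBytes / PAYLOAD_BYTES);
}

// A 1,460-byte Response fits in one packet; a 100 KB chart image does not.
console.log(packetsNeeded(1460));   // 1
console.log(packetsNeeded(100000)); // 69
```

The options negotiated on a real TCP connection can shave a few more bytes off the payload, so treat 1,460 as a nominal figure rather than an exact one.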

A further complication is that many web pages are composed from multiple Response messages, as depicted in Figure 5. Typically, the HTML that is returned in the original Response message contains references to additional files. These can and often do include image files that the browser must display, style sheets for formatting, video and audio files that the browser may play, etc. In the web charting application I am using as an example here, the charts themselves are rendered on the server as .jpg image files. The HTML in the original Response message references these image files, which causes the browser to issue additional HTTP GET Requests to retrieve them during the process of page composition. Of course, the server-side application builds a .jpg file for each of the two charts that are to be rendered when it builds the original Response message. In order to display presentation-quality charts, the .jpg files that are built are rather hefty, which matters because they must be transferred over the network to the web client. The GET Requests to retrieve these charts, fully rendered on the web server in .jpg form, generate very large Response messages that then require multiple data packets to be built and transmitted.

So, composing web pages may not only require multiple GET Requests to be issued as a result of <link> tags; Response messages that are larger than the Ethernet MTU also require transmission of multiple packets. The number of networking data transmission round trips, then, is a function of both the number of GET Requests and the size of the Response messages. For the sake of completeness, note that whenever the size of a GET Request exceeds the MTU, the GET Request must be broken into multiple packets, too. The most common reason that GET Requests exceed the Ethernet MTU is that large amounts of cookie data need to be appended to the Request.
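As a back-of-the-envelope illustration of that function, the sketch below totals up round trips for a hypothetical inventory of page objects, under the deliberately pessimistic assumptions that every object is fetched serially and that each packet costs one 50 ms round trip. Real browsers fetch objects in parallel and TCP sends many packets per round trip, so treat this as an upper-bound caricature rather than a prediction:

```javascript
// Back-of-the-envelope page load estimate under the (simplifying, and
// deliberately pessimistic) assumption that every object on the page is
// fetched serially, at a cost of one round trip per packet.
const BYTES_PER_PACKET = 1460; // Ethernet MTU minus IP/TCP headers
const ROUND_TRIP_MS = 50;      // assumed network round-trip time

// Hypothetical inventory of the objects a page references, with sizes in bytes.
const pageObjects = [
  { name: "index.html", bytes: 24000 },
  { name: "site.css",   bytes: 6000 },
  { name: "chart1.jpg", bytes: 180000 },
  { name: "chart2.jpg", bytes: 175000 },
];

function estimatePageLoadMs(objects) {
  return objects.reduce((total, obj) => {
    const packets = Math.ceil(obj.bytes / BYTES_PER_PACKET);
    return total + packets * ROUND_TRIP_MS; // one GET per object, one trip per packet
  }, 0);
}

console.log(estimatePageLoadMs(pageObjects));
```

Even as a caricature, it makes the YSlow priorities legible: both trimming the object count and shrinking the big Response messages pull the total down.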

Both the number of files that are requested to render the page and the file size are factors in the YSlow performance rules. For example, the HTML returned in the original Response message generated by the ASP.NET application may reference external style sheets, which are files that contain layout and formatting instructions for the browser to use. Formatting instructions in style sheets can include what borders and margins to wrap around display elements, which fonts to use and at what sizes, what colors to display, etc. The example app does rely on several style sheets, but none of them is very extensive or very large. Still, each separate style sheet file requires a separate GET Request and Response message, and some of the style sheets referenced in the document are large enough to require multiple packets to be transmitted.

Finally, the HTML can reference scripts that need to be loaded and executed when the document is being initially displayed. Scripts that modify the DOM by adding new elements to the page or changing the format of existing elements dynamically are quite common in web applications. Usually written using JavaScript, script code can be embedded within the original HTML, bracketed by <script></script> tags. However, to facilitate sharing common scripts across multiple pages of a web application, JavaScript code to manipulate the DOM can reside in external files, too, that then must be requested and loaded separately.

That is enough about JavaScript for now, but we will soon see that dynamic manipulation of the DOM via the execution of script code running inside the web client, usually in response to user interaction, has the potential to complicate the performance analysis of a web application considerably. It is worth noting, however, that YSlow does not attempt to execute any of the scripts that make up the page. Like the other HTTP objects that are requested, YSlow only catalogs the number of JavaScript files that are requested and their size. It does not even begin to attempt to understand how long any of these scripts might take to execute.
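That cataloging step is simple enough to caricature in a few lines. The sketch below counts the external references in a made-up HTML fragment using a regular expression; real tools like YSlow walk the parsed DOM rather than pattern-match the markup:

```javascript
// Toy version of YSlow's cataloging step: count the external files an HTML
// document references, since each reference costs a GET Request/Response
// exchange. A regex sketch only; real tools walk the parsed DOM instead.
const html = `
  <link rel="stylesheet" href="site.css" />
  <script src="charts.js"></script>
  <img src="chart1.jpg" /><img src="chart2.jpg" />
`;

function countExternalReferences(markup) {
  const refs = markup.match(/\b(?:href|src)="[^"]+"/g) || [];
  return refs.length;
}

console.log(countExternalReferences(html)); // 4
```

Each of those four references is a potential GET Request, which is exactly the quantity YSlow’s first rule, make fewer HTTP requests, tries to minimize.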

The central point is that page composition and rendering inside the browser frequently requires a series of GET Request and Response messages. The original Response message to a GET Request contains links to additional data files – static images, style sheets, JavaScript files, etc. – that are embedded in the HTML document that was requested. Each additional resource file that is needed to compose the page requires an additional GET Request and a Response Message from the web server. Consequently, the simple depiction of Page Load Time shown in Figure 3 gives way to the more complicated process depicted in Figure 5. When all the resources identified by the web client to compose the page are finally available and processed, the document composition process is complete. At that point, the page load state of the page is finalized, which means it is available to the end user to interact with.

Take a minute to consider all those times you encounter a web page that is partially rendered but incomplete, with the UI blocked. The web browser has its instructions to gather all the HTTP objects the web page references, and you are not able to interact with the page until all the objects referenced in the DOM are resolved. If the Response message for any one of those objects is delayed, the page is not ready for interaction. That is what Page Load time measures.


Figure 5. Composing web pages usually requires multiple GET Request:Response message sequences due to links to external files embedded in either the original Response message or in subsequent Response messages. It is not until all external references are resolved that the Page can reach its completed or “loaded” state where all the controls available on the page are available to use.


NEXT: Exploring the YSlow scalability model.

[1] In contrast to the expert systems approach to AI, which used explicit rules to mimic the decision-making behavior of experts, machine learning algorithms do not always need to formulate explicit decision-making rules. The program “learns” through experience, making adjustments to the decision-making mechanism based on the success or failure of various trials. The decision-making mechanisms used in machine learning algorithms vary from Bayesian networks (where explicit classification rules are encoded) to neural networks to genetic algorithms. The element that is common in the machine learning approach is a feedback mechanism from the learning trials. See, for example, Peter Flach, Machine Learning for more details on the approach.


Why is my web app running slowly? — Part 1.

This series of blog posts picks up on a topic I made mention of earlier, namely scalability models, where I wrote about how implicit models of application scalability often impact the kinds of performance tests that are devised to evaluate the performance of an application. As discussed in that earlier blog post, sometimes the influence of the underlying scalability model is subtle, often because the scalability model itself is implicit. In the context of performance testing, my experience is that it can be very useful to render the application’s performance and scalability model explicitly. At the very least, making your assumptions explicit opens them to scrutiny, allowing questions to be asked about their validity, for example.

The example I used in that earlier discussion was the scalability model implicit when employing stress test tools like HP LoadRunner and Soasta CloudTest against a web-based application. Load testing by successively increasing the arrival rate of customer requests assumes there is a relationship between the response time (RT) for web requests and the rate at which those requests arrive, namely

RT = f(λ)

where λ represents the arrival rate of requests (the standard notation for the term in queuing theory). In the context of a stress test this implicit scalability model is often correct – drive the request rate high enough and you are apt to drive some hardware or software component to saturation, at which point queuing delays will start to have an impact on web response time.
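To see why saturation drives response time up, consider the simplest illustration of RT = f(λ): an M/M/1 queue, where RT = S / (1 − λS) for an average service time S. This is only a sketch of the general shape of the curve, not a model of any particular web server:

```javascript
// Illustrative M/M/1 queueing model: with average service time S (seconds)
// and arrival rate lambda (requests/second), the expected response time is
//   RT = S / (1 - lambda * S)
// valid only while utilization (lambda * S) stays below 1.
function responseTime(serviceTimeSec, arrivalRatePerSec) {
  const utilization = arrivalRatePerSec * serviceTimeSec;
  if (utilization >= 1) return Infinity; // the server is saturated
  return serviceTimeSec / (1 - utilization);
}

// With a 100 ms service time, response time grows nonlinearly with load:
console.log(responseTime(0.1, 2));  // light load: barely above 100 ms
console.log(responseTime(0.1, 9));  // 90% busy: response time is 10x the service time
console.log(responseTime(0.1, 10)); // saturated: Infinity
```

The hockey-stick shape of that curve is exactly what a stress test is probing for: the arrival rate at which queuing delay starts to dominate the measured response time.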

Of course, the rate customer requests arrive to be processed represents only one dimension of a web application’s scalability profile. Experienced performance testers understand that there are other factors that influence performance. For instance, in the crucial order processing portion of the application, the size of the order, the number of items in the customer’s shopping cart that need to be processed, etc., can also be very important scalability factors. The point I tried to make in the earlier blog entry was that all the relevant dimensions that impact the application’s scalability need to be addressed in performance testing to assess the quality of the release effectively. Another way to think about this is that the application scalability model you formulate is a hypothesis that the performance and acceptance testing process is designed to test. All of which reinforces the notion that significant benefits can be derived from making the scalability assumptions that are implicit in performance testing explicit.

In another variation on this theme, I will focus in this series of blog posts on a particular model of web application performance that has proved extremely influential. This is something I call the YSlow model of web application performance, named after the YSlow performance tool, originally developed at Yahoo, associated with the work of Steve Souders. To begin, I will strive to make the scalability model implicit in web performance tools like YSlow explicit.

I will also discuss how the YSlow scalability model influenced the development of other web application performance tooling, culminating in the W3C specification of a navigation and timing API that provides access from JavaScript to web application performance measurements. The W3C spec for the web client navigation and timing API is currently embedded in all the major web clients, including Chrome, Mozilla Firefox, and Internet Explorer. I will drill into the W3C navigation and timing APIs to demonstrate how to gather and utilize these performance measurements, or Real User Measurements (RUM), as they have become known. The navigation and timing API is a great help to anyone with a need to understand the end-to-end web application response time experience of actual, real-life web site customers. I expect the navigation and timing API to spawn a whole new generation of web application performance tools that will exploit this valuable measurement data.
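To make the measurement side concrete, here is a sketch of how Navigation Timing timestamps translate into RUM metrics. The timing object below is hard-coded with made-up millisecond values so the arithmetic is visible; in a browser the same attribute names would come from window.performance.timing:

```javascript
// Derive basic RUM metrics from Navigation Timing timestamps. In a browser
// the source would be window.performance.timing; here a hard-coded object
// with made-up values stands in, using the W3C attribute names.
const timing = {
  navigationStart:   1000,
  domainLookupStart: 1005,
  domainLookupEnd:   1030, // DNS resolution completes
  connectStart:      1030,
  connectEnd:        1080, // TCP connection established
  requestStart:      1080,
  responseStart:     1250, // first byte of the Response arrives
  responseEnd:       1400,
  loadEventEnd:      2600, // the page's "load" event completes
};

function rumMetrics(t) {
  return {
    dnsMs:             t.domainLookupEnd - t.domainLookupStart,
    tcpConnectMs:      t.connectEnd - t.connectStart,
    timeToFirstByteMs: t.responseStart - t.navigationStart,
    pageLoadMs:        t.loadEventEnd - t.navigationStart, // Page Load time
  };
}

console.log(rumMetrics(timing));
```

The last metric, loadEventEnd minus navigationStart, is the Page Load time discussed throughout this series, now measured from real customer sessions rather than estimated.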

In addition, I want to cast a critical eye on the YSlow model of web application performance and highlight some areas where the reality of web application performance can depart from expectations raised by the model. There are some areas where the YSlow model is just a little too simple for the burgeoning complexity of network-enabled applications developed for the web, the cloud, or both. Using an example of a data-rich ASP.NET application that requires extensive processing at the web server and the back-end database to generate Response messages, I will try to show what additional measurements may be required to solve performance and scalability issues that transcend the diagnostic capabilities of YSlow and similar tools.

Why is this web app running slowly?

To understand what a web application performance tool like YSlow does, it will help to be able to refer to a concrete example. Accordingly, I will discuss running YSlow against a web application that was perceived as running slowly. The application in question is also one that I care about. Figure 1 shows a screen shot of that application in its “before” stage when it exhibited serious performance and scalability problems.


Figure 1. A screen shot of an ASP.NET web application that runs slowly.

The app is a graphical reporting application devoted to visualizing performance measurements, which exist in the form of time series data. It is a web front end to a back-end SQL Server-based repository of performance data that my software company provides to its customers. For purposes of this discussion, its most salient characteristic is that it is a data-rich query application, which renders query results in high-resolution charts using Microsoft’s ASP.NET server-side technology. It relies specifically on the web forms Chart facility in the .NET Framework to generate presentation-quality charts and graphs, creating two such graphic images per query, as illustrated. Relatively large jpeg images of charts are generated on the server based on the result set of the designated query. These jpeg files are then transmitted to the web form over the network. In the application development environment where I was seeking to understand why it was running so slowly at times, the performance issues that were evident were quite convincingly not due to networking, since the web client, IIS web server, and back-end database all resided on the same (physical) Windows machine.

Other relevant features of the example application that was the subject of a performance investigation proved to include the following:

  • The queries to generate the charts are defined using a set of additional web forms to create a re-usable template for the report. These chart definition templates are also stored in the same SQL Server database where the performance data lives, allowing the queries to be re-executed in subsequent interactions and sessions.
  • Dynamic elements of the database queries that are resolved at run-time include a calendar control for date selection and menus for selecting the machine or machines of interest.

Using YSlow.

Whenever you have reports of a web application that is running slowly, looking for answers from YSlow or similar performance tools is quick and easy. Tools like YSlow provide expert advice about why a web page such as this takes so long to load. It estimates the page load time from the various HTTP objects contained in the web page document that the browser constructs during page composition. This composition is performed within the web client in response to the specific instructions that determine page layout. These instructions are encoded in html, style sheets, cookies, image files and scripts. Web browsers perform this page composition and rendering based on the contents of the DOM, the Document Object Model, which is assembled from static elements identified in html and dynamic modifications to the DOM that occur when associated script code executes. Rather than worry about all HTTP elements in the DOM, YSlow is mainly concerned with analyzing the files referenced in the HTML, since each file referenced requires the web client to issue a GET Request to a web server and await the Response message in which the file’s contents are returned.

To illustrate HTML references to external files, see the following snippet of html that I pulled from a page devoted to customer reviews of one of the products that is for sale on the Amazon commercial site:

<link rel="stylesheet" href="" />

The HTML markup that impacts the DOM is the link statement that references a .css style sheet file that the page needs. The web browser will attempt to resolve the link statement by issuing a GET Request for the URL indicated in the href (short for “hypertext reference”) attribute. The URL references an auxiliary Amazon web site named http://images-amazon.com where this particular style sheet file can be found. HTML references to external files are expensive. The web browser first has to locate the IP address of this web server using DNS. A GET Request is then issued for the object using the proper IP address, referencing either TCP Port 80 or 443, which are the TCP Port addresses associated with the HTTP and HTTPS protocols, respectively. Prior to sending the GET Request, the web browser must first establish a TCP session with that web server. And, when the GET Request is finally sent, the browser must await the Response message. Once the URL is resolved and the .css style sheet file referenced is returned in a Response message, the web browser will use the style sheet tags to format any elements in the DOM that the style sheet applies to when the page is ultimately rendered.

If it seems like a good deal of effort is involved in web page composition, that is because there is.

The key insight baked into the YSlow performance rules is that the processing time inside the web browser to apply the style sheet is probably trivial compared to the time it takes to resolve the URL over the network and retrieve the file using the HTTP protocol.

YSlow itself was based on the work of Steve Souders, who was originally at Yahoo, but currently hangs his hat at Google. Souders is the author of a popular book on web application performance called High Performance Web Sites, which explains in some detail the rationale behind the YSlow tool. YSlow inspired the PageSpeed Insights tool that is currently available for Google’s Chrome web client, highly esteemed among web developers, and the performance-oriented Developer Tools in Microsoft’s Internet Explorer. YSlow also influenced the development of other, similar tools, including a performance testing site and the Visual Round Trip Analyzer, for example, the latter developed by a team responsible for web application performance for Microsoft web properties like HotMail.

In order to use YSlow, you have to first install the YSlow extensions into your browser. (YSlow supports Chrome, Safari, Firefox, and Opera, among others.) Then, on command, YSlow re-loads your web page and interrogates the DOM. It identifies each component of the DOM that was loaded by the page, determines its size, the contents of its headers, and other characteristics that can affect page load time performance. YSlow then generates a report that analyzes the page and provides guidance for reducing the amount of the time it would take to load the page.

Note that YSlow does not actually measure the time it takes to re-load the page it is analyzing. This is mainly due to the fact that caching of the page’s content by the browser and elsewhere on the network – caching is a ubiquitous feature of web technologies – improves the time to reload the page’s content significantly. This is a crucial point that we will revisit when we look at other web performance tools that do actually try to measure web application response time from the point of view of the web client application. It is in that context that I will also review the relatively recent standardization effort backed by the W3C, the consortium that develops the standards web applications must adhere to, to incorporate performance-oriented timing data into the DOM, where it can be gathered in a consistent fashion using JavaScript code.

But, meanwhile, back to YSlow. With the information in hand that it gathered about the components of the page that need to be loaded, YSlow then applies a number of performance Rules and calculates a grade for each rule, where “A” is excellent and “E” or “F” are failing grades. YSlow’s evaluation of the web page from our case study is shown in Figure 2.


Figure 2. The report YSlow generates when the tool reloads the example web page shown in Figure 1.

We see in Figure 2 that the web page being analyzed receives a near failing grade of “E” for the first and foremost of the YSlow performance rules, which is to make fewer HTTP requests.

To understand why this performance rule is so important for web application performance, it will help to dive deeper into the HTTP protocol that is used in web page composition. At this point in the discussion, it will also be helpful to derive the scalability model for web application performance that is implicit in YSlow and similar performance tools.

I will take up those topics in more detail in the next blog posts in this series.