Why is my web app running slowly? — Part 1.

This series of blog posts picks up on a topic I wrote about earlier, namely scalability models, where I discussed how implicit models of application scalability often shape the kinds of performance tests that are devised to evaluate an application. As discussed in that earlier blog post, the influence of the underlying scalability model is sometimes subtle, often because the scalability model itself is implicit. In the context of performance testing, my experience is that it can be very useful to make the application’s performance and scalability model explicit. At the very least, making your assumptions explicit opens them to scrutiny, allowing questions to be asked about their validity.

The example I used in that earlier discussion was the scalability model implicit when employing stress test tools like HP LoadRunner and Soasta CloudTest against a web-based application. Load testing by successively increasing the arrival rate of customer requests assumes there is a relationship between the response time (RT) for web requests and the rate at which those requests arrive, namely

RT = f(λ)

where λ represents the arrival rate of requests (the standard notation for the term in queuing theory). In the context of a stress test this implicit scalability model is often correct – drive the request rate high enough and you are apt to drive some hardware or software component to saturation, at which point queuing delays will start to have an impact on web response time.
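To make RT = f(λ) concrete, here is a minimal sketch. It assumes the simplest textbook queuing model, a single-server M/M/1 queue, which is my choice for illustration and not something the stress test tools prescribe. The 50 ms service time is a hypothetical value.

```python
def response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: RT = S / (1 - lambda * S).

    arrival_rate is lambda in requests per second; service_time is S in seconds.
    Returns infinity once the server is saturated (utilization >= 100%).
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")
    return service_time / (1.0 - utilization)


SERVICE_TIME = 0.05  # hypothetical: 50 ms of service per request

for rate in (2, 10, 16, 19):
    rt = response_time(rate, SERVICE_TIME)
    print(f"lambda = {rate:2d}/sec  utilization = {rate * SERVICE_TIME:.0%}  "
          f"RT = {rt * 1000:.0f} ms")
```

In this sketch response time stays close to the service time at low request rates and climbs steeply as utilization approaches 100%, which is the behavior a stress test is designed to provoke.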

Of course, the rate at which customer requests arrive to be processed represents only one dimension of a web application’s scalability profile. Experienced performance testers understand that there are other factors that influence performance. For instance, in the crucial order processing portion of the application, the size of the order, the number of items in the customer’s shopping cart that need to be processed, etc., can also be very important scalability factors. The point I tried to make in the earlier blog entry was that all the relevant dimensions that impact the application’s scalability need to be addressed in performance testing to assess the quality of the release effectively. Another way to think about this is that the application scalability model you formulate is a hypothesis that the performance and acceptance testing process is designed to test. All of which reinforces the notion that significant benefits can be derived from making the scalability assumptions that are implicit in performance testing explicit.

In another variation on this theme, I will focus in this series of blog posts on a particular model of web application performance that has proved extremely influential. This is something I call the YSlow model of web application performance, named after the YSlow performance tool, originally developed at Yahoo, associated with the work of Steve Souders. To begin, I will strive to make the scalability model implicit in web performance tools like YSlow explicit.

I will also discuss how the YSlow scalability model influenced the development of other web application performance tooling, culminating in the W3C specification of a navigation and timing API that provides access from JavaScript to web application performance measurements. The W3C spec for the web client navigation and timing API is currently implemented in all the major web clients, including Chrome, Mozilla Firefox and Internet Explorer. I will drill into the W3C navigation and timing APIs to demonstrate how to gather and utilize these performance measurements, or Real User Measurements (RUM), as they have become known. The navigation and timing API is a great help to anyone with a need to understand the end-to-end web application response time experience of actual, real-life web site customers. I expect the navigation and timing API to spawn a whole new generation of web application performance tools that will exploit this valuable measurement data.

In addition, I want to cast a critical eye on the YSlow model of web application performance and highlight some areas where the reality of web application performance can depart from expectations raised by the model. There are some areas where the YSlow model is just a little too simple for the burgeoning complexity of network-enabled applications developed for the web, the cloud, or both. Using an example of a data-rich ASP.NET application that requires extensive processing at the web server and the back-end database to generate Response messages, I will try to show what additional measurements may be required to solve performance and scalability issues that transcend the diagnostic capabilities of YSlow and similar tools.

Why is this web app running slowly?

To understand what a web application performance tool like YSlow does, it will help to be able to refer to a concrete example. Accordingly, I will discuss running YSlow against a web application that was perceived as running slowly. The application in question is also one that I care about. Figure 1 shows a screen shot of that application in its “before” stage when it exhibited serious performance and scalability problems.


Figure 1. A screen shot of an ASP.NET web application that runs slowly.

The app is a graphical reporting application devoted to visualizing performance measurements which exist in the form of time series data. It is a web front end to a back-end SQL Server-based repository of performance data that my software company provides to its customers. For purposes of this discussion, its most salient characteristic is that it is a data-rich query application, which then renders the results in high resolution charts using Microsoft’s ASP.NET server-side technology. It relies specifically on the web forms Chart facility in the .NET Framework to generate presentation-quality charts and graphs, creating two such graphic images per query, as illustrated. Relatively large jpeg images of charts are generated on the server based on the result set of the designated query. These jpeg files are then transmitted to the web form over the network. In the application development environment where I was seeking to understand why it was running so slowly at times, the performance issues that were evident were clearly not due to networking performance, since the web client, IIS web server, and back-end database all resided on the same (physical) Windows machine.

Other relevant features of the example application that was the subject of the performance investigation include the following:

  • The queries to generate the charts are defined using a set of additional web forms to create a re-usable template for the report. These chart definition templates are also stored in the same SQL Server database where the performance data lives, allowing the queries to be re-executed in subsequent interactions and sessions.
  • Dynamic elements of the database queries that are resolved at run-time include a calendar control for date selection and menus for selecting the machine or machines of interest.

Using YSlow.

Whenever you have reports of a web application that is running slowly, looking for answers from YSlow or similar performance tools is quick and easy. Tools like YSlow provide expert advice about why a web page such as this takes so long to load. YSlow estimates the page load time from the various HTTP objects contained in the web page document that the browser constructs during page composition. This composition is performed within the web client in response to the specific instructions that determine page layout. These instructions are encoded in html, style sheets, cookies, image files and scripts. Web browsers perform this page composition and rendering based on the contents of the DOM, the Document Object Model, which is assembled from static elements identified in html and dynamic modifications to the DOM that occur when associated script code executes. Rather than worry about all HTTP elements in the DOM, YSlow is mainly concerned with analyzing the files referenced in the HTML, since each file referenced requires the web client to issue a GET Request to a web server and await the Response message in which the file’s contents are returned.

To illustrate HTML references to external files, see the following snippet of html that I pulled from an Amazon.com page devoted to customer reviews of one of the products that is for sale on the Amazon commercial site:

<link rel="stylesheet" href="http://z-ecx.images-amazon.com/images/G/01/AUIClients/NavAuiAssets-7a37ff9cf24ac9bd811dc5cbdec16d08155faabc.us.min._V2_.css" />

The HTML markup that impacts the DOM is the link element that references a .css style sheet file that the page needs. The web browser will attempt to resolve the link by issuing a GET Request for the URL indicated in the href (short for “hypertext reference”) attribute. The URL references an auxiliary Amazon web site named http://images-amazon.com where this particular style sheet file can be found. HTML references to external files are expensive. The web browser first has to locate the IP address of this web server using DNS. A GET Request is then issued for the object using the proper IP address and either TCP Port 80 or 443, which are the TCP Port addresses associated with the HTTP and HTTPS protocols, respectively. Prior to sending the GET Request, the web browser must first establish a TCP session with that web server. And, when the GET Request is finally sent, the browser must await the Response message. Once the URL is resolved and the .css style sheet file referenced is returned in a Response message, the web browser will use the style sheet rules to format any elements in the DOM that the style sheet applies to when the page is ultimately rendered.
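To get a rough feel for how many of these external references a page contains, the following sketch counts them in a piece of HTML. It is only an illustration of the idea, not how YSlow itself works: it looks at just three tags (link, script and img), treats every link element as a file to fetch, and the sample markup and example.com URLs are made up.

```python
from html.parser import HTMLParser

# Tags and the attribute that names an external file (an illustrative subset).
EXTERNAL_REF_ATTRS = {"link": "href", "script": "src", "img": "src"}


class ExternalRefCollector(HTMLParser):
    """Collects URLs that the browser would fetch with additional GET Requests."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = EXTERNAL_REF_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a href=...> is a hyperlink, not a file to load
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def external_references(html_text):
    """Return the list of external file URLs referenced in html_text."""
    parser = ExternalRefCollector()
    parser.feed(html_text)
    return parser.urls


# Hypothetical page markup, used only to exercise the function.
SAMPLE = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css" />
<script src="http://static.example.com/app.js"></script>
</head><body>
<img src="/images/chart1.jpg"><img src="/images/chart2.jpg">
<a href="/about">About</a>
</body></html>
"""

refs = external_references(SAMPLE)
print(len(refs), "additional GET Requests:")
for url in refs:
    print("  ", url)
```

Each URL in the resulting list implies the DNS lookup, TCP session and GET Request sequence described above, unless the browser can satisfy the reference from its cache or reuse an existing connection.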

If that seems like a good deal of effort to compose a web page, that is because it is.

The key insight baked into the YSlow performance rules is that the processing time inside the web browser to apply the style sheet is probably trivial compared to the time it takes to resolve the URL over the network and retrieve the file using the HTTP protocol.

YSlow itself was based on the work of Steve Souders, who was originally at Yahoo, but currently hangs his hat at Google. Souders is the author of a popular book on web application performance called High Performance Web Sites, which explains in some detail the rationale behind the YSlow tool. YSlow inspired the PageSpeed Insights tool that is currently available for Google’s Chrome web client, highly esteemed among web developers, and the performance-oriented Developer Tools in Microsoft’s Internet Explorer. YSlow also influenced the development of other, similar tools, including the WebPagetest.com performance testing site and Visual Round Trip Analyzer, for example, that was developed by a team responsible for web application performance for Microsoft web properties like HotMail.

In order to use YSlow, you have to first install the YSlow extensions into your browser. (YSlow supports Chrome, Safari, Firefox, and Opera, among others.) Then, on command, YSlow re-loads your web page and interrogates the DOM. It identifies each component of the DOM that was loaded by the page, determines its size, the contents of its headers, and other characteristics that can affect page load time performance. YSlow then generates a report that analyzes the page and provides guidance for reducing the amount of the time it would take to load the page.

Note that YSlow does not actually measure the time it takes to re-load the page it is analyzing. This is mainly due to the fact that caching of the page’s content by the browser and elsewhere on the network – caching is a ubiquitous feature of web technologies – improves the time to reload the page’s content significantly. This is a crucial point that we will revisit when we look at other web performance tools that do actually try to measure web application response time from the point of view of the web client application. It is in that context that I will also review the relatively recent standardization effort backed by the W3C, the consortium that develops the standards web applications must adhere to, to incorporate performance-oriented timing data into the DOM where it can be gathered in a consistent fashion using JavaScript code.

But, meanwhile, back to YSlow. With the information in hand that it gathered about the components of the page that need to be loaded, YSlow then applies a number of performance Rules and calculates a grade for each rule, where “A” is excellent and “E” or “F” are failing grades. YSlow’s evaluation of the web page from our case study is shown in Figure 2.


Figure 2. The report YSlow generates when the tool reloads the example web page shown in Figure 1.

We see in Figure 2 that the web page being analyzed receives a near failing grade of “E” for the first and foremost of the YSlow performance rules, which is to make fewer HTTP requests.

To understand why this performance rule is so important for web application performance, it will help to dive deeper into the HTTP protocol that is used in web page composition. At this point in the discussion, it will also be helpful to derive the scalability model for web application performance that is implicit in YSlow and similar performance tools.

I will take up those topics in more detail in the next blog posts in this series.


Page Load Time and the YSlow scalability model of web application performance

This is the first of a new series of blog posts where I intend to drill into an example of a scalability model that has been particularly influential. (I discussed the role of scalability models in performance engineering in an earlier blog post.) This is the model of web application performance encapsulated in the YSlow tool, originally developed at Yahoo by Steve Souders. The YSlow approach focuses on the time it takes for a web page to load and become ready to respond to user input, a measurement known as Page Load Time. Based on measurement data, the YSlow program makes specific recommendations to help minimize Page Load Time for a given HTML web page.

The conceptual model of web application performance underpinning the YSlow tool has influenced the way developers think about web application performance. YSlow and similar tools influenced directly by YSlow that attempt to measure Page Load Time are among the ones used most frequently in web application performance and tuning. Souders is also the author of an influential book on the subject called “High Performance Web Sites,” published by O’Reilly in 2007. Souders’ book is frequently the first place web application developers turn for guidance when they face a web application that is not performing well.

Page Load Time

Conceptually, Page Load Time is the delay between the time that an HTTP GET Request for a new web page is issued and the time that the browser was able to complete the task of rendering the page requested. It includes the network delay involved in sending the request, the processing time at the web server to generate an appropriate Response message, and the network transmission delays associated with sending the Response message. Meanwhile, the web browser client requires some amount of additional processing time to compose the document requested in the GET Request from the Response message, ultimately rendering it correctly in a browser window, as depicted in Figure 1.


Figure 1. Page Load Time is the delay between the time that an HTTP GET Request for a new web page is issued and the time that the browser was able to complete the task of rendering the page requested from the Response message received from the designated web server.


Note that, as used above, document is intended as a technical term that refers to the Document Object Model, or DOM, that specifies the proper form of an HTTP Response message such that the web browser client understands how to render the web page requested correctly. Collectively, IP, TCP and HTTP are the primary Internet networking protocols. Initially, the Internet protocols were designed to handle simple GET Requests for static documents. The HTML standard defined a page composition process where the web browser assembles all the elements needed to build the document for display on the screen, often requiring additional GET Requests to retrieve elements referenced in the original Response message. For instance, the initial Response message often contains references to additional resources – image files and style sheets, for example – that the web client must then request. And, as these Response messages are received, the browser integrates these added elements into the document being composed for display. Some of these additional Response messages may contain embedded requests for additional resources. The composition process proceeds recursively until all the elements referenced are assembled.

Page Load Time measures the time from the original GET Request through all the subsequent GET Requests required to compose the full page. It also encompasses the client-side processing of both the mark-up language and the style sheets by the browser’s layout engine to format the display in its final form, as illustrated in Figure 2.


Figure 2. Multiple GET Requests are frequently required to assemble all the elements of a page that are referenced in Response messages.

This conceptual model of web application response time, as depicted in Figure 2, suggests the need to minimize both the number and duration of web server requests, each of which requires messages to traverse the network between the web client and the web server. Minimizing this back and forth across the network will minimize Page Load Time:

Page Load Time ≈ RoundTrips * Round Trip Time

The original YSlow tool does not attempt to measure Page Load Time directly. Instead, it attempts to assess the network impact of constructing a given web page by examining each element of the DOM after the page has been assembled and fully constructed. The YSlow tuning recommendations are based on a static analysis of the number and sizes of the objects that it found in the fully rendered page since the number of network round trips required can faithfully be estimated based on the size of the objects that are transmitted. For each Response object, then:

RoundTrips = (httpObjectSize / packet size) + 1

which is then summed over all the elements of the DOM that required GET Requests (including the original Response message).
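The arithmetic can be sketched in a few lines. This is an illustration of the formulas as stated above, not YSlow’s actual code. The packet size of 1460 bytes (a common TCP payload size on Ethernet), the object sizes and the 50 ms round trip time are all assumed values, and I read the division as integer division.

```python
PACKET_SIZE = 1460  # assumption: payload bytes carried per TCP segment


def round_trips(http_object_size, packet_size=PACKET_SIZE):
    """RoundTrips = (httpObjectSize / packet size) + 1, using integer division."""
    return http_object_size // packet_size + 1


def estimate_page_load_time(object_sizes, round_trip_time):
    """Sum the round trips over every object, then apply
    Page Load Time ~ RoundTrips * Round Trip Time.

    object_sizes are in bytes; round_trip_time is in seconds.
    Returns (total round trips, estimated seconds).
    """
    total = sum(round_trips(size) for size in object_sizes)
    return total, total * round_trip_time


# Hypothetical page: the HTML document, one style sheet and two chart images.
sizes = [20_000, 8_000, 150_000, 150_000]
trips, seconds = estimate_page_load_time(sizes, round_trip_time=0.05)
print(f"{trips} round trips, estimated Page Load Time {seconds:.1f} seconds")
```

Even in this toy example the two large images account for most of the round trips, which is why the YSlow rules pay so much attention to the number and size of the objects a page references.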

Later, after Souders left Yahoo for Google, he was responsible for building a tool similar to YSlow that was incorporated into the Chrome developer tools that ship with Google’s web browser. The Google tool that corresponds to YSlow is called PageSpeed. (Meanwhile, the original YSlow tool continues to be available as a Chrome or IE plug-in.)

In the terminology made famous by Thomas S. Kuhn in his book “The Structure of Scientific Revolutions,” the YSlow model of web application performance generated a paradigm shift in the emerging field of web application performance, a paradigm that continues to hold sway today. The suggestion that developers focus on page load time gave prominence to a measurement of service time as perceived by the customer. Not only is page load time a measure of service time, there is ample evidence that it is highly correlated with customer satisfaction. (See, for example, Joshua Bixby’s 2012 slide presentation.)

While the YSlow paradigm is genuinely useful, it can also be abused. For instance, the YSlow rules and recommendations for improving the page load time of your application often need to be tempered by experience and judgment. (This is hardly unusual in rule-based approaches to computer performance, something I discussed in an earlier series of blog posts.) Suggestions to minify all your Javascript files or consolidate multiple scripts into one script file are often contraindicated by other important software engineering considerations. For example, it may be important to factor Javascript code into separate files when the scripts originate in different development teams and have very different revision and release cycles. Moreover, remember that YSlow does not measure page load time directly. Instead, it reasons about page load time based on the elements of the page that it discovers in the DOM and its knowledge of how the DOM is assembled.

Subsequently, other tools, including free tools like VRTA and Fiddler and commercial application performance monitoring tools like Opnet and DynaTrace, tried a more direct approach to measuring Page Load Time. Many of these tools analyze the network traffic generated by HTTP requests. These network capture tools attempt to estimate Page Load Time as the interval from the time the first HTTP GET Request generated a network packet that was transmitted by the client to the time the last packet was sent by the web server in the last Response message associated with the initial GET. Network-oriented tools like Fiddler are easy for web developers to use and Fiddler, in particular, has many additional facilities, including ones that help in debugging web applications.
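The estimate these network-oriented tools make can be sketched as follows. The record layout here, a list of (timestamp, direction) pairs, is a hypothetical simplification for illustration and is not the capture format of Fiddler or any other tool.

```python
def page_load_time_from_capture(packets):
    """Estimate Page Load Time from captured packets.

    packets is a list of (timestamp_seconds, direction) tuples, where
    direction is "request" (client to server) or "response" (server to client).
    The estimate runs from the first request packet to the last response packet.
    """
    first_request = min(t for t, d in packets if d == "request")
    last_response = max(t for t, d in packets if d == "response")
    return last_response - first_request


# Hypothetical capture: the page request, then two follow-on requests.
capture = [
    (10.00, "request"),   # GET for the page itself
    (10.40, "response"),
    (10.45, "request"),   # GET for a style sheet
    (10.47, "request"),   # GET for an image
    (10.60, "response"),
    (11.25, "response"),  # last packet of the last Response message
]

elapsed = page_load_time_from_capture(capture)
print(f"Estimated Page Load Time: {elapsed:.2f} seconds")
```

Note what this calculation leaves out: anything the browser does after the last packet arrives, such as layout and script execution, is invisible to a tool that only sees the network traffic.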

Over time, the Internet protocols began developing the capabilities associated with serving up content generated on the fly by applications. This entailed supporting the generation of dynamic HTML, where Response messages are constructed on demand by web applications, customized based on factors such as the identity of the person who issued the Request, where in the world the requestor was located when the Request was initiated, etc. With dynamic HTML requests, the still relatively simple process illustrated in Figure 2 potentially can grow considerably more complex. One of these developments included the use of Javascript code running inside the web browser to manipulate the DOM directly on the client, without the need to ever contact the web server, once the script itself was downloaded. Note that web application performance tools that rely solely on the analysis of the network traffic associated with HTTP requests cannot measure the amount of time spent inside the web browser executing Javascript code that constructs a web page dynamically.

However, developer-oriented timeline tools running inside the web browser client can gain access to the complete sequence of events associated with composing and rendering the DOM, including Javascript execution time. The developer-oriented performance tools in Chrome, which include the PageSpeed tool, then influenced the design and development of similar web developer-oriented performance tools that Microsoft started to build into the Internet Explorer web browser. Running inside the web browser, the Chrome and IE performance tools have direct access to all the timing data associated with Page Load Time, including Javascript execution time.

A recent and very promising development in this area is a W3C spec that standardizes the web client timeline events associated with page load time and also specifies an interface for accessing the performance timeline data from Javascript. Chrome, Internet Explorer, and WebKit have all adopted this standard, paving the way for tool developers for the first time to gain access to complete and accurate page load time measurements in a consistent fashion across multiple web clients.
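As a preview of what this timing data makes possible, here is a minimal sketch of the server side of a measurement collector. It assumes that a script in the page has serialized the browser’s timing object to JSON and sent it to the server; that collection step, the function name and the sample values are hypothetical. The attribute names (navigationStart, domainLookupStart, and so on) are the millisecond timestamps defined in the W3C Navigation Timing spec.

```python
import json


def summarize_navigation_timing(timing):
    """Break a Navigation Timing record (millisecond timestamps) into intervals."""
    return {
        "dns_ms": timing["domainLookupEnd"] - timing["domainLookupStart"],
        "tcp_connect_ms": timing["connectEnd"] - timing["connectStart"],
        "time_to_first_byte_ms": timing["responseStart"] - timing["requestStart"],
        "response_transfer_ms": timing["responseEnd"] - timing["responseStart"],
        "page_load_time_ms": timing["loadEventEnd"] - timing["navigationStart"],
    }


# Hypothetical record, as it might arrive from a browser in JSON form.
SAMPLE_JSON = """
{
  "navigationStart":   1000000000000,
  "domainLookupStart": 1000000000005,
  "domainLookupEnd":   1000000000025,
  "connectStart":      1000000000025,
  "connectEnd":        1000000000060,
  "requestStart":      1000000000061,
  "responseStart":     1000000000461,
  "responseEnd":       1000000000520,
  "loadEventEnd":      1000000001400
}
"""

summary = summarize_navigation_timing(json.loads(SAMPLE_JSON))
for name, value in summary.items():
    print(f"{name:>24}: {value} ms")
```

Unlike the network-based estimate, the page load time figure here runs to the end of the browser’s load event, so it includes the client-side processing that network capture tools cannot see.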

I will continue this discussion of YSlow and its progeny in a future blog post.