WebPageTest and measuring visual completeness — why is this web app running slowly, Part 8.

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I discuss one more performance tool that belongs in the YSlow family. It is called WebPageTest, and it makes a credible attempt at addressing some of the limitations of the YSlow approach I have been writing about. WebPageTest does measure Page Load time, and it offers several other useful goodies, including a revised set of performance optimization rules to consider. In short, it builds on the legacy of YSlow while also trying to address some of the limitations the original approach faces with today’s JavaScript-enhanced web applications.

One of YSlow’s legacies is that it spawned the development of web application performance tools that do measure page load time, addressing one of the important limitations of the original YSlow approach, which focused on the web page composition process performed inside the browser. Browser-based tools like Chrome’s PageSpeed measure page load time and visualize the page composition process as a waterfall diagram unfolding over time. The browser-based approach still faces the limitation that it can only measure page load time from a single vantage point on the network, which is hardly representative of actual network latencies.

Another limitation of the browser-based approach is that the traditional Page Load time measurement, which is signaled by the firing of the DOM’s window.load event, may no longer represent a clear, unambiguous boundary for the response time event. As web pages perform more and more JavaScript manipulation of the DOM during execution of the window.load event handler, this boundary can be blurred beyond recognition in extreme cases. It is another case of evolving web technologies requiring new measurement approaches.
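To make the problem concrete, here is a minimal sketch (the deferred work is purely hypothetical) of a page that reports itself as “loaded” well before the content a visitor actually cares about has been built:

```javascript
// Minimal sketch: once the load event fires, the Page Load measurement is
// "complete," but the page only becomes usable after this handler finishes
// its DOM work, none of which is counted in Page Load time.
window.addEventListener("load", function () {
  var start = performance.now();

  // Hypothetical post-load work: build a large panel of additional content.
  var panel = document.createElement("div");
  for (var i = 0; i < 500; i++) {
    var item = document.createElement("p");
    item.textContent = "Deferred content item " + i;
    panel.appendChild(item);
  }
  document.body.appendChild(panel);

  console.log("Post-onload DOM work took " +
              (performance.now() - start).toFixed(1) + " ms");
});
```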

Figure 10 is a screen shot from a WebPageTest timing test that I performed on the Amazon Home page. There is so much going on in that rich display of performance data – partly because WebPageTest provides so much information about page load time and partly because the Amazon home page itself is so complex – that a single screen shot hardly does justice to the tool. You can, however, access this example directly using the following hyperlink to the performance test results for the Amazon page: http://www.webpagetest.org/result/140912_6Y_R46/1/details/. Better yet, you can sign up to use the site, which is free of charge, and run the performance test against your own web site.

Figure 10. The waterfall view of the page composition process in the www.webpagetest.org tool, which measures the performance of the Amazon.com home page in this example.

Across the top of the Test Result page, the WebPageTest detail report proudly shows its YSlow heritage by computing a “PageSpeed” score and a set of letter grades, similar to YSlow, based on the web page conforming to a set of mostly familiar performance rules, including the usual recommendations to minify Http objects and compress Response messages, enable caching, and defer parsing of JavaScript, etc. You may also notice some distinctly new rules reflecting current trends in more complex web page composition, like prioritizing visible “above the fold” content. The “above-the-fold” terminology is borrowed from the graphic design principles of printed (and folded) newspaper layout, where the goal, back in the heyday of the printed press, was to capture readers’ eyeballs with the content visible on the front page above the fold. For a web page, the analogous goal is to prioritize the content that first becomes visible without recourse to scrolling. There is one complication, of course: while the fold was a physical artifact of the manner in which printed newspapers are displayed at a newsstand, the extent of the web page content that is first visible in a browser window is a function of both the physical display size, which is fixed, and the current size of the browser window, which is subject to change.
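As a rough illustration of the above-the-fold idea (not something PageSpeed or WebPageTest does for you), a page can mark the images it expects to sit below the fold with a hypothetical data-src attribute instead of src, and only request them once the initially visible content has finished loading:

```javascript
// Minimal sketch of prioritizing "above the fold" content: images marked with
// data-src (a hypothetical convention for this example) are only requested
// after the load event, so the initially visible content wins the bandwidth race.
// The "fold" here is simply the current viewport, which varies with the display
// and the size of the browser window.
window.addEventListener("load", function () {
  var deferred = document.querySelectorAll("img[data-src]");
  for (var i = 0; i < deferred.length; i++) {
    var img = deferred[i];
    img.src = img.getAttribute("data-src");  // start the below-the-fold downloads
    img.removeAttribute("data-src");
  }
});
```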

In addition, WebPageTest attempts to capture a set of more representative and realistic network latencies by letting you host your test at any of a range of locations scattered around the globe. After selecting one of the available locations for the performance test, WebPageTest builds a waterfall view of the page composition process that is similar to PageSpeed, as illustrated in Figure 10, where I again used the Amazon.com home page as an example. The WebPageTest view even incorporates YSlow-like performance rules and letter grades. In this example, the Amazon web page initially took about 5 seconds to load, but then continued to load content for another 6 seconds or so after the onload event fired and the window.load event handler began to execute.

In an attempt to address the second measurement issue, WebPageTest considers the web page to be visually complete after 13.7 seconds, a measurement based on comparing successive captured video frames of the page until the rendered image stops changing. The “visually complete” measurement is intended to augment page load time measurements, and it is particularly useful when, as in this example, the window.load event handler performs extensive and time-consuming DOM manipulation.
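A simplified sketch of how a visually-complete time could be derived from such a frame-by-frame comparison is shown below; the visual progress values here are hypothetical, and WebPageTest’s actual frame comparison (which I believe is based on comparing color histograms against the final frame) is more sophisticated than this:

```javascript
// Each sample records a capture time and how closely that frame matches the
// final rendering (0..1). "Visually complete" is the first frame that matches.
function visuallyCompleteTime(samples) {
  for (var i = 0; i < samples.length; i++) {
    if (samples[i].progress >= 1.0) {
      return samples[i].time;  // first frame identical to the final frame
    }
  }
  return null;  // page never stabilized during the capture window
}

// Hypothetical capture: the page only stabilizes after 13.7 seconds.
var samples = [
  { time: 3000,  progress: 0.35 },
  { time: 5000,  progress: 0.80 },
  { time: 9000,  progress: 0.95 },
  { time: 13700, progress: 1.00 }
];
console.log(visuallyCompleteTime(samples) + " ms");  // 13700 ms
```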

Another innovative aspect of the WebPageTest toolset is its UI enhancements to the waterfall diagram used to show the page composition process over time. Figure 10 illustrates how the delay associated with each GET Request is decomposed into the consecutive stages required to resolve the Request. Any DNS lookup, TCP connection hand-shaking, or SSL negotiation that was required is indicated visually. In addition, the Time to First Byte is also distinguished visually from any subsequent packets that may have been required to send the full Response message. (These are distinguished in the tool as a delay associated with “content download.”)
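You can approximate the same per-request decomposition from inside the browser using the Resource Timing API; this is not how WebPageTest gathers its data, just a handy sanity check, and the detailed phases for cross-origin resources are only exposed when the server sends a Timing-Allow-Origin header:

```javascript
// Rough client-side analogue of the per-request decomposition in Figure 10.
// Cross-origin resources report 0 for these phases unless the server sends
// a Timing-Allow-Origin header.
performance.getEntriesByType("resource").forEach(function (r) {
  console.log(r.name, {
    dns:       (r.domainLookupEnd - r.domainLookupStart).toFixed(1) + " ms",
    connect:   (r.connectEnd - r.connectStart).toFixed(1) + " ms",   // includes SSL, if any
    waiting:   (r.responseStart - r.requestStart).toFixed(1) + " ms", // time to first byte
    receiving: (r.responseEnd - r.responseStart).toFixed(1) + " ms"   // content download
  });
});
```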

Visually, WebPageTest also adds an overlay line to the waterfall diagram indicating when the window.load event fired, which gives you a way to assess how much JavaScript processing of the DOM occurs after the event that signals the page is loaded. If major DOM manipulation is deferred to the window.load event handler, the WebPageTest waterfall diagram reflects that, in effect directing you to weigh using the Visually Complete measurement instead of Page Load as the better available performance indicator.
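If you want a rough, script-level sense of the same thing without running a WebPageTest measurement, a sketch like the following, using the browser’s MutationObserver API (my own improvisation, not anything WebPageTest relies on), records how long the DOM keeps changing after onload:

```javascript
// Rough sketch: note when onload fires, then watch for DOM mutations and
// record the time of the last one. The gap approximates how much page
// construction continues after Page Load has already been signaled.
window.addEventListener("load", function () {
  var loadTime = performance.now();
  var lastMutation = loadTime;

  var observer = new MutationObserver(function () {
    lastMutation = performance.now();
  });
  observer.observe(document.documentElement, {
    childList: true, subtree: true, attributes: true
  });

  // After an arbitrary settling window, report the additional delay.
  setTimeout(function () {
    observer.disconnect();
    console.log("DOM kept changing for " +
                ((lastMutation - loadTime) / 1000).toFixed(1) +
                " seconds after onload");
  }, 15000);
});
```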

 

Next: Real User Measurements using the NavTiming API.

Analyzing HTTP network traffic: Why is this web app running slowly, Part 7.

This is a continuation of a series of blog entries on this topic. The series starts here.

Since HTTP is a wire protocol built on top of TCP/IP, the network packet sniffer technology that is widely used in network diagnostics and performance optimization is readily adapted to measuring web browser Page Load time. Network sniffers like WireShark can intercept and capture all the HTTP traffic and are typically configured to gather related network events, such as DNS look-ups. It is easy to get overwhelmed with all the information that these network diagnostic tools provide. Often software developers prefer network tools that are more focused on the HTTP protocol and the page composition process associated with assembling the DOM and rendering it in the browser. The Developer Tools that ship with the major web browsers include performance tools that measure Page Load time and help you diagnose why your page is slow to load. These tools work by analyzing the network packets sent by the web client and the packets received in reply.

Many web application developers prefer the developer tools for diagnosing performance problems in web applications that can be found in Google’s Chrome, even though many authorities think recent versions of Microsoft’s Internet Explorer are keeping pace. Now that Steve Souders is working on the team that develops the Chrome developer tools, Chrome has also added a tool called PageSpeed, which is functionally very similar to the original YSlow application. If you are using Chrome as a browser, you can navigate to the Developer Tools by clicking the “Customize and control Chrome” button on the right hand edge of the Chrome menu bar. Then select the “Tools” menu and click “Developer Tools.” PageSpeed is one of the menu options on Chrome’s Developer Tools menu.

Let’s take a look at both PageSpeed and the network-oriented performance tool from the suite of Developer Tools that come with Chrome, using the US Amazon home page at www.amazon.com as an example. We will also get to see the degree to which Amazon.com, the most popular e-commerce site in the US (according to http://www.httparchive.org/viewsite.php?u=http%3A//www.amazon.com/&pageid=16085988#waterfall), embraces the YSlow performance rules.

Figure 6 shows a view of the Chrome PageSpeed tool after I requested an analysis of the Amazon.com home page. The first observation is that PageSpeed does not espouse the simple YSlow prime directive to “Make fewer HTTP requests.” This is a huge change in philosophy since Souders originally developed YSlow, of course, reflecting some of the concerns I mentioned earlier with how well the original YSlow scalability model reflects the reality of modern web applications.

While the rule set varies a bit, reflecting Souders’ wider and deeper experience in web application performance, PageSpeed is otherwise identical in operation to YSlow. The Chrome version of the tool re-loads your page and inventories the DOM after the page has been re-built. In the example PageSpeed screenshot, I focused on one of the important PageSpeed tuning rules that has an identical counterpart in YSlow, namely “Minimize request size.” PageSpeed improves upon the usability of YSlow by clearly identifying the HTTP GET Requests that triggered the rule violation. Here PageSpeed reports the 4 requests that generated Response messages that exceeded 1500 bytes and thus required more than one network packet from the web site in response.
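If you want to reproduce a check along those lines yourself, the Resource Timing API exposes a transferSize attribute per resource. This is only a rough approximation of the PageSpeed rule (and transferSize is reported as 0 for cross-origin resources that do not send a Timing-Allow-Origin header), but it flags responses that could not fit in a single packet:

```javascript
// Rough approximation: list resources whose response (headers plus body)
// exceeded a typical 1500-byte Ethernet payload and so needed multiple packets.
var MTU = 1500;
performance.getEntriesByType("resource").forEach(function (entry) {
  if (entry.transferSize > MTU) {
    console.log(entry.name + ": " + entry.transferSize + " bytes (" +
                Math.ceil(entry.transferSize / MTU) + " packets or more)");
  }
});
```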

You may also notice that the four Requests flagged by PageSpeed as violating the “Minimize request size” rule are directed at three different web sites: one for a resource available from the main amazon.com web site, two from fls.na.amazon.com (likely an internal CDN with responsibility for serving files to customers living in North America), and an advertisement served up from doubleclick.net, a display-ad business owned by Google. It is common practice for web pages to be composed from content that needs to be assembled from multiple sources, including the advertising that is served up by third party businesses like DoubleClick. While it is fashionable to assert that “information wants to be free,” the harsh reality is that developing and maintaining the enormously complex hardware and software environments that power Internet-based web applications is extremely expensive, and advertising revenue is the fuel that sustains those operations. Advertising revenue from web page viewing (and clicking) has also made Google one of the two or three most profitable Tech industry companies in the world.


Figure 6. Using the Chrome PageSpeed tool to analyze the Amazon.com home page. PageSpeed offers recommendations similar to YSlow, but the rule about making fewer HTTP Requests has been dropped.

For a glimpse at a performance tool that does report Page Load time measurements directly, click on the “Network” tab on the Chrome Developer Tools menu bar, which is shown in Figure 7. The Network view of Page Load time contains an entry for each GET Request that was performed in order to render the full Amazon.com home page and shows the time to complete each operation. You will notice that, at the bottom of the window, Chrome shows that a total of 315 GET Requests were issued for various image files, style sheets, and JavaScript code files in order to render the Amazon Home page. In this instance, with effective use of the cache to render the Amazon Home page, the browser only took about 4.3 seconds to complete the operation. The overall Page Load time is displayed at the lower left of the window border. (When the browser cache is “cold,” loading the Amazon Home page can easily take one minute or more.)

The Timeline column at far right presents the web page composition process in time sequence, a view that has become known as a waterfall diagram. The Chrome waterfall diagram for Page Load time features a pop-up window that breaks out the time it took to load each individual component of the page. We can see that the initial GET Request to www.amazon.com returns a Response message that is about 78 KB, a payload that has to be broken into more than 50 individual packets. In the pop-up window, we see that the browser waited for 142 milliseconds before the first packet of the HTTP Response message appeared. It then took 1.55 seconds for the remaining 50 or so packets associated with that one Response message to be received. These are measurements derived from monitoring the network traffic that HTTP GET Requests and Response messages initiate.
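The same waiting-versus-receiving breakdown for the initial document can be read from script using the Navigation Timing API; this is not how Chrome’s Network panel derives its numbers, just a rough script-level equivalent:

```javascript
// "Waiting" is the time to first byte after the request was sent (about 142 ms
// in the example above); "receiving" is the time spent downloading the rest of
// the Response message (about 1.55 seconds above).
var t = performance.timing;
var waiting   = t.responseStart - t.requestStart;
var receiving = t.responseEnd   - t.responseStart;
console.log("Waiting: " + waiting + " ms, Receiving: " + receiving + " ms");
```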


Figure 7. The Chrome Developer Tools Network view waterfall diagram for the page composition process for the amazon.com home page.

The initial HTTP Response message from the Amazon Home page serves as a kind of exoskeleton that the browser gradually fills in from the Response messages generated by the remaining 314 subsequent GET Requests that are referenced. The HTML standard permits page loading to proceed in parallel, and, as noted above, browsers can generate parallel TCP sessions for loading static content concurrently. In this instance, of course, many of the HTTP objects are available from the cache because I had just loaded the same page immediately prior to running the PageSpeed tool.

About ten lines down in the example, there is a GET Request for a 23.6 KB image file that required 164 ms to complete. The Timeline column pop-up that breaks out the component load time indicates a separate DNS lookup that took 36 ms and a TCP session connection sequence that took 25 ms, which indicates an embedded Request for a URL that was directed to a different Amazon host. The browser then waited 32 ms for the initial Response message. Finally, it shows a 27.5 ms delay spent in the Receiving state, since the 23.6 KB Response message would require multiple packets. Because the browser supports parallel TCP sessions, however, this Request does not prevent the browser from initiating other Requests concurrently. Page composition that requires content stored on multiple web sites, parallel browser sessions, JavaScript blocking, and the potential for extensive DOM manipulation when the JavaScript executes are features of web applications that complicate the simple YSlow scalability model derived earlier. Incorporating these additional features into the model yields a more realistic formula:

Page Load time = Browser Render time + Script execution time + ((Round trips * RTT) / Sessions)

[equation 5]
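Plugging purely hypothetical numbers into equation 5 shows the shape of the model; none of these values are measurements from the examples in this series:

```javascript
// Equation 5 as a function, evaluated with hypothetical inputs.
function pageLoadTime(renderSec, scriptSec, roundTrips, rttSec, sessions) {
  return renderSec + scriptSec + (roundTrips * rttSec) / sessions;
}
// 0.5 s render + 1.0 s script + 80 round trips at 50 ms RTT over 6 parallel sessions
console.log(pageLoadTime(0.5, 1.0, 80, 0.05, 6).toFixed(2) + " seconds"); // "2.17 seconds"
```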

While equation 5 models the web page composition process better, it is still not definitive. We have already discussed some of the added complications, including

  • RTT is not a constant factor when GET Requests to multiple web sites are issued to compose the complete page and when the impact of the browser cache and a CDN are factored into the equation,
  • JavaScript download requests cannot proceed in parallel, and
  • JavaScript code executing inside a handler attached to the DOM’s window.load event may further modify the DOM and effectively defer the usability of the page until the script completes, but none of that script execution time is included in the Page Load Time measurements.

Before wrapping up this discussion of the limitations of the YSlow scalability model, let’s return full circle to the original web application (illustrated back in Figure 1 in the very first post in this series) that was running slowly, motivating me to explore using a tool like YSlow to figure out why. Figure 8 shows a Chrome PageSpeed waterfall diagram for the page composition process for the data-rich ASP.NET web application in the case study. (Figure 9 tells a similar story using the comparable Internet Explorer performance tool.)


Figure 8. Waterfall diagram from Chrome PageSpeed showing individual GET Requests and their delays, each of which contributes to the Page composition time. In this example for the case study app, the delay associated with resolving the initial ASP.NET GET Request accounts for about 75% of the overall page load time.

Chrome’s PageSpeed tool indicates that the DailyCharts.aspx web page took about 1.5 seconds to load, requiring a total of 20 GET Requests. These measurements were captured in a single-system test environment where the web browser, IIS web server, and the SQL Server back-end database were all running on the same machine, so network latency was minimal. Crucially, generating the Response message for the original GET Request on the web server, a message that runs to 1.5 MB, and then transferring it to the web client alone accounted for about ¾ of the overall web application’s Page Load Time. In addition, GET Requests for the two high resolution charts, which are rendered on the server as .jpg files and then referenced in the original Response message, yielded Response messages of about 275 KB and 100 KB in size.

Moreover, since Chrome PageSpeed re-executed the web page, the database queries that were executed on the web server benefitted substantially from a warm start in the SQL Server cache. When I subsequently investigated how long these database queries could take, I noted that a query like the one illustrated, which interrogates voluminous Process-level counter data stored in the repository, accessed several hundred thousand rows of data that were then subject to sorting to select and return a result set containing the five busiest processes during a 24-hour period. Without the benefit of a SQL Server cache warm start, database query execution time alone could be on the order of 15-30 seconds.

 


Figure 9. A similar view of Page Load time using Internet Explorer’s developer tools. In this instance, I also installed an ASP.NET-oriented performance tool called Glimpse on the web site in order to help diagnose the performance issues. The additional requests associated with Glimpse delay page load time by another 150 ms.

Figure 9 represents another view of the page composition process, this time using Internet Explorer’s version of the waterfall timing diagram, which again shows a 1.44 MB Response message generated by the ASP.NET app in response to the initial GET Request. Internet Explorer reports that it required 1.38 seconds to generate this Response message and transmit it to the web client. (Note that in this test environment, the web client, the IIS server, and the SQL Server back-end database all reside on the same machine, so network latency is minimal – I measured it at less than 10 microseconds.)

The initial GET Request Response message contains href links to the high resolution charts that were rendered on the web server as .jpgs. Resolving these links for a 235 KB main chart and an 84 KB secondary chart also impacts page load time, but these file requests are able to proceed in parallel, at least.

In both Figures 8 and 9, resolving the initial ASP.NET GET Request clearly dominates the page composition process. These waterfall views of the web page composition process for this web application place the YSlow recommendations for improving page load time performance, illustrated back in Figure 2, in a radically different perspective. Instead of worrying about making fewer HTTP GET Requests, I needed to focus on why the server-side processing to generate a Response message was taking so long. In addition, I also wanted to understand why the Response message associated with the GET Request was so large, requiring almost 1.5 MB of content to be transferred from the web server to the web client, and what steps could be taken to trim it down in size. Unfortunately, web application performance tools like YSlow are basically silent on the subject of the scalability of any of your server-side components. These need to be investigated using performance tools that run on the web server.

Ultimately, for the case study, I instrumented the ASP.NET application using Scenario.Begin() and Scenario.End() method calls, which allowed me to measure how much time was being spent calling the back-end database to resolve the GET Request query. I wound up re-writing the SQL to generate the same result set in a fraction of the time. Since the database access logic was isolated in a Business Objects layer, that was a relatively straightforward fix that I was able to slip into the next maintenance release. But that quick fix still left me wondering why the initial Response messages were so large, which I investigated by using the ASP.NET page trace diagnostics facility to examine the amount of ViewState data being passed to build the various Machine and Chart selection menus. One of the menu items referenced “All Machines” for every Windows machine defined in the repository, which was a red flag right there. Addressing those aspects of the server-side application required a significant re-engineering of the application, however, which was work that I completed last year.

To conclude my discussion of the case study app, I found that the YSlow-oriented performance tools did highlight the need for me to understand why the server-side processing associated with generating the initial Response message was taking so long and also spurred me to investigate why such large Response messages were being generated. The specific grades received from applying the YSlow performance rules to the DOM were not particularly helpful, however. To resolve the performance issues that I found required using traditional server-side application performance tools, including the SQL Explain facility for understanding the database queries and the ASP.NET diagnostic trace that showed me the bloated contents of the ViewState data that is transmitted with the page to persist HTML controls between post back requests. (A post back request is a subsequent GET Request to the web server for the same dynamic HTML web page.) It turned out that much of the ViewState data embedded in the initial Response message generated by the ASP.NET app was supporting the page’s menu controls.

In practice, any web application that generates dynamic HTML that requires ample amounts of server-side resources to build those Response messages faces scalability issues with its server-side components – the database back-end, the business logic layer, or the web server front end. It should be evident from the discussion so far that performance tools like YSlow that execute on the web client and focus on the page composition process associated with the DOM are silent on any scalability concerns that may arise on the web server.


Measuring Web Page Load time: why is this web app running slowly, Part 6.

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I will discuss three approaches to measuring actual web page load times, something which is quite important for a variety of reasons, some of which I have already discussed. Measurements of web page load time capture service level measurements from the standpoint of the customer. Service level measurements also enable performance analysts to use decomposition techniques, breaking down page load time into its components: browser render time, network transmission delays, domain name (DNS) lookup, TCP session connection, etc.

The first approach to measuring web page load times was built on top of network packet capture technology, which was already capable of capturing the network packets associated with HTTP GET and POST Requests and their associated Response Messages. Packet tracing is associated with network sniffers like WireShark and Netmon that play a huge role in the data center in diagnosing network connectivity and performance problems. By adding the ability to understand and interpret requests made using the application-level HTTP protocol, packet-tracing tools like WireShark could be extended to report Http application-oriented response time measurements.

One obvious limitation of this approach is that the network capture needs to occur somewhere inside the data center, and that data-center orientation is limiting whenever web applications reference content that lives outside the data center, including content that is cached in a CDN or references a third-party ad server. A second limitation is that network packet tracing captures a huge volume of trace data, which then must be filtered extensively down to just the relevant HTTP traffic. A third limitation is the ability of the browser to open and transmit requests on multiple TCP sessions, a capability that makes it more difficult to stitch together all the HTTP Requests associated with a single page load into a coherent view of the web application. These limitations are not serious ones when measurement tools based on network packet tracing are used in development and testing, but they are serious constraints when you need to use them to monitor a large-scale production environment.

A second approach was pioneered by vendors who saw an opportunity to address the limitations of network packet tracing by measuring web application response times at their source, namely, from the point of view of the web client. Measurements that are taken at the web client are known as end-to-end (often abbreviated as ETE) response time measurements. A simple, low-tech way to go about measuring web application response times from the vantage point of the web client is to gather them manually, using a stopwatch, for example, to mark the Request begin and end times. Now if you can imagine a hardware and software solution that can automate the measurement process, you have the necessary ingredients for building an end-to-end measurement tool. Such a solution would simulate customer activity by generating synthetic requests issuing from the vendor’s data centers to your web site and measuring the end-to-end response times that resulted – a form of automated performance testing. In the process, these performance tools can also assess web site availability, which would include notification in the event of an outage.
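A bare-bones sketch of such a synthetic probe might look like the following; the URL and the 5-second service level threshold are hypothetical, and I am assuming a JavaScript runtime (such as a recent version of Node.js) that provides the global fetch API. A real synthetic monitoring service drives an actual browser from many locations on a schedule, which this sketch does not attempt:

```javascript
// Minimal synthetic probe: time one request, then check availability and a
// (hypothetical) responsiveness threshold. Only the HTML document is fetched;
// full page composition is not measured here.
async function probe(url) {
  const start = Date.now();
  try {
    const response = await fetch(url);
    await response.text();                     // drain the response body
    const elapsed = Date.now() - start;
    const ok = response.ok && elapsed < 5000;  // availability + service level check
    console.log(url, response.status, elapsed + " ms", ok ? "OK" : "ALERT");
  } catch (err) {
    console.log(url, "unreachable:", err.message);  // an outage notification would go here
  }
}

probe("https://www.example.com/");
```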

The vendors who built the first end-to-end monitoring solutions moved quickly to extend their monitoring operations to use multiple sites, distributing these monitoring locations around the globe in order to incorporate more representative network latencies in the end-to-end measurements they could gather. Once issuing web application requests from multiple locations is factored into the equation, measuring the end-to-end response time of these synthetic requests gains the advantage that it incorporates network latencies that are more representative of actual customer experiences, compared to performance tests performed from inside the data center. The vendors’ synthetic testing package typically offers service level reporting and exception reports detailing requests that did not meet their service level goals for both availability and responsiveness.

An obvious concern when you are relying on this approach is that the synthetic workload must be representative of the actual workload in order for the measurements that are captured to be useful, the exact same issue that performance engineers who design and run automated web site stress tests also struggle to address. There is also the related problem of what experienced QA professionals call test coverage, where the range of synthetic requests issued does not encompass enough of the surface area of the application. When too many important “edge cases” remain un-instrumented, the data center is kept in the dark, and the developers remain just as much in the dark as ever about which specific scenarios lead to long-running requests.

The third and most recent approach gathers measurement data on Page Load time from inside the web browser. This approach is known as Real User Measurements, or RUM, to distinguish it from the synthetic request approach. With RUM, you are assured of complete coverage since all customer requests can be measured, keeping in mind that “all” can be a very large number. The RUM approach also faces some substantial technical hurdles. One serious technical issue is how to get measurement data from the web browser session on a customer’s remote computer or mobile device somewhere in the world back to the data center for analysis. In the Google Analytics approach to RUM, the measurements taken by the browser are sent to a ginormous Google data center using web beacons where the measurements are analyzed and reported.
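Here is a minimal sketch of the web beacon idea; the “/rum-beacon” collection endpoint is hypothetical (it is not Google Analytics’ actual beacon), and I am using the browser’s navigator.sendBeacon API, which queues the measurement for delivery without delaying the page:

```javascript
// After the page loads, package a few timings and post them to a collection
// endpoint for back-end analysis. The endpoint URL here is hypothetical.
window.addEventListener("load", function () {
  setTimeout(function () {   // wait for loadEventEnd to be recorded
    var t = performance.timing;
    var payload = JSON.stringify({
      page:     location.pathname,
      pageLoad: t.loadEventEnd - t.navigationStart,   // milliseconds
      ttfb:     t.responseStart - t.navigationStart
    });
    navigator.sendBeacon("/rum-beacon", payload);
  }, 0);
});
```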

Another obstacle in the RUM approach is the volume of measurement data that can result. While the amount of RUM measurement data you need to collect is far less than the volume of network packet trace records that must be sifted through to understand HTTP application response times, it is still potentially quite a large amount of measurement data, given an active web site. Sampling is one alternative for high volume web sites to consider. By default, Google Analytics samples the response time measurements, which helps considerably with the volume of network traffic and back-end processing. And, since Google provides the resources at its data center to process all the web beacon measurements, it is Google shouldering that burden, not your data center, which assumes your organization approves of Google having access to all this measurement data about your web site to begin with. Naturally, third party vendors like New Relic have moved into this space, where they gather and analyze this measurement data for you and report back directly to you, guaranteeing that this web site tracking data will never reach unfriendly eyes.

A final aspect of the RUM approach that I want to focus some attention on is the Navigation/Timing standard adopted by the World Wide Web Consortium (W3C), the standards body responsible for HTML and the other core web standards. The Navigation/Timing specification that the major web browsers have all adopted provides a standard method for gathering RUM measurements, independent of which web browser or device your web site visitor is using. Prior to the Navigation/Timing API, gathering RUM measurements was a little complicated because of differences among the individual web browsers. Adoption of the Navigation/Timing API by the major web browsers eliminated most of the complications involved in gathering RUM data from your customers’ sessions with your web site.
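As a minimal sketch of what the standard makes available, the (Level 1) Navigation Timing attributes can be read after the page has loaded and combined into the kind of decomposition discussed earlier; the particular grouping of attributes below is my own, not something the specification prescribes:

```javascript
// Read the Navigation Timing attributes once the load event has completed.
window.addEventListener("load", function () {
  setTimeout(function () {   // let loadEventEnd be recorded first
    var t = performance.timing;
    console.log({
      dnsLookup:       t.domainLookupEnd - t.domainLookupStart,
      tcpConnect:      t.connectEnd - t.connectStart,
      timeToFirstByte: t.responseStart - t.navigationStart,
      domProcessing:   t.domComplete - t.domLoading,
      pageLoad:        t.loadEventEnd - t.navigationStart   // all values in milliseconds
    });
  }, 0);
});
```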

In the next post, I will drill deeper into some of the performance tools that take the first approach, namely, measuring web application performance by analyzing HTTP network traffic.
