This is a continuation of a series of blog entries on this topic. The series starts here.
In this blog entry, I start to dive a little deeper into the model of web page response time that is implicit in the YSlow approach to web application performance.
The simple picture in Figure 3 (left) leaves out many of the important details of the web protocols, including the manner in which the web server that can respond to the GET Request is located using DNS, the Internet Protocol’s (IP) Domain Naming Service. But it will suffice to frame a definition of web application response time, which is measured from (1) the time the GET Request was issued by the browser, includes the time it takes to locate the web server, (2) the time it takes the web server to create the Response message in reply, the network transmission time to send these messages back and forth, and, finally, (3) for the web browser to render the Response message appropriately on the display. The response time for this Request is measured from the time of the initial GET Request to the time the browser’s display of the Response is complete such that the customer can then interact with any of the controls (buttons, menus, hyperlinks, etc.) that are rendered on the page. This response time measurement is also known as Page Load time. The YSlow tool incorporates a set of rule-based calculations that are intended to assist the web developer in reducing page load time.
Elsewhere on this blog, I have expounded at some length on the limitations of the ask-the-expert, rule-based approach, but there is no doubting its appeal. It makes perfect sense that someone who encounters a performance problem and is a relative newcomer to web application development would seek advice from someone with much more experience. However, when this expertise is encapsulated in static rules, and these rules are applied mechanically, there is often too much opportunity for nuance in the expert’s application of the rule to be missed. The point I labored to make in that earlier series of blog posts was that, too often, the mechanical application of an expert-based rule does not capture some aspect of its context that is crucial to its application. This is why in the field of Artificial Intelligence the mechanical application of rules in so-called expert systems gave way to the current machine learning approach that trains the decision-making apparatus based on actual examples. On the other hand, understanding how and why an expert formulated a particularly handy performance rule is often quite helpful. That is how human experts train other humans to become expert practitioners themselves.
In essence, as Figure 3 illustrates, HTTP is a means to locate and send files around the Internet. These files contain static content, structured using the HTML markup language so that they contain the instructions the web client needs to compose and render them. However, many web applications generate Response messages dynamically, which is the case with the specific charting application we are discussing here in the case study. In that example, the Response messages are generated dynamically by an ASP.NET server-side application based on the machine, date, and chart template selected, which are all passed as parameters appended to the original GET Request message sent to the web server to request a specific database query to be executed.
As depicted in Figure 4, the HTTP protocol is layered on top of TCP, which requires a session-oriented connection, but HTTP itself is a sessionless protocol. Being sessionless means that web servers process each GET Request independently, without regard to history. In practice, however, more complex web applications, especially ones given to generating dynamic HTML, are very often session-oriented, utilizing parameters appended to the GET Request message, cookies, hidden fields, and other techniques to encapsulate the session state, reflecting past interactions with a customers and to link current Requests with that customer’s history. A canonical example is session-oriented data to associate a current GET Request from a customer with that customer’s shopping cart filled with items to purchase from the current session or retained from a previous session.
Working our way down the networking stack, the TCP protocol sits atop IP and then (usually) Ethernet at the hardware-oriented, Media Access level. These lower level networking protocols transform network requests into datagrams and from there into packets to build the streams of bits that are actually transmitted using the networking hardware. For mainly historical reasons, the Ethernet protocol supports a maximum transmission unit (MTU) of approximately 1500 bytes. Note that each protocol layer in the networking stack inserts its addressing and control metadata into a succession of headers that are appended to the front of the data packet. After accounting for the protocol headers, the maximum capacity of the data payload is closer to about 1460 bytes. The Ethernet MTU requires that HTTP Request or Response messages that are larger than 1460 bytes be broken into multiple packets by the IP layer at the Sender. IP is also responsible for reassembling the packets at the Receiver. These details of the networking protocol stack are the basis in YSlow for the performance rules that analyze the size of the Request and Response messages that are used at the level of the HTTP protocol to compose the page.
A further complication is that many web pages are composed from multiple Response messages, as depicted in Figure 5. Typically, the HTML that is returned in the original Response message contains references to additional files. These can and often do include image files that the browser must display, style sheets for formatting, video and audio files that the browser may play, etc. In the web charting application I am using as an example here, the charts themselves are rendered on the server as .jpg image files. The HTML in the original Response message references these image files, which causes the browser to issue additional HTTP GET Requests to retrieve them during the process page composition. Of course, the server-side application builds a .jpg file for each of the two charts that are to be rendered when it builds the original Response message. In order to display presentation-quality charts, the jpg files that are built are rather hefty, given that they must be transferred over the network to the web client. The GET Requests to retrieve these charts fully rendered on the web server in jpg form generate a very large Response message that then requires multiple data packets to be built and transmitted.
So, composing web pages may not only require multiple GET Requests to be issued as a result of <link> tags, Response messages that are larger than the Ethernet MTU require transmission of multiple packets. The number of networking data transmission round trips, then, is a function of both the number of GET Requests and the size of the Response messages. For the sake of completeness, note that whenever the size of a GET Request exceeds the MTU, the GET Request must be broken into multiple packets, too. The most common reason that GET Requests exceed the Ethernet MTU is when large amounts of cookie data need to appended to the Request.
Both the number of files that are requested to render the page and the file size are factors in the YSlow performance rules. For example, the HTML returned in the original Response message generated by the ASP.NET application may reference external style sheets, which are files that contain layout and formatting instructions for the browser to use. Formatting instructions in style sheets can include what borders and margins to wrap around display elements, the size and shape of what fonts to use, what colors to display, etc. The example app does rely on several style sheets, but none of them are very extensive or very large. Still, each separate style sheet file requires a separate GET Request and Response message, and some of the style sheets embedded in the document are large enough to require multiple packets to be transmitted.
Take a minute to consider all those times that you encounter a web page where the page is partially rendered, but is incomplete and the UI is blocked. The web browser has its instructions to gather all the HTTP objects that the web page references, and you are not able to interact with the page until all the objects referenced in the DOM are resolved. But if the Response message for any one of those objects is delayed, the page is not ready for interaction. That is what Page Load time measures.