HTTP/2 and the TCP protocol

This post continues a discussion of the HTTP/2 protocol changes that began here.

The latest set of changes to the HTTP protocol provides support for multiplexing across a single TCP connection, with the goal of pushing as many bytes into the TCP Send Window as early as possible. This goal of HTTP/2 comes into conflict with standard TCP congestion control policies, which are very conservative about overloading a connection. These congestion control mechanisms include slow start, the small initial size of the congestion window (cwin), and ramping up the size of the cwin slowly using additive increase, multiplicative decrease. These standard TCP policies may all need to be modified for web sites that want to take advantage of running the HTTP/2 protocol. In this section we will look briefly at the standard TCP behavior that conflicts with HTTP/2 web server multiplexing.

The basic congestion control strategy is for TCP to begin conservatively, widen the session’s congestion window gradually until it reaches the full size of the session’s advertised Receive Window, and back off the transmission rate sharply whenever the Receive Window is full. The TCP congestion control mechanism defines a congestion window (or cwin) that overlays the Send Window. By default, the cwin is initially a single TCP segment. Using slow start, the cwin increases incrementally until TCP detects a congestion signal, the most frequent being a Send Window full condition that forces TCP to pause and wait for an Acknowledgement packet from the Receiver. Upon detecting a congestion signal, standard TCP also immediately cuts the size of the cwin in half.
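To make this behavior concrete, here is a minimal sketch in Python of the congestion window model just described. It follows this discussion’s simplified description, a one-segment initial cwin that grows by one segment per round trip and is halved on a congestion signal; real TCP implementations are considerably more sophisticated, and the numbers below are illustrative only:

MSS = 1460                 # bytes per TCP segment
RWIN = 64 * 1024           # the negotiated 64 KB Receive Window

def next_cwin(cwin, congestion_signal):
    # Multiplicative decrease: cut the window in half on any congestion signal.
    if congestion_signal:
        return max(1, cwin // 2)
    # Additive increase: grow by one segment per round trip, capped so the
    # congestion window never exceeds the advertised Receive Window.
    return min(cwin + 1, RWIN // MSS)

cwin = 1                   # slow start begins with a single segment
for _ in range(10):
    cwin = next_cwin(cwin, congestion_signal=False)
print(cwin)                                     # 11 segments after ten clean round trips
print(next_cwin(cwin, congestion_signal=True))  # 5: the congestion signal halves the window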

To understand why these standard TCP congestion control mechanisms conflict with the HTTP/2 goal of maximizing throughput over a single TCP connection, let’s return to the Facebook example GET Request that was described earlier. If you remember, the initial Response message that Facebook generates dynamically based on the identity of the Requester is quite large, approximately 550 KB. This initial Response message frames the web page and contains numerous links to many additional static files – JavaScript files, style sheets, various images, plus advertising content. Let’s look at how TCP on the web server transmits the initial Response message first, and then we will look into the page composition process that gathers all the additional content referenced in the original Response message.

As part of establishing the TCP connection, the web server and the web client negotiate an AdvertisedWindow, the name of the sixteen-bit field in the TCP header that is used to advertise a Receive Window size. The 16-bit AdvertisedWindow field can specify a maximum Receive Window of 64 KB. An additional scaling factor can be added to the TCP Options to increase the size of the sliding Receive Window up to 1 GB. (Note: Windows uses a dynamic approach called Receive Window auto-tuning to optimize the size of the Receive Window based on measurement feedback and congestion signals.) To improve the page load time, the web server might attempt to negotiate a very large Send Window, but the web client is likely to reject a Send Window larger than 64 KB for the connection, which is the Windows default. So, let’s assume a negotiated value of 64 KB for the TCP AdvertisedWindow.

As part of the standard slow start mechanism in force at the beginning of the session, TCP initially sends a single 1460-byte segment from the 550 KB Response message and then awaits the ACK from the client. The slow start mechanism then increments the congestion window by one segment, so TCP next sends two packets to the client and pauses to wait for the ACK. Then three packets, then four packets, and so on.

Consider a connection with an RTT of 100 ms. TCP can send 1/0.1, or ten, cwin-sized transmissions per second. If the size of the cwin increases by one segment for each round trip, then during the 1st second of the Response message transmission, TCP can only send 55 packets, or only about the first 80 KB of the full 550 KB message.
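The arithmetic behind those numbers can be spelled out in a few lines of Python, using the same simplifying assumptions (a 100 ms RTT, 1460-byte segments, and a cwin that grows by one segment per round trip):

RTT = 0.1                      # a 100 ms round trip time, in seconds
MSS = 1460                     # bytes per segment
round_trips = round(1 / RTT)   # ten cwin-sized transmissions in the first second

# Under slow start, the cwin is 1 segment on the first round trip,
# 2 on the second, and so on, up to 10 on the tenth.
packets = sum(range(1, round_trips + 1))
print(packets)                 # 55 packets
print(packets * MSS)           # 80300 bytes, only about 80 KB of the 550 KB message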

Windows provides several TCP options that can significantly improve the throughput of the initial Response message transmission, and a similar set of tuning options is available for Linux Apache web servers. These options are useful for improving throughput for any web site that frequently serves up large HTTP objects, but they are especially important in HTTP/2 because the protocol changes have the effect of boosting the potential throughput of each connection. The first is an option to increase the size of the initial cwin: InitialCongestionWindowMss. The second is to change the default CongestionProvider used by the IIS web server to the Compound TCP policy, because Compound TCP tries to negotiate a larger Send Window and uses bigger increments to ramp up the cwin more aggressively.

For example, the following Windows Powershell command:

Set-NetTCPSetting -SettingName Custom -CongestionProvider CTCP -InitialCongestionWindowMss 16

sets the initial size of the cwin to 16 segments (about 23 KB) and switches to the Compound TCP congestion policy, which increases the size of the cwin faster than the normal policy. These settings are more aggressive than the very conservative TCP congestion control defaults, but they are appropriate when customers access your web site mainly over high-bandwidth broadband connections.
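To get a rough sense of what the larger initial window buys, we can rerun the earlier first-second calculation with an initial cwin of 16 segments. This keeps the simplified one-segment-per-round-trip growth model, so it actually understates what Compound TCP would achieve:

MSS = 1460
round_trips = 10             # ten round trips per second at a 100 ms RTT

def first_second_bytes(initial_cwin):
    # Same simplified model as before: the cwin grows by one segment per round trip.
    packets = sum(initial_cwin + i for i in range(round_trips))
    return packets * MSS

print(first_second_bytes(1))     # 80300 bytes (~80 KB) with the one-segment default
print(first_second_bytes(16))    # 299300 bytes (~300 KB) with InitialCongestionWindowMss 16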

Another TCP option setting is especially important in HTTP/2, where the web client and web server communicate over a single TCP connection. By default, TCP uses Additive Increase/Multiplicative Decrease to reduce the size of the CongestionWindow and slow down the rate of data transmission when a congestion signal is detected. The most common congestion signal is a Send Window full condition, when the Sender has filled the AdvertisedWindow with unacknowledged data and is forced to pause and wait for an ACK from the Receiver before it can send more data. As the name implies, Additive Increase/Multiplicative Decrease cuts the size of the current CongestionWindow in half when a congestion signal is detected.

Returning to the monolithic Facebook web application for a moment, you may recall that the initial HTTP Response message referenced more than 200 external resources, most of them files consolidated on just two domains. With HTTP/2 multiplexing, the web browser establishes one session for each of those two domains and starts firing off GET Requests to them without pausing to wait for Response messages. On the web server side at a facility like Facebook, these GET Requests are processed in parallel on the massive web server back-end, like the one illustrated in Figure 3. A congestion signal that shrinks the size of the cwin on the single TCP connection that HTTP/2 clients and servers use to communicate has a major impact on the potential throughput of that connection.

This is one of the places where a federated HTTP/1.x site often outperforms HTTP/2. To understand why, consider what happens when a congestion signal is detected on one of the many parallel TCP connections under HTTP/1.x. Instead of funneling the requested content through a front-end web proxy server that consolidates all the Response messages into a set of interleaved streams across a single connection, in HTTP/1.x it is a Best Practice to distribute content across multiple physical domains, which can be accessed using parallel sessions. Figure 4 illustrates the web client in HTTP/1.x opening up six parallel TCP connections in order to initiate six concurrent GET Requests to each of these sharded domains. Note that because HTTP/1.x is sessionless, the parallel connections can be handled independently by separate front-end proxy servers, which adds another element of parallelism to HTTP version 1 web server processing.

massively parallel web server domain in HTTP/1.x

Figure 4. Under HTTP/1.x, which supports as many as six parallel sessions per domain, a congestion signal that reduces the size of the cwin in one connection has limited impact on the overall throughput of the active TCP sessions established between the web server and web client.

Figure 4 illustrates what happens when one of the six parallel TCP sessions detects a congestion signal that causes TCP to shrink the current size of the congestion window by 50%. Since the congestion signal only impacts one of the parallel TCP sessions, the aggregate send capacity drops from six full congestion windows to five and a half, so throughput to the web server is only reduced by 1/12 in the HTTP/1.x example illustrated in Figure 4.
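A quick back-of-the-envelope check, assuming six equally sized congestion windows, shows where the 1/12 figure comes from:

connections = 6
before = connections * 1.0        # six full congestion windows' worth of capacity
after = (connections - 1) + 0.5   # five full windows plus the one halved window
print((before - after) / before)  # 0.0833..., i.e. 1/12, versus 1/2 for a single HTTP/2 connection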

The other congestion signal that causes TCP to shrink the size of the cwin is an unacknowledged packet that needs to be retransmitted. On the likely assumption that the unACKed packet was lost due to congestion encountered somewhere along the route to its destination, when TCP retransmits a packet, it not only reduces the cwin by 50%, it also reverts to slow start. Again, comparing HTTP/2 network traffic that relies on a single connection to HTTP/1.x traffic spread over multiple parallel connections, this congestion control policy reduces the overall throughput of the multiplexed HTTP/2 connection disproportionately. In Windows, setting the CwndRestart TCP option to True directs TCP to widen the cwin normally following a retransmit, instead of reverting to slow start.

Setting the CwndRestart option is important in HTTP/2, as a recent study of SPDY performance conducted by researchers at Lancaster University in the UK, headed by Yehia Elkhatib, showed. In the study, Elkhatib and his team systematically investigated the effect of bandwidth, latency and packet loss on SPDY performance, using a range of web applications that varied in size and complexity. On low latency links with significant packet loss (the type of connections many cell phone users get), Elkhatib’s research found that the HTTP/1.x protocol outperformed SPDY. The more significant issue for cell phone web access is that the type of complex web pages served up from the monolithic web sites that HTTP/2 is optimized for do not display well on portable devices with small screens. These portable devices are better served by one of the following strategies: redirecting the GET Requests to a mobile version of the site, responsive web design that queries the screen size and customizes the content accordingly, or native cell phone apps that communicate using web services. As noted earlier, the HTTP/2 multiplexing changes have no impact on web services.

Summary

In summary, HTTP/2 is the first change in over 15 years to the protocol used by all web applications. The major new features that HTTP/2 supports include multiplexing and server push. Multiplexing frees the web client from the serial sequence of Request:Response messages that the original sessionless HTTP protocol required. Along with server push technology, under HTTP/2 web servers acquire greater flexibility to interleave Response messages and maximize throughput across a single TCP connection. Multiplexing and server push should also eliminate the need to inline resources or to aggregate many smaller HTTP objects into one consolidated file in order to reduce the number of GET Requests required to compose the page, and because domain sharding is no longer necessary, web site administration should become simpler under HTTP/2. Interestingly, the new header compression feature, as well as some other aspects of the HTTP/2 changes, introduces additional session-oriented behavior into the protocol web applications utilize.

Based on performance testing with Google’s experimental SPDY protocol, which introduced similar web application protocol changes impacting both the browser and the web server, it appears that HTTP/2 will benefit monolithic web sites that generate large, complex pages. HTTP/2 multiplexing will have less of an impact on HTTP/1.x sites that are already configured to take advantage of the web client’s capability to establish up to six concurrent sessions with a single domain and download content in parallel. HTTP/1.x web sites that are configured today to maximize parallelism by spreading web page content across many domains, using either a federated model or domain sharding, may need to be re-configured to take better advantage of HTTP/2.

Finally, the standard TCP congestion control policies, which are initially quite conservative about overloading a TCP connection, come into conflict with the HTTP/2 goal of maximizing throughput across a single TCP connection. Web site administrators should consider setting TCP options that increase the initial size of the TCP congestion window and enable the Compound TCP congestion policy in Windows, which increases the size of the cwin more rapidly. Another setting that allows TCP to be more aggressive in the wake of a lost packet congestion signal should also be considered. These specific TCP performance options tend to improve throughput over high bandwidth, long latency connections, so they also apply to HTTP/1.x sites that need to serve up large HTTP objects.


HTTP/2 multiplexing.

This is a continuation of an article on the recently adopted HTTP/2 protocol changes that starts here.

At this point I want to drill deeper into the major features in the HTTP/2 revision and then try to evaluate their tangible impact on web application performance. The HTTP/2 revision of the protocol features the following:

  • multiplexing
  • priority
  • server push
  • header compression
  • streamlined SSL connections

Multiplexing.

Multiplexing, the use of interleaved streams over a single HTTP connection to process HTTP Requests in parallel, is the most important new change, and the one we understand best because of Google’s SPDY project.

Web pages are generally composed from multiple HTTP objects, but up until now, HTTP 1.x has been limited to the serial processing of individual HTTP GET Requests issued by the web client for objects as they are discovered in the HTML markup and added to the Document Object Model for rendering. This serial rendering process is depicted schematically in Figure 1.

HTTP 1.x schematic

Figure 1. The web client in HTTP/1.x issues GET Requests to a web server serially over a single TCP connection. A follow-up GET Request is delayed until the Response message from the previous Request is received. HTTP/1.x allows for multiple connections to the same domain in order to download content in parallel.

In the diagram in Figure 1, the Round Trip Time (RTT) is also indicated: the time for a message to be transmitted from one Host to the other and for a TCP packet acknowledging receipt of that message to be received back at the Sender. The network Round Trip Time also reflects the minimum amount of time that a client needs to wait for an HTTP Response message from the web server in response to an HTTP GET Request. Notice that this minimum response time is 2 * the network latency, independent of the bandwidth of the transmission medium. Network bandwidth only becomes a factor when the HTTP Request and Response messages are large compared to the size of the segments TCP transmits, which are usually limited to 1460 bytes or fewer, depending on the size of the IP and TCP headers, due to restrictions in the Ethernet protocol that limit the size of the Maximum Transmission Unit (MTU). TCP messages that are larger than the MTU are broken into multiple packets by the IP layer before they are handed off to the network hardware, or Media Access (MAC) layer.
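The two relationships described above, segment size and minimum response time, reduce to simple arithmetic. A small sketch, assuming the standard 1500-byte Ethernet MTU and minimal 20-byte IP and TCP headers:

ethernet_mtu = 1500     # bytes, the standard Ethernet MTU
ip_header = 20          # bytes, minimum IPv4 header
tcp_header = 20         # bytes, minimum TCP header (more if TCP Options are present)
print(ethernet_mtu - ip_header - tcp_header)   # 1460 bytes per segment

one_way_latency = 0.05             # an assumed 50 ms one-way network latency
print(2 * one_way_latency)         # 0.1 s: the minimum GET:Response time, regardless of bandwidth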

Network RTT is mainly a function of the physical distance separating the two machines, plus additional latency for each physical connection along the route: network hops where packets require some minimal amount of processing by IP routers in order to be forwarded to the next stop on the way to their ultimate destination. Web clients accessing web servers over the public Internet can expect to encounter RTTs in the range of 30-100 milliseconds. None of this, of course, changes under HTTP/2.

Note, however, that Figure 1 drastically simplifies what is actually a much more complex process at the web client. Whenever it discovers more resources that need to be downloaded from the same domain, the web browser can launch additional TCP sessions to the same web server for the purpose of issuing GET Requests in parallel. The official guideline in HTTP/1.1 is for no more than six active concurrent connections at a time, but your mileage will vary with different browsers, different versions, and different hardware. The potential for parallel processing in web page composition arises whenever an HTML, CSS, or JavaScript file contains external references to additional resources, including image files, videos, etc. When you request a particular HTML web page, the markup in the original Response message typically serves as an exoskeleton for the page, a scaffolding that contains slots for additional files, each of which can also reference additional links.

The ability to download content using multiple sessions is one form of parallel processing that is currently available to HTTP 1.x clients. Under HTTP/1.x, the Requests on each connection are issued serially, arriving one at a time at the web server, where they are processed in the order in which they are received. The issue with that approach is that each concurrent session under HTTP 1.x requires the establishment of a separate TCP connection, something that is even more time-consuming under HTTPS. Multiple sessions can also be wasteful when the individual connections are only used to transfer a single HTTP object or are accessed sporadically.

AJAX techniques[1], which manipulate the DOM using asynchronous HTTP Requests to web services, are another approach to adding parallel processing to the web page composition process. AJAX is implemented at the web client by executing JavaScript code that makes the web service requests. The XMLHttpRequest method that AJAX techniques utilize is carried forward unchanged in HTTP/2.

Of course, Figure 1 also greatly simplifies what the web infrastructure at a large web property looks like. The diagram depicts a single web server, which is how it appears to the web client. In actuality, there can be thousands of web servers configured in a single, co-located cluster that are each capable of responding to the Request. The HTTP/1.x protocol being both connectionless and sessionless means that each Request is an independent entity. The stateless character of the HTTP protocol is what makes it possible for any web server in the infrastructure to respond to any Request. This sessionless behavior is also the key factor that allows for applying parallel processing to web workloads on a massive scale. (See this link for more on this aspect of the HTTP protocol.) TCP, the underlying Transport layer, is connection-oriented, but HTTP/1.x is not.

However, there are many web applications that generate HTML Response messages dynamically based on session state, usually the identity of the customer and, often, the customer’s current location. HTTP allows the web application to store a cookie at the web client where data encapsulating the session state is encoded and made available to subsequent Requests. Cookie data is automatically appended to subsequent GET Requests issued for the same domain in one of the HTTP message header fields.

Microsoft’s ASP.NET technology provides an alternative mechanism for maintaining session state between HTTP Requests. ASP.NET supports an explicit Session object that the application can utilize to preserve the state of a connection between Requests at the web server, a facility that is more flexible, more reliable, and more secure than using cookies on the client. However, whenever routing is performed by the web infrastructure to assign an available web server to process an incoming Request, that routing needs to be session-oriented, so that the session state stored by the previous Request is available to the web server application responding to a subsequent Request. In ASP.NET, there is a configuration option that allows the Session objects to be stored in an instance of SQL Server where they can be accessed by all the front-end web servers in the cluster.

How HTTP/2 Multiplexing works.

In HTTP/2, the web browser is allowed to fire off multiple GET Requests to a web server, one after the other, without waiting for each individual Response message in reply. This multiplexing capability is illustrated in Figure 2.

HTTP 2 multiplexing

Figure 2. The web client in HTTP/2 issues multiple GET Requests to a web server in parallel over a single TCP connection. GET Requests for resources can be issued immediately upon discovery, with no requirement to wait until the Response message from the previous Request is received.

On its end, an HTTP/2 web server can return Response messages to the client in any sequence, without regard to the order in which they were requested. This allows the web server, for example, to sort the queue of outstanding Response messages awaiting processing on an HTTP connection so that the ones that can be satisfied the quickest go first, a scheduling technique that improves the overall average response time.
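The scheduling technique described here is essentially shortest-job-first. A minimal sketch in Python, assuming that the size of a Response message is a reasonable proxy for how quickly it can be satisfied:

# A hypothetical queue of outstanding Response messages: (resource, size in bytes).
queue = [("page.html", 550000), ("app.js", 120000), ("style.css", 18000), ("logo.png", 4000)]

# Transmitting the smallest Responses first minimizes the average completion
# time across the queue, at the cost of making the largest Responses wait.
for resource, size in sorted(queue, key=lambda item: item[1]):
    print(resource, size)
# logo.png and style.css complete quickly; page.html goes last.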

Response messages can also be interleaved in HTTP/2: interleaving the segments from multiple Response messages gives the web server greater flexibility to utilize the TCP connection efficiently, with the goal of packing as many bytes into the TCP Send Window as possible.

With multiplexing under HTTP/2, the idea is to attain, across a single HTTP connection, the same or even higher levels of concurrency than HTTP/1.x achieves using multiple parallel connections.
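From the client side, the difference is easy to see with a modern HTTP/2-capable library. Here is a sketch using the Python httpx package (installed with its optional http2 extra, e.g. pip install 'httpx[http2]'); the domain and resource paths are placeholders, not part of the original example:

import asyncio
import httpx

async def fetch_all():
    # One client, one connection, negotiated as HTTP/2 when the server supports it.
    async with httpx.AsyncClient(http2=True) as client:
        paths = ["/styles/site.css", "/scripts/app.js", "/images/logo.png"]  # hypothetical resources
        # All three GET Requests are in flight at once; over HTTP/2 they travel
        # as separate streams multiplexed onto a single TCP connection.
        responses = await asyncio.gather(
            *(client.get("https://example.com" + path) for path in paths)
        )
        for response in responses:
            print(response.http_version, response.status_code, response.url)

asyncio.run(fetch_all())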

An example: loading a Facebook page in SPDY.

To illustrate the difference between HTTP/1.x and SPDY, the immediate predecessor of HTTP/2, let’s look at a typical processing sequence for a GET Request that begins a web browser session with https://facebook.com, which we will discover is an excellent example of a monolithic web site. Using Internet Explorer 11, which supports SPDY/3, I simply issued a request to access my Facebook page. Note that due to the time-sensitive manner in which Facebook generates web pages, this is an experiment that is difficult to replicate precisely, because the pages that are built reflect the latest activity and postings of your Facebook Friends.

In HTTP/1.x, SPDY, and HTTP/2 alike, initiating a browser session with Facebook requires a DNS lookup and the exchange of a sequence of HTTPS session connection handshaking packets. Once the secure HTTP session is established, using a persistent TCP connection on Port 443, the web client can send a GET Request to www.facebook.com. This initial GET Request attaches a rather large cookie that contains variables identifying the issuer of the Request and other relevant status that the Facebook web application uses to generate a customized HTTP Response message. Up to this point, the processing steps taken in HTTP/1, SPDY, and HTTP/2 are identical, except that HTTP/2 exchanges two fewer packets to establish a secure connection.

In a test I made using the SPDY/3 protocol with Internet Explorer, the initial Facebook HTTP Response message was huge, about 550 KB, the first packet of which was received 328 ms after the GET Request was issued. It required a full 2 seconds for the Facebook SPDY/3 server to transmit the entire 550 KB Response message. When I examined this initial HTTP Response message, I found it contained markup referencing a large number of external resources that the browser needed to fetch. These included scripts, style sheets, image files, video, and some advertising content. To load the entire page required 216 separate GET Request:Response Message sequences, which involved transferring 7.24 MB of data over the wire. According to the Developer Tools in Internet Explorer that captured the network traffic that resulted from the GET Request to Facebook, I waited over 3.6 seconds until the DOM’s Load event fired, signaling that the page was available for user interaction. Meanwhile, JavaScript code continued to execute in the background for another 20 seconds to add various dynamic elements to the web page.

It is at this point, once the initial HTTP Response message from Facebook is received at the web client, that HTTP/1.x and the newer protocols, SPDY and HTTP/2, begin to diverge, because the new versions of the protocol support multiplexing. As soon as the web browser started receiving the initial Response Message and started to construct the DOM, it immediately encountered links referencing several style sheets:

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yB/r/PQzGy_gthig.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yJ/r/cuqNSNZ2dlI.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yi/r/RH3rvDA7dSR.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yf/r/QFcEQNF3244.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yD/r/flQGK0biLk6.css" />

Notice that https://fbstatic-a.akamaihd.net is a different domain than https://facebook.com, so the web browser must again perform a DNS Lookup and go through the secure connection handshaking to access this web server before it can request the download of the style sheets indicated. Here is where SPDY/3 and HTTP/1.x part company. In SPDY, the web browser can issue multiple GET Requests to the fbstatic web server in rapid succession over a single TCP connection. In HTTP/1.x, the web browser must initiate separate TCP connections to begin downloading the style sheet files in parallel.

Overall, the Facebook web application made most of its subsequent 215 GET Requests to just two domains: the fbstatic domain indicated above where common style sheets, image files, and scripts were located, and an fbcdn-profile domain where content specific to my Facebook profile and set of Friends was stored. With SPDY and with HTTP/2, two domains equals just two TCP connections. In HTTP/1.x, Internet Explorer would attempt to establish as many as twelve secure TCP connections to the two primary Facebook domains.

Monolithic and federated web publishing models.

Because ¾ of all the Facebook GET Requests were directed to just two domains, the Facebook web application is characterized as monolithic. Monolithic web sites like Facebook benefit the most from HTTP/2. Many other web properties, particularly media outlets, have a more federated structure, with content often spread across as many as 20-30 domains, frequently involving 3rd party web servers that are also directed to generate content dynamically based on the identity and location of the customer.

An example of how a monolithic web site like Facebook can be structured is illustrated in Figure 3.

massively parallel web server domain

Figure 3. A web server infrastructure that supports massive parallelism uses three layers of hardware: a hardware VLAN switching layer, a set of front-end proxy servers, and a set of back-end file servers that have access to shared, high-speed disk storage. In HTTP/2, web client access funnels through a single TCP connection, as illustrated.

 

The web server infrastructure shown in Figure 3 contains several layers of hardware: high speed network routing, a set of front-end proxy servers that route requests to back-end file servers, and a shared disk storage layer. HTTP GET Requests enter the data center through a VLAN network switching layer that uses session-oriented load balancing in HTTP/2 to direct Requests to one of the proxy servers in the second layer. GET Requests for static HTTP objects are then relayed and resolved by a layer of back-end file servers that cache frequently-referenced files in memory, but can also fetch less frequently referenced files from high speed, shared disk storage. The designated proxy server maintains the state of the HTTP/2 connection and consolidates the Response messages into a single set of interleaved streams that are transmitted back to the web client.

HTTP/1.x encouraged the use of the federated model because web application performance could often be enhanced by domain sharding, the practice of breaking a logical domain into multiple physical domains in order to take advantage of parallelism during Page Load. Under HTTP/1.x, it is common practice to distribute the content from a domain like Facebook’s fbstatic over 3-5 physical domains, allowing for as many as thirty concurrent TCP sessions. From a performance standpoint, a web application under HTTP/1.x that has been partitioned and distributed across multiple physical sites can attain a level of concurrency during web page composition that is easily comparable to an HTTP/2 consolidated web server that uses multiplexing.
[1] The canonical example of AJAX (the acronym is short for Asynchronous JavaScript and XML) is an autocompletion textbox control on a web page. When you begin typing into an autocompletion textbox control, it triggers execution of a snippet of JavaScript code that makes an asynchronous XMLHttpRequest call to a web service to gather popular responses associated with the first few typed characters.

Another example: loading a YouTube page in SPDY.

Another example of a monolithic web page that benefits from HTTP/2 is YouTube, which, of course, is owned by Google. On a recent visit to the YouTube Home page using Internet Explorer from my desktop, a 4.4 MB landing page was generated, built from 99 individual HTTP objects. The YouTube Home page html is about 500 KB, mainly scaffolding that references the remaining HTTP objects. The remaining HTTP objects break down as follows, with the bulk of them, over 3 MB, all served from a single domain:

  • Three style sheets, totaling about 300 KB.
  • A huge hunk of JavaScript, about 900 KB, for video playback.
  • The common.js library, about 350 KB.
  • About 50 of the HTTP objects on the page were jpeg images that serve as link buttons to the videos advertised, all loaded from a single domain.
  • In addition, ten smaller graphic sprites, ranging in size from 1500 bytes to about 15 KB, were loaded from a second domain.
  • Another ten objects, all JavaScripts, were loaded from a third YouTube domain, plus five additional JavaScript framework files that were all loaded from https://apis.google.com.
  • Then, Google wraps about ten small ads, each about 500 bytes, from doubleclick, another Google web property, around the content.
  • Finally, there is a rich media (i.e., Flash) display ad, about 250 KB, served from another Google-owned domain.

For monolithic sites like Facebook and YouTube, the practice of domain sharding for performance reasons is no longer necessary under HTTP/2. To take advantage of HTTP/2’s multiplexing, you will want to undo any domain sharding that you have performed in the past and consolidate your content into fewer domains. This should make site administration more straightforward under HTTP/2, if not outright easier.

Priority.

To help the web server differentiate among Requests being transmitted in parallel across the network, the HTTP/2 protocol introduces Request prioritization.

It is not yet clear how developers will indicate Request priority in standard HTML markup in HTTP/2, nor how servers will implement Request priority and handle potential issues that arise with priority scheduling, such as the possibility of starvation. At the moment, for example, Microsoft has been experimenting with a non-standard lazyload prioritization keyword, beginning in Internet Explorer 10, but I am not sure how many people are using it; the trend in IE is away from the proprietary Microsoft HTML extensions that annoyed web developers for years.

Server Push.

Server Push will allow the web server to send multiple Response messages in reply to a single GET Request, anticipating that the web client is going to request those resources as soon as it uncovers references to them in a previous Response message. Potentially, as soon as the web server sends the initial Response message answering an HTTP GET Request for an html page, it might also start to send CSS and JavaScript files that are referenced in the html. The aim is for the pushed content to start arriving at the web client before the web client is able to discover that it needs these files and has time to prepare Requests to fetch them.

The Server Push feature is designed to obviate the need to inline resources such as scripts and styles in HTML markup, and shrink the number of client GET Requests that are required. It remains to be seen, however, whether Server Push is a clear performance win when inlining resources is not involved. The efforts of the web server to push content in anticipation of future Requests can easily backfire when those resources are already resident in the browser cache or in the CDN.

The Server Push capability is associated with a new HTTP/2 frame called a PUSH_PROMISE, used by the web server to notify the client that it intends to push content in a stream not yet requested by the client. Upon receiving the PUSH_PROMISE notification, the web client can choose to reject the stream, based on first checking the contents of the web browser cache to see if it already has access to a valid copy of the promised content. Depending on how aggressively the web server pushes content to the client in advance, Server Push runs the risk that a PUSH_PROMISE notification from the server and a RST_STREAM message from the client will cross in the mail and unnecessary data streams will be transmitted.
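To give a flavor of the mechanics, here is a server-side sketch using the Python h2 library, a low-level HTTP/2 protocol implementation. The stream IDs, headers, and pushed resource are illustrative assumptions, and the socket plumbing that would surround this code is omitted:

import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=False)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

# Suppose the client opened stream 1 with a GET for /index.html. Before
# answering, the server sends a PUSH_PROMISE announcing that stream 2 will
# carry a style sheet the client has not yet requested.
conn.push_stream(
    stream_id=1,                # the client-initiated stream being answered
    promised_stream_id=2,       # server-initiated pushed streams use even IDs
    request_headers=[
        (":method", "GET"),
        (":authority", "example.com"),   # hypothetical host
        (":scheme", "https"),
        (":path", "/styles/site.css"),   # hypothetical pushed resource
    ],
)
frames = conn.data_to_send()    # raw PUSH_PROMISE bytes, ready for the socket

# The client can answer with RST_STREAM on stream 2 to refuse the push, for
# example because a valid copy of the file is already in the browser cache.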

Header Compression.

HTTP header compression provides a big benefit on upload Requests from the web client, but won’t provide much notable improvement in shrinking the size of Response messages. Web client HTTP GET Requests today contain a number of clear text Header fields, like the Host name and the user-agent field that identifies the browser name and version. The same Header data for the connection must be sent with every Request because HTTP/1.x was originally conceived as a connectionless protocol. These mandatory header fields in HTTP/1 are surprisingly bulky, often forcing GET Requests that also carry associated cookie data to span multiple packets. In HTTP/2, the web server retains these header fields and associates them with the state of the connection, which means that on subsequent messages the browser is only required to send Header field data that has changed since the previous Request.
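HPACK, the header compression scheme HTTP/2 adopted, makes this incremental behavior easy to demonstrate with the Python hpack library. The header values below are illustrative stand-ins:

from hpack import Encoder

encoder = Encoder()
headers = [
    (":method", "GET"),
    (":authority", "www.facebook.com"),
    ("user-agent", "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"),
    ("cookie", "datr=abc123; c_user=456789"),   # a stand-in for a bulky session cookie
]

first = encoder.encode(headers)
# Repeating identical headers on the next Request hits the encoder's dynamic
# table, so the bulky fields are replaced by short index references.
second = encoder.encode(headers)
print(len(first), len(second))   # the second encoding is a small fraction of the first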

One interesting side effect of this change is to make the HTTP protocol more connection-oriented. Header compression requires web servers to save the initial set of HTTP headers as part of the connection state that is maintained for the duration of the session. The new HTTP/2 capability that allows interleaving of Response messages also requires maintenance of a durable connection-oriented session at the web server. Still, one can argue persuasively that dynamic HTML is session-oriented, too, in which case this is not really a burdensome new requirement, but a simple nod to the current reality.

Improved performance with Transport Layer Security.

Unlike Google’s grand experiment with the SPDY protocol, which influenced much of the new HTTP standard, HTTP/2 doesn’t absolutely require HTTPS, but it will encourage its use. HTTP/2 continues to plug into TCP Port 80, but the protocol is enhanced so that TLS can be requested at connection time. This fix saves a couple of packets and a Round Trip during initial session handshaking.

And another new feature…

Another new and noteworthy feature of HTTP/2, but one that did not make my Top 5 list, is that it supports binary data, in contrast to HTTP/1, which is exclusively text based. User-friendly HTML-based text is both a blessing and a curse. The blessing is that HTML text encoding means that any feature implemented in anyone’s web site is discoverable, which has led to wide dissemination of industry Best Practices and patterns. But for those concerned about web security, sending HTTP messages in clear text just makes it that much easier for people to hack into today’s web sites and do harm. In theory, support for binary data should make it possible for people to build more secure HTTP/2-based web sites. In practice, Wireshark already supports binary viewers for HTTP/2 streams, and Google’s Chrome provided a binary data plug-in for SPDY that can be readily adapted for HTTP/2. At this point, it is difficult to assess how much more secure the web will become using binary data. My guess is not much, given the combination of vulnerabilities that exist and the incentives that hackers continue to have.

Developers of web server and web client software figure they have a leg up on implementing HTTP/2 based on their prior experience getting SPDY to work. Google Chrome began supporting SPDY in 2011, at the same time that Google began adding SPDY support to its web properties, including YouTube, a network bandwidth hog. Facebook, Twitter and Akamai are other early adopters of SPDY on the web server side. With the adoption of the HTTP/2 standard, Google has already announced that it plans to drop SPDY in favor of the new industry standard in the next version of Chrome. In addition, Google is supplying HTTP/2 support in Apache. Not to be outdone, Microsoft added SPDY/3 support to Internet Explorer 10 and has announced that both the IIS and IE 11 previews available with Windows 10 support HTTP/2.

In the final post in this series, I will look at the interaction between HTTP/2 multiplexing and the underlying TCP protocol.

HTTP/2: a change is gonna come, Part 1.

The Internet Engineering Task Force (IETF), the standards body responsible for Internet technology, recently accepted the HTTP/2 draft specification, the final official hurdle to be cleared prior to widespread industry adoption. HTTP is the protocol used between web clients and web servers to exchange messages, and HTTP/2 is the first major revision of the HTTP protocol adopted since 1999, when HTTP/1.1 was finalized. (The principal HTTP 1.1 protocol changes provided for web page composition based on connection-oriented, dynamic HTML, which was still evolving at the time.)

The changes to the protocol for HTTP/2 are also directed squarely at the web page composition process. They are designed to speed up page load times, mainly through the use of (1) multiplexing where the web client can make multiple requests in parallel over a single TCP connection, and (2) server push where the web server can send content to the web client that it expects the web client will need in the near future, based on the current GET Request. This is a major change in the web’s application processing model, requiring adjustments at both the web server and the web client to support multiplexing and server push. In addition, many web sites, currently built to take advantage of the capabilities in HTTP/1.x, may require re-architecting to take better advantage of HTTP/2. Performance tools for web application developers will also need to play catch up to provide visibility into how multiplexing and server push are operating in order to assist with these re-architecture projects.

In a blog post explaining what web developers can expect from HTTP/2, Mark Nottingham, chairperson of the IETF HTTP Working Group, cautions, “HTTP/2 isn’t magic Web performance pixie dust; you can’t drop it in and expect your page load times to decrease by 50%.” Nottingham goes on to say, “It’s more accurate to view the new protocol as removing some key impediments to performance; once browsers and servers learn how and when to take advantage of that, performance should start incrementally improving.” Even though HTTP/2 is brand new, its multiplexing capabilities are largely based on a grand scale Google experiment known as SPDY that pioneered that feature. In this article, I will try to describe what the HTTP/2 changes will and won’t accomplish, based on what we know today about SPDY performance. In addition, I will make some specific recommendations to help you get ready and take advantage of the new capabilities in HTTP/2.

While HTTP/2 shapes up to be an important change to the technology that powers the Internet, the protocol revision does not address other serious performance concerns. For instance, it is not clear how other networking applications that rely on web services – think of all the apps that run on your phone that are network enabled – will be able to benefit from either multiplexing or server push. For browser-based web apps, HTTP/2 does not change the requirement for the browser to serialize the loading and execution of JavaScript files. Finally, while the HTTP/2 changes recognize that network latency is the fundamental source of most web performance problems, there is very little that can be done at the application protocol layer to overcome the physical reality of how fast electrical signals can be propagated through time, space, and wires.

One worrisome aspect of the HTTP/2 changes is how uncomfortably they fit atop the congestion control mechanisms that are implemented in TCP, the Internet’s Host-to-Host transport layer. These congestion control mechanisms were added to TCP about twenty years ago in the early days of the Internet to deal with severe network congestion that made the Internet virtually unusable under load. Current TCP congestion policies like slow start, additive increase/multiplicative decrease, and the initial size of cwin, the congestion window, cause problems for HTTP/2-oriented web applications that want to open a fat pipe[1] to the web client and push as much data through it as quickly as possible. TCP congestion control mechanisms are an important aspect of the transport-level protocol that manages the flow of messages through the underlying networking hardware, which is shared among consumers. For best results, web sites designed for HTTP/2 may find that some of these congestion control policies need adjusting to take better advantage of new features in the web application protocol.

Despite its use for video streaming and other bulk file copy operations, TCP was simply never designed to be optimal for throughput-oriented networked applications that are connected over long distances. Instead, by requiring a positive Acknowledgement from the Receiver for every packet of data sent, TCP is able to provide a reliable message delivery service atop the IP protocol, which deliberately does not guarantee delivery of the messages it is handed. The designers behind the HTTP/2 changes are hardly the first set of people to struggle with this. In fact, the current TCP congestion control policies are designed to prevent certain types of data-hungry web applications from dominating and potentially overloading the shared networking infrastructure that all networking applications rely on. Call me an elite snob if you want, but I don’t relish a set of Internet architecture changes, of which HTTP/2 may only be the first wave, that optimizes the web for playback of high-definition cat videos on mobile phones at the expense of other networking applications; but that does appear to be the trend today in network communications technology.

Another interesting aspect of the HTTP/2 changes is the extent to which Real User Measurements (RUM) of web Page Load Time were used to validate and justify the design decisions that have been made, another example of just how influential and resilient the YSlow scalability model has proved. This is in spite of the many limitations of the RUM measurements, which raise serious questions about how applicable they are to web applications that make extensive use of JavaScript manipulation of the DOM and add interactive capabilities using AJAX techniques to call web services asynchronously. In both sets of circumstances, this DOM manipulation is performed in JavaScript code that executes after the page’s Load event fires, which is when page load time is measured. RUM measurements that are gathered in the page’s Load event handler frequently do not capture this processing time. How valid the RUM measurements are in those environments is an open question among web performance experts.

Characterizing web application workloads

Much of the discussion that takes place in public, in presentations, blogs, and books on the subject of web application performance, proceeds in blithe ignorance of the core measurement and modeling concepts used in the discipline of software performance engineering. One aspect that is strikingly absent from this discourse is a thorough consideration of the key characteristics of web application workloads that impact performance. Workload characterization is essential in any systematic approach to web application performance.

The performance characteristics of web applications span an enormous spectrum based on the size and complexity of the web pages that are generated. Some of those performance characteristics will make a big difference in whether or not the HTTP/2 protocol will help or hinder their performance. In particular, there appear to be three characteristics of web applications that will have the greatest impact on performance under HTTP/2:

  • the number of separate domains that GET Requests are directed to in order to construct the page,
  • the number of HTTP objects (files, essentially) that need to be fetched from each domain,
  • and the distribution of the sizes of those objects.

With regard to the number of domains that are accessed, looking at the top 500 sites, web applications range from pulling content from just one or two domains to pulling together content from more than fifty. This behavior spans a range from monolithic web publishing, where the content is consolidated on a very small number of domains, to a more federated model, where many more distinct web sites need to be accessed. Web sites that are consolidated in one or two domains perform better under HTTP/2 than those that rely on the federated model and were architected that way to perform well under HTTP/1.x. In addition, within each domain, the number of HTTP objects requested and their size is also pertinent to performance under HTTP/2.

The number of HTTP objects that need to be fetched and their size are, of course, the two key components of the scalability model used in performance tools like YSlow that offer recommendations for building web pages that load faster. However, YSlow and similar tools currently ignore the sizable impact that multiple domains can have on web page load time. Overall, the HTTP/2 changes highlight the need to extend the deliberately simple model of web page load time that YSlow and its progeny have propagated.
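These three characteristics are straightforward to measure for your own site. Most browser developer tools can export a page load as a HAR (HTTP Archive format) file, which is plain JSON; the following Python sketch tallies domains, object counts, and object sizes from such an export (the file name is a placeholder):

import json
from collections import Counter
from urllib.parse import urlparse

with open("pageload.har") as f:          # hypothetical HAR export from the browser
    entries = json.load(f)["log"]["entries"]

domains = Counter(urlparse(e["request"]["url"]).netloc for e in entries)
sizes = sorted(e["response"]["bodySize"] for e in entries)

print(len(entries), "HTTP objects across", len(domains), "domains")
print("objects per domain:", domains.most_common())
print("median object size:", sizes[len(sizes) // 2], "bytes")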

SPDY

After extensive testing at Google and elsewhere, some clarity around SPDY performance has begun to emerge; we are starting to understand the characteristics of web applications that work well under SPDY and those that SPDY has little or no positive impact on. At a Tech Talk at Google back in 2011, the developers reported that implementing SPDY on Google’s web servers resulted in a 15% improvement in page load times across all of the company’s web properties. The SPDY developers did acknowledge that the experimental protocol did not help much to speed up Google Search, which was already highly optimized. On the other hand, SPDY did improve performance significantly at YouTube, a notoriously bandwidth-thirsty web application. Overall, Google’s testing showed SPDY required fewer TCP connections, fewer bytes transferred on uploads, and reduced the overall number of packets that needed to be transmitted by about 20%.

Google initially rolled out SPDY to great fanfare, publicizing the technology at its own events and at industry conferences like Velocity. At these events and on its web site, Google touted page load time improvements on the order of 50% or more in some cases, but did not fully explain what kinds of web site configuration changes were necessary to achieve those impressive results. Since then, there have also been several contrary reports, most notably from Guy Podjarny, a CTO at Akamai, who blogged back in 2012 that the touted improvements were “not as SPDY as you thought.” Podjarny reported, “SPDY, on average, is only about 4.5% faster than plain HTTPS, and is in fact about 3.4% slower than unencrypted HTTP” for a large number of real world sites that he tested. After extensive testing with SPDY, Podjarny observed that SPDY did improve page load times for web pages with either of the following two characteristics:

  • monolithic sites that consolidated content on a small number of domains
  • pages that did not block significantly during resolution of JavaScript files and .css style sheets

On a positive note, Podjarny’s testing did confirm that multiplexing the processing of Responses to GET Requests at the web server can boost performance when a complex web page is composed from many Requests that are mostly directed to the same domain, allowing HTTP/2 to reuse a single TCP connection for transmitting all the Requests and their associated Response messages.

As I will try to explain in further detail below, the HTTP/2 changes reflect the general trend toward building ever larger and more complex web pages and benefit the largest web properties where clustering huge numbers of similarly-configured web servers provides the ability to process a high volume of HTTP Requests in parallel. As for web pages growing more complex, the HTTP Archive, for example, shows the average web page increased in size from 700 KB in 2011 to 2 MB in 2015, with the average page currently composed of almost 100 HTTP objects. Internet access over broadband connections is fueling this trend, even with network latency acting as the principal constraint on web page load time.

A large web property (see Alexa for a list of top sites) maintains an enormous infrastructure for processing huge volumes of web traffic, literally capable of processing millions of HTTP GET Requests per second. The web site infrastructure may consist of tens of thousands (or more) individual web servers, augmented with many additional web servers distributed around the globe in either proprietary Edge networks or comparable facilities leased from Content Delivery Network (CDN) vendors such as Akamai. The ability to harness this enormous amount of parallel processing capability to respond to web Requests faster, however, remains limited by the latency of the network, which is physically constrained by signal propagation delays. Another constrained front-end resource of these infrastructures is the availability of TCP connections, which is limited by the 16-bit width of the TCP Port number. That limitation in TCP cannot be readily changed, but the HTTP/2 modifications do address this constraint.

SPDY also included server push and prioritization, but far less is known about the impact of those specific new features today. The final draft of the HTTP/2 protocol specification is available at http://http2.github.io/http2-spec/.

In the next post, I will drill deeper into the major features in the HTTP/2 revision.
