Presentations for the upcoming CMG conference are available on slideshare.net

I am presenting two topics at the Computer Measurement Group (CMG) annual conference this week, and I have just posted the slide decks I will be using on slideshare.net.

The latest slide deck for Monitoring Web Application Response Times in Windows is available here.

And the HTTP/2: Recent protocol changes and their impact on web application performance one is available here.

If you are a regular reader of this blog, most of this material will look familiar. Enjoy!


HTTP/2 and the TCP protocol

This post continues a discussion of the HTTP/2 protocol changes that began here.

The latest set of changes to the HTTP protocol provides support for multiplexing across a single TCP connection, with the goal of pushing as many bytes into the TCP Send Window as early as possible. This goal of HTTP/2 comes into conflict with standard TCP congestion control policies, which are very conservative about overloading a connection. These congestion control mechanisms include slow start, the small initial size of the congestion window (cwin), and ramping up the size of the cwin slowly using Additive Increase/Multiplicative Decrease. These standard TCP policies may all need to be modified for web sites that want to take advantage of running the HTTP/2 protocol. In this section we will look briefly at the standard TCP behavior that conflicts with HTTP/2 web server multiplexing.

The basic congestion control strategy is for TCP to begin conservatively, to widen the session's congestion window gradually until it reaches the full size of the session's advertised Receive Window, and to back off the transmission rate sharply whenever the Receive Window is full. The TCP congestion control mechanism defines a congestion window (or cwin) that overlays the Send Window. By default, the cwin is initially a single TCP segment. Using slow start, the cwin increases incrementally until TCP detects a congestion signal, the most frequent congestion signal being a Send Window full condition that forces TCP to pause and wait for an Acknowledgement packet from the Receiver. Upon detecting a congestion signal, standard TCP also immediately cuts the size of the cwin in half.

To understand why these standard TCP congestion control mechanisms conflict with the HTTP/2 goal of maximizing throughput over a single TCP connection, let’s return to the Facebook example GET Request that was described earlier. If you remember, the initial Response message that Facebook generates dynamically based on the identity of the Requester is quite large, approximately 550 KB. This initial Response message frames the web page and contains numerous links to many additional static files – JavaScript files, style sheets, various images, plus advertising content. Let’s look at how TCP on the web server transmits the initial Response message first, and then we will look into the page composition process that gathers all the additional content referenced in the original Response message.

As part of establishing the TCP connection, the web server and the web client negotiate an AdvertisedWindow, the name of the sixteen-bit field in the TCP header that is used to advertise a Receive Window size. The 16-bit AdvertisedWindow field can specify a maximum Receive Window of 64 KB. An additional scaling factor can be added to the TCP Options to increase the size of the sliding Receive Window up to 1 GB. (Note: Windows uses a dynamic approach called Receive Window auto-tuning to optimize the size of the Receive Window based on measurement feedback and congestion signals.) To improve the page load time, the web server might attempt to negotiate a very large Send Window, but the web client is likely to reject a Send Window larger than 64 KB for the connection, which is the Windows default. So, let's assume a negotiated value of 64 KB for the TCP AdvertisedWindow.

As part of the standard slow start mechanism in force at the beginning of the session, TCP initially sends a single 1460-byte segment from the 550 KB Response message and then awaits the ACK from the client. The slow start mechanism then increments the congestion window by one segment, so TCP next sends two packets to the client and pauses to wait for the ACK. Then three packets, then four packets, etc.

Consider a connection with an RTT of 100 ms. TCP can send 1/0.1, or ten, cwin-sized transmissions per second. If the size of the cwin increases by one segment per round trip, then during the 1st second of the Response message transmission, TCP can only send 55 packets, or only about the first 80 KB of the full 550 KB message.
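
To make that arithmetic concrete, here is a quick back-of-the-envelope calculation in Python that follows the simplified growth model described above (the cwin starts at one segment and grows by one segment per round trip); the 1460-byte segment size and 100 ms RTT are the values assumed in the example:

    # Check of the figures above, using the simplified model in the text:
    # the cwin starts at 1 segment and grows by 1 segment per round trip.
    MSS = 1460   # TCP segment size in bytes
    RTT = 0.1    # round trip time in seconds

    rounds = int(1.0 / RTT)              # 10 cwin-sized bursts per second
    packets = sum(range(1, rounds + 1))  # 1 + 2 + ... + 10 = 55 packets
    print(packets, packets * MSS)        # 55 packets, 80300 bytes: about 80 KB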

Windows provides several TCP options that can significantly improve the throughput of the initial Response message transmission, and a similar set of tuning options is available for Linux Apache web servers. These options are useful for improving throughput for any web site that frequently serves up large HTTP objects, but they are especially important in HTTP/2 because the protocol changes have the effect of boosting the potential throughput of each connection. The first is an option to increase the size of the initial cwin: InitialCongestionWindowMss. The second option is to change the IIS web server’s default CongestionProvider to the Compound TCP policy because Compound TCP tries to negotiate a larger Send Window and uses bigger increments to ramp up the cwin more aggressively.

For example, the following Windows Powershell command:

Set-NetTCPSetting -SettingName Custom -CongestionProvider CTCP -InitialCongestionWindowMss 16

sets the initial size of the cwin to 16 segments (about 23 KB) and switches to the Compound TCP congestion policy, which increases the size of the cwin faster than the normal policy. These are more aggressive settings than the very conservative TCP congestion control defaults, but they are appropriate when customers access your web site mainly over high-bandwidth broadband connections.
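
To get a rough sense of what the larger initial window buys you, here is a small Python sketch, again using the simplified one-segment-per-round-trip growth model from the example above (Compound TCP actually ramps up faster than this); the 64 KB Receive Window is the value assumed earlier:

    # How many round trips until the cwin covers a 64 KB Receive Window,
    # assuming cwin grows by 1 segment per round trip (CTCP grows faster)?
    MSS, WINDOW = 1460, 64 * 1024

    def round_trips(initial_cwin):
        cwin, rtts = initial_cwin, 0
        while cwin * MSS < WINDOW:
            cwin += 1
            rtts += 1
        return rtts

    print(round_trips(1))   # default initial cwin: 44 round trips
    print(round_trips(16))  # InitialCongestionWindowMss 16: 29 round trips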

Another TCP option setting is especially important in HTTP/2 where the web client and web server communicate over a single TCP connection. By default, TCP uses Additive Increase/Multiplicative Decrease to reduce the size of the CongestionWindow and slow down the rate of data transmission when a congestion signal is detected. The most common congestion signal is a Send Window full condition when the Sender has filled the AdvertisedWindow with unacknowledged data and is forced to pause and wait for an ACK from the Receiver before it can send more data. As the name implies, Additive Increase/Multiplicative Decrease cuts the size of the current CongestionWindow in half when a congestion signal is detected.
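
A minimal sketch of the policy in Python, where the one-segment increase and the halving are the two behaviors just described:

    # Minimal sketch of Additive Increase/Multiplicative Decrease (AIMD):
    # grow the cwin by one segment per round trip, cut it in half on a
    # congestion signal.
    def aimd(cwin, congestion_signal):
        if congestion_signal:
            return max(1, cwin // 2)  # multiplicative decrease
        return cwin + 1               # additive increase

    cwin = 32
    for signal in (False, False, True, False):
        cwin = aimd(cwin, signal)
        print(cwin)                   # 33, 34, 17, 18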

Returning to the monolithic Facebook web application for a moment, you may recall that the initial HTTP Response message referenced as many as 200 external resources, most of which are files consolidated on just two domains. With HTTP/2 multiplexing, the web browser establishes one session for each of those two domains and starts firing off GET Requests to them without pausing to wait for Response messages. On the web server side at a facility like Facebook, these GET Requests are processed in parallel on the massive web server back-end, like the one illustrated in Figure 3. A congestion signal that shrinks the size of the cwin on the single TCP connection that HTTP/2 clients and servers use to communicate has a major impact on potential throughput for that connection.

This is one of the places where a federated HTTP/1.x site often outperforms HTTP/2. To understand why, consider what happens when a congestion signal is detected on one of the many parallel TCP connections under HTTP/1.x. Instead of funneling the content requested through the front-end web proxy server that consolidates all the Response messages into a set of interleaved streams across a single connection, in HTTP/1.x it is a Best Practice to distribute content across multiple physical domains, which can be accessed using parallel sessions. Figure 4 illustrates the web client in HTTP/1.x opening up six parallel TCP connections in order to initiate six concurrent GET Requests to each of these sharded domains. Note that because HTTP/1.x is sessionless, the parallel connections can be handled independently by separate front-end proxy servers, which adds another element of parallelism to HTTP version 1 web server processing.

[Figure 4 image: massively parallel web server domain in HTTP/1.x]

Figure 4. Under HTTP/1.x, which supports as many as six parallel sessions per domain, a congestion signal that reduces the size of the cwin in one connection has limited impact on the overall throughput of the active TCP sessions established between the web server and web client.

Figure 4 illustrates what happens when one of the six parallel TCP sessions detects a congestion signal that causes TCP to shrink the current size of the congestion window by 50%. Since the congestion signal only impacts one of the parallel TCP sessions, overall throughput to the web server, given the six concurrent sessions, is only reduced by 1/12 (half of one session's one-sixth share) in the HTTP/1.x example illustrated in Figure 4.

The other congestion signal that causes TCP to shrink the size of the cwin is an unacknowledged packet that needs to be retransmitted. On the likely assumption that the unACKed packet is lost due to congestion encountered somewhere along the route to its destination, when TCP retransmits a packet, it not only reduces the cwin by 50%, it also reverts to slow start. Again, comparing HTTP/2 network traffic that relies on a single connection to HTTP/1.x that has multiple parallel connections, this congestion control policy reduces the overall throughput on the multiplexed HTTP/2 connection disproportionately. In Windows, setting the CwndRestart TCP option to True directs TCP to widen the cwin normally following a retransmit, instead of reverting to slow start.
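
Here is a short Python sketch of why that retransmit behavior is so costly on a single multiplexed connection; it contrasts recovery from a halved cwin with recovery from a slow start restart, using the simplified one-segment-per-round-trip growth model from earlier (the actual Windows CwndRestart semantics are more involved than this):

    # Compare cwin recovery after a retransmit: continuing from the halved
    # cwin versus reverting all the way back to slow start.
    def recover(cwin_before_loss, restart_slow_start, rtts=10):
        cwin = 1 if restart_slow_start else cwin_before_loss // 2
        history = [cwin]
        for _ in range(rtts):
            cwin += 1                 # +1 segment per round trip
            history.append(cwin)
        return history

    print(recover(40, restart_slow_start=True))   # crawls back up from 1
    print(recover(40, restart_slow_start=False))  # resumes from 20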

Setting the CwndRestart option is important in HTTP/2, as a recent study of SPDY performance conducted by researchers at Lancaster University in the UK, headed by Yehia Elkhatib, showed. In the study, Elkhatib and his team systematically investigated the effect of bandwidth, latency and packet loss on SPDY performance, using a range of web applications that varied in size and complexity. On low latency links with significant packet loss – the type of connections many cell phone users get – Elkhatib's research found that the HTTP/1.x protocol outperformed SPDY. The more significant issue for cell phone web access is that the type of complex web pages served up from the monolithic web sites that HTTP/2 is optimized for do not display well on portable devices with small screens. These portable devices are better served by one of the following strategies: redirecting the GET Requests to a mobile version of the site, responsive web design that queries the screen size and customizes the content accordingly, or native cell phone apps that communicate using web services. As noted earlier, the HTTP/2 multiplexing changes have no impact on web services.

Summary

In summary, HTTP/2 is the first change to the protocol used by all web applications in over 15 years. The major new features that HTTP/2 supports include multiplexing and server push. Multiplexing frees the web client from the serial sequence of Request:Response messages that the original sessionless HTTP protocol required. Along with server push technology, under HTTP/2 web servers acquire greater flexibility to interleave Response messages and maximize throughput across a single TCP connection. Multiplexing and server push should also eliminate the need to inline resources or to aggregate many smaller HTTP objects into one consolidated file in order to reduce the number of GET Requests required to compose the page, all of which should make web site administration simpler under HTTP/2, since domain sharding is no longer necessary. Interestingly, the new header compression feature, as well as some other aspects of the HTTP/2 changes, introduces additional session-oriented behavior into the protocol web applications utilize.

Based on performance testing with Google’s experimental SPDY protocol, which introduced similar web application protocol changes impacting both the browser and the web server, it appears that HTTP/2 will benefit monolithic web sites that generate large, complex pages. HTTP/2 multiplexing will have less of an impact on HTTP/1.x sites that are already configured to take advantage of the web client’s capability to establish up to six concurrent sessions with a single domain and download content in parallel. HTTP/1.x web sites that are configured today to maximize parallelism by spreading web page content across many domains, using either a federated model or domain sharding, may need to be re-configured to take better advantage of HTTP/2.

Finally, the standard TCP congestion control policies that are initially quite conservative about overloading a TCP connection come into conflict with the HTTP/2 goal of maximizing throughput across a single TCP connection. Web site administrators should consider setting TCP options that increase the initial size of the TCP congestion window and enable the Compound TCP congestion policy in Windows that increases the size of the cwin more rapidly. Another setting that allows TCP to be more aggressive in the wake of a lost packet congestion signal should also be considered. These specific TCP performance options tend to improve throughput over high bandwidth, long latency connections, so they can also apply to HTTP/1.x sites that need to serve up large HTTP objects.


HTTP/2 multiplexing.

This is a continuation of an article on the recently adopted HTTP/2 protocol changes that starts here.

At this point I want to drill deeper into the major features in the HTTP/2 revision and then try to evaluate their tangible impact on web application performance. The HTTP/2 revision of the protocol features the following:

  • multiplexing
  • priority
  • server push
  • header compression
  • streamlined SSL connections

Multiplexing.

Multiplexing, which interleaves streams over a single HTTP connection so that HTTP Requests can be processed in parallel, is the most important new change and the one we understand best, thanks to Google's SPDY project.

Web pages are generally composed from multiple HTTP objects, but up until now, HTTP 1.x has been limited to the serial processing of individual HTTP GET Requests issued by the web client for objects as they are discovered in the HTML markup and added to the Document Object Model for rendering. This serial rendering process is depicted schematically in Figure 1.

[Figure 1 image: HTTP/1.x schematic]

Figure 1. The web client in HTTP/1.x issues GET Requests to a web server serially over a single TCP connection. A follow-up GET Request is delayed until the Response message from the previous Request is received. HTTP/1.x allows for multiple connections to the same domain in order to download content in parallel.

In the diagram in Figure 1, the Round Trip Time (RTT) is also indicated: the time for a message to be transmitted from one Host to the other and for a TCP packet acknowledging receipt of that request to be received back at the Sender. The network Round Trip Time also reflects the minimum amount of time that a client needs to wait for an HTTP Response message from the web server in response to an HTTP GET Request. Notice this minimum response time is 2 * the network latency, independent of the bandwidth of the transmission medium. Network bandwidth only becomes a factor when the HTTP Request and Response messages are very large compared to the size of the segments TCP transmits, which are usually limited to 1460 bytes or fewer, depending on the size of the IP and TCP headers, due to restrictions in the Ethernet protocol that limit the size of the Maximum Transmission Unit (MTU). TCP messages that are larger than the MTU are broken into multiple packets by the IP layer before they are handed off to the network hardware, or Media Access Control (MAC) layer.

Network RTT is mainly a function of the physical distance separating the two machines, plus additional latency for each network hop along the route, where packets require some minimal amount of processing by IP routers in order to be forwarded to the next stop on the way to their ultimate destination. Web clients accessing web servers over the public Internet can expect to encounter RTTs in the range of 30-100 milliseconds. None of this, of course, changes under HTTP/2.
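
A small Python sketch makes the latency floor explicit; the object sizes, RTT, and bandwidth below are illustrative values, not measurements:

    # Lower bound on the time to fetch one HTTP object: one full round trip
    # (GET Request out, Response back), plus transmission time, which only
    # matters once the object is large relative to a single TCP segment.
    def min_fetch_time(object_bytes, rtt_seconds, bandwidth_bps):
        transmission = (object_bytes * 8) / bandwidth_bps
        return rtt_seconds + transmission

    # A 1 KB object over a 50 ms RTT, 10 Mbps link: latency-bound (~51 ms).
    print(min_fetch_time(1024, 0.05, 10_000_000))
    # A 550 KB object over the same link: bandwidth dominates (~500 ms).
    print(min_fetch_time(550 * 1024, 0.05, 10_000_000))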

Note, however, Figure 1 drastically simplifies what is actually a much more complex process at the web client. Whenever it discovers more resources that need to be downloaded from the same domain, the web browser can launch additional TCP sessions to the same web server for the purpose of issuing GET Requests in parallel. The accepted guideline in HTTP/1.1 is no more than six active concurrent connections at a time, but your mileage may vary with different browsers, different versions, and different hardware. The potential for parallel processing in web page composition arises whenever an HTML, CSS, or JavaScript file contains external references to additional resources, including image files, videos, etc. When you request a particular HTML web page, the markup in the original Response message typically serves as an exoskeleton for the page, a scaffolding that contains slots for additional files, each of which can also reference additional links.

The ability to download content using multiple sessions is one form of parallel processing that is currently available to HTTP/1.x clients. Under HTTP/1.x, the Requests on each connection are issued serially, arriving one at a time at the web server, where they are processed in the order in which they are received. The drawback of this approach is that each concurrent session under HTTP/1.x requires the establishment of a separate TCP connection, something that is even more time-consuming under HTTPS. Multiple sessions can also be wasteful when the individual connections are only used to transfer a single HTTP object or are accessed sporadically.
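
A rough model in Python shows why those parallel connections matter; charging one round trip per object is a deliberate simplification, and the connection setup overhead just mentioned is ignored:

    # Rough model: round trips needed to fetch N objects one at a time over
    # a single connection versus spread across six parallel connections.
    import math

    def fetch_round_trips(num_objects, connections):
        return math.ceil(num_objects / connections)

    print(fetch_round_trips(200, 1))  # 200 round trips, strictly serial
    print(fetch_round_trips(200, 6))  # 34 round trips across six connections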

AJAX techniques[1], which manipulate the DOM using asynchronous HTTP Requests to web services, are another approach to adding parallel processing to the web page composition process. AJAX is implemented at the web client by executing JavaScript code that makes the web service requests. The XMLHttpRequest method that AJAX techniques utilize is carried forward unchanged in HTTP/2.

Of course, Figure 1 also greatly simplifies what the web infrastructure at a large web property looks like. The diagram depicts a single web server, which is how it appears to the web client. In actuality, there can be thousands of web servers configured in a single, co-located cluster that are each capable of responding to the Request. The HTTP/1.x protocol being both connectionless and sessionless means that each Request is an independent entity. The stateless character of the HTTP protocol is what makes it possible for any web server in the infrastructure to respond to any Request. This sessionless behavior is also the key factor that allows for applying parallel processing to web workloads on a massive scale. (See this link for more on this aspect of the HTTP protocol.) TCP, the underlying Transport layer, is connection-oriented, but HTTP/1.x is not.

However, there are many web applications that generate HTML Response messages dynamically based on session state, usually the identity of the customer and, often, the customer’s current location. HTTP allows the web application to store a cookie at the web client where data encapsulating the session state is encoded and made available to subsequent Requests. Cookie data is automatically appended to subsequent GET Requests issued for the same domain in one of the HTTP message header fields.

Microsoft's ASP.NET technology provides an alternative mechanism for maintaining session state between HTTP Requests. ASP.NET supports an explicit Session object that the application can utilize to preserve the state of a connection between Requests at the web server, a facility that is more flexible, more reliable, and more secure than using cookies on the client. However, whenever routing is performed by the web infrastructure to assign an available web server to process an incoming Request, that routing needs to be session-oriented, so that the session state stored by the previous Request is available to the web server application responding to a subsequent Request. In ASP.NET, there is a configuration option that allows the Session objects to be stored in an instance of SQL Server where they can be accessed by all the front-end web servers in the cluster.

How HTTP/2 Multiplexing works.

In HTTP/2, the web browser is allowed to fire off multiple GET Requests to a web server, one after the other, without waiting for each individual Response message in reply. This multiplexing capability is illustrated in Figure 2.

[Figure 2 image: HTTP/2 multiplexing]

Figure 2. The web client in HTTP/2 issues multiple GET Requests to a web server in parallel over a single TCP connection. GET Requests for resources can be issued immediately upon discovery, with no requirement to wait until the Response message from the previous Request is received.

At its end, an HTTP/2 web server can return Response messages to the client in any sequence, without regard to the order in which they were requested. This allows the web server, for example, to sort a queue of outstanding Response messages awaiting processing for an HTTP connection so that the ones that can be satisfied the quickest are sent first, a scheduling technique that improves the overall average response time.
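
That scheduling idea is essentially shortest-job-first, and a toy Python calculation shows the effect; the millisecond estimates in the queue are invented for illustration:

    # Toy illustration: serving queued Response messages shortest-first
    # lowers the average response time compared to first-come-first-served.
    def average_response_time(service_times_ms):
        finished, total = 0, 0
        for t in service_times_ms:
            finished += t        # completion time of this Response
            total += finished
        return total / len(service_times_ms)

    queue = [120, 5, 30, 8]                      # estimated ms per Response
    print(average_response_time(queue))          # arrival order: 140.75 ms
    print(average_response_time(sorted(queue)))  # shortest first: 56.0 ms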

Response messages can also be interleaved in HTTP/2 – interleaving the segments from multiple Response messages provides greater flexibility for the web server to utilize the TCP connection efficiently, with the goal of packing as many bytes into the TCP Send Window as possible.

With multiplexing under HTTP/2, the idea is to attain the same or even higher levels of concurrency across a single HTTP connection.

An example: loading a Facebook page in SPDY.

To illustrate the difference between HTTP/1.x and SPDY, the immediate predecessor of HTTP/2, let's look at a typical processing sequence for a GET Request that begins a web browser session with https://facebook.com, which we will discover is an excellent example of a monolithic web site. Using Internet Explorer 11, which supports SPDY/3, I simply issued a request to access my Facebook page. Note that due to the time-sensitive manner in which Facebook generates web pages, this is an experiment that is difficult to replicate precisely because the pages that are built reflect the latest activity and postings of your Facebook Friends.

In HTTP/1.x, SPDY, and HTTP/2 alike, initiating a browser session with Facebook requires a DNS lookup and the exchange of a sequence of HTTPS session connection handshaking packets. Once the secure HTTP session is established, using a persistent TCP connection on Port 443, the web client can send a GET Request to www.facebook.com. This initial GET Request attaches a rather large cookie that contains variables that identify the issuer of the Request and other relevant status that the Facebook web application uses to generate a customized HTTP Response message. Up until this point, the processing steps taken in HTTP/1.x, SPDY, and HTTP/2 are identical, except that HTTP/2 exchanges two fewer packets to establish a secure connection.

In a test I made using the SPDY/3 protocol with Internet Explorer, the initial Facebook HTTP Response message was huge, about 550 KB, the first packet of which was received 328 ms after the GET Request was issued. It required a full 2 seconds for the Facebook SPDY/3 server to transmit the entire 550 KB Response message. When I examined this initial HTTP Response message, I found it contained markup referencing a large number of external resources that the browser needed to fetch. These included scripts, style sheets, image files, video, and some advertising content. To load the entire page required 216 separate GET Request:Response Message sequences, which involved transferring 7.24 MB of data over the wire. According to the Developer Tools in Internet Explorer that captured the network traffic that resulted from the GET Request to Facebook, I waited over 3.6 seconds until the DOM's Load event fired, signaling that the page was available for user interaction. Meanwhile, JavaScript code continued to execute in the background for another 20 seconds to add various dynamic elements to the web page.

It is at this point, once the initial HTTP Response message from Facebook is received at the web client, that HTTP/1.x begins to diverge from SPDY and HTTP/2, because the newer versions of the protocol support multiplexing. As soon as the web browser started receiving the initial Response Message and started to construct the DOM, it immediately encountered links referencing several style sheets:

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yB/r/PQzGy_gthig.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yJ/r/cuqNSNZ2dlI.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yi/r/RH3rvDA7dSR.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yf/r/QFcEQNF3244.css" />

<link type="text/css" rel="stylesheet" href="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yD/r/flQGK0biLk6.css" />

Notice that https://fbstatic-a.akamaihd.net is a different domain than https://facebook.com, so the web browser must again perform a DNS Lookup and go through the secure connection handshaking to access this web server before it can request the download of the style sheets indicated. Here is where SPDY/3 and HTTP/1.x part company. In SPDY, the web browser can issue multiple GET Requests to the fbstatic web server in rapid succession over a single TCP connection. In HTTP/1.x, the web browser must initiate separate TCP connections to begin downloading the style sheet files in parallel.

Overall, the Facebook web application made most of its subsequent 215 GET Requests to just two domains: the fbstatic domain indicated above where common style sheets, image files, and scripts were located, and an fbcdn-profile domain where content specific to my Facebook profile and set of Friends was stored. With SPDY and with HTTP/2, two domains equals just two TCP connections. In HTTP/1.x, Internet Explorer would attempt to establish as many as twelve secure TCP connections to the two primary Facebook domains.

Monolithic and federated web publishing models.

Because ¾ of all the Facebook GET Requests were directed to just two domains, the Facebook web application is characterized as monolithic. Monolithic web sites like Facebook benefit the most from HTTP/2. Many other web properties, particularly media outlets, have a more federated structure, with content often spread across as many as 20-30 domains, frequently involving 3rd party web servers that are also directed to generate content dynamically based on the identity and location of the customer.

An example of how a monolithic web site like Facebook can be structured is illustrated in Figure 3.

[Figure 3 image: massively parallel web server domain]

Figure 3. A web server infrastructure that supports massive parallelism uses three layers of hardware: a hardware VLAN switching layer, a set of front-end proxy servers, and a set of back-end file servers that have access to shared, high-speed disk storage. In HTTP/2, web client access funnels through a single TCP connection, as illustrated.


The web server infrastructure shown in Figure 3 contains several layers of hardware: high speed network routing, a set of front-end proxy servers that route requests to back-end file servers, and a shared disk storage layer. HTTP GET Requests enter the data center through a VLAN network switching layer that uses session-oriented load balancing in HTTP/2 to direct Requests to one of the proxy servers in the second layer. GET Requests for static HTTP objects are then relayed to and resolved by a layer of back-end file servers that cache frequently-referenced files in memory, but can also fetch less frequently referenced files from high speed, shared disk storage. The designated proxy server maintains the state of the HTTP/2 connection and consolidates the Response messages into a single set of interleaved streams that are transmitted back to the web client.
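
A minimal sketch of session-oriented load balancing in Python, assuming a stable connection identifier can simply be hashed to pick a proxy (real switching layers use considerably more sophisticated schemes, and the proxy names here are invented):

    # Session-oriented load balancing sketch: every Request multiplexed on
    # the same HTTP/2 connection must land on the same front-end proxy, so
    # route on a stable connection identifier rather than per Request.
    import zlib

    PROXIES = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]

    def route(connection_id):
        return PROXIES[zlib.crc32(connection_id.encode()) % len(PROXIES)]

    print(route("client42:443"))  # some proxy
    print(route("client42:443"))  # the same proxy, every time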

HTTP/1.x encouraged the use of the federated model because web application performance could often be enhanced by domain sharding, the practice of breaking a logical domain into multiple physical domains in order to take advantage of parallelism during Page Load. Under HTTP/1.x, it is common practice to distribute the content from a domain like Facebook’s fbstatic over 3-5 physical domains, allowing for as many as thirty concurrent TCP sessions. From a performance standpoint, a web application under HTTP/1.x that has been partitioned and distributed across multiple physical sites can attain a level of concurrency during web page composition that is easily comparable to an HTTP/2 consolidated web server that uses multiplexing.
[1] The canonical example of AJAX (the acronym is short for Asynchronous JavaScript and XML) is an autocompletion textbox control on a web page. When you begin typing into an autocompletion textbox control, it triggers execution of a snippet of JavaScript code that makes an asynchronous XMLHttpRequest call to a web service to gather popular responses associated with the first few typed characters.

Another example: loading a YouTube page in SPDY.

Another example of a monolithic web page that benefits from HTTP/2 is YouTube, which, of course, is owned by Google. On a recent visit to the YouTube Home page using Internet Explorer from my desktop, a 4.4 MB landing page was generated, built from 99 individual HTTP objects. The YouTube Home page html is about 500 KB, which is mainly scaffolding that references the remaining HTTP objects. Those objects break down as follows, with the bulk of them (over 3 MB) all served from a single domain:

  • Three style sheets, totaling about 300 KB.
  • A huge hunk of JavaScript, about 900 KB, for video playback.
  • The common.js library, about 350 KB.
  • About 50 of the HTTP objects on the page were jpeg images that serve as link buttons to the videos advertised, all loaded from a single domain.
  • In addition, ten smaller graphic sprites, ranging in size from 1500 bytes to about 15 KB, were loaded from a second domain.
  • Another ten objects, all JavaScripts, were loaded from a third YouTube domain, plus five additional JavaScript framework files that were all loaded from https://apis.google.com.
  • Then, Google wraps about ten small ads, each about 500 bytes, from doubleclick, another Google web property, around the content.
  • Finally, there is a rich media (i.e., Flash) display ad, about 250 KB, served from another Google-owned domain.

For monolithic sites like Facebook and YouTube, the practice of domain sharding for performance reasons is no longer necessary under HTTP/2. To take advantage of HTTP/2’s multiplexing, you will want to undo any domain sharding that you have performed in the past and consolidate your content into fewer domains. This should make site administration more straightforward under HTTP/2, if not outright easier.

Priority.

To help the web server differentiate among Requests being transmitted in parallel across the network, the HTTP/2 protocol introduces Request prioritization.

It is not yet clear how developers will indicate Request priority in standard HTML markup in HTTP/2, nor how servers will implement Request priority and handle potential issues that arise with priority scheduling, such as the possibility of starvation. At the moment, for example, Microsoft has been experimenting with a non-standard lazyload prioritization keyword, beginning in Internet Explorer 10, but I am not sure how many people are using it; the trend in IE is away from the proprietary Microsoft HTML extensions that annoyed web developers for years.

Server Push.

Server Push will allow the web server to send multiple Response messages to a single GET Request, anticipating that the web client is going to request these Response messages as soon as it uncovers references to them in a previous Response message. Potentially, as soon as the web server sends the initial Response message for an HTTP GET Request for an html page, it might also start to send CSS and JavaScript files that are referenced in the html. The aim is for the web server to push content that starts to arrive at the web client before the web client is able to discover that it needs these files and has time to prepare Requests to fetch them.

The Server Push feature is designed to obviate the need to inline resources such as scripts and styles in HTML markup, and shrink the number of client GET Requests that are required. It remains to be seen, however, whether Server Push is a clear performance win when inlining resources is not involved. The efforts of the web server to push content in anticipation of future Requests can easily backfire when those resources are already resident in the browser cache or in the CDN.

The Server Push capability is associated with a new HTTP/2 frame called a PUSH_PROMISE, used by the web server to notify the client that it intends to push content in a stream not yet Requested by the client. Upon receiving the PUSH_PROMISE notification, the web client can choose to reject the stream, based on first checking the contents of the web browser cache to see if it already has access to a valid copy of the promised content. Depending on how aggressive the web server intends to be in pushing content to the client in advance, Server Push runs the risk that a PUSH_PROMISE notification from the server and a RST_STREAM message from the client will cross in the mail and unnecessary data streams will be transmitted.
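
The client-side decision might look roughly like the following Python sketch; the PUSH_PROMISE and RST_STREAM frame names come from the HTTP/2 specification, but the cache check and return values here are invented for illustration:

    # Sketch of a client deciding what to do with a PUSH_PROMISE frame.
    def on_push_promise(promised_url, browser_cache):
        if promised_url in browser_cache:   # already holding a valid copy
            return "send RST_STREAM"        # reject the promised stream
        return "accept pushed stream"

    cache = {"https://example.com/site.css"}
    print(on_push_promise("https://example.com/site.css", cache))
    print(on_push_promise("https://example.com/app.js", cache))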

Header Compression.

HTTP header compression provides a big benefit on upload Requests from the web client, but won't provide much notable improvement in shrinking the size of Response messages. Web client HTTP GET Requests today contain a number of clear text Header fields, like the Host name and the user-agent field that identifies the browser name and version. The same Header data for the connection must be sent with every Request because HTTP/1.x was originally conceived as a connectionless protocol. These mandatory header fields in HTTP/1 are surprisingly bulky, often forcing GET Requests that also carry associated cookie data to span multiple packets. In HTTP/2, the web server retains these header fields and associates them with the state of the connection, which means the browser is only required on subsequent messages to send Header field data that has changed since the previous Request.
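
A toy Python illustration of the idea (HTTP/2's actual header compression scheme, HPACK, is considerably more sophisticated; the header values below are invented):

    # Toy illustration of stateful header compression: only send the
    # header fields that changed since the previous Request.
    def header_delta(previous, current):
        return {k: v for k, v in current.items() if previous.get(k) != v}

    first = {"host": "www.facebook.com",
             "user-agent": "Mozilla/5.0 (Windows NT 6.3; ...)",
             "cookie": "session=abc123"}
    second = dict(first, cookie="session=abc124")

    print(header_delta({}, first))      # full header set on the first Request
    print(header_delta(first, second))  # only the changed cookie afterwards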

One interesting side effect of this change is to make the HTTP protocol more connection-oriented. Header compression requires web servers to save the initial set of HTTP headers as part of the connection state that is maintained for the duration of the session. The new HTTP/2 capability that allows interleaving of Response messages also requires maintenance of a durable connection-oriented session at the web server. Still, one can argue persuasively that dynamic HTML is session-oriented, too, in which case this is not really a burdensome new requirement, but a simple nod to the current reality.

Improved performance with Transport Layer Security.

Unlike Google’s grand experiment with the SPDY protocol, which influenced much of the new HTTP standard, HTTP/2 doesn’t absolutely require HTTPS, but it will encourage its use. HTTP/2 continues to plug into TCP Port 80, but the protocol is enhanced so that TLS can be requested at connection time. This fix saves a couple of packets and a Round Trip during initial session handshaking.

And another new feature…

Another new & noteworthy feature of HTTP/2, but one that did not make my Top 5 list, is that it supports binary data, in contrast to HTTP/1, which is exclusively text based. User-friendly HTML-based text is both a blessing and a curse. The blessing is that HTML text encoding means that any feature implemented in anyone's web site is discoverable, which has led to wide dissemination of industry Best Practices and patterns. But for those concerned about web security, sending HTTP messages in clear text just makes it that much easier for people to hack into today's web sites and do harm. In theory, support for binary data should make it possible for people to build more secure HTTP/2-based web sites. In practice, Wireshark already supports binary viewers for HTTP/2 streams, and Google's Chrome provided a binary data plug-in for SPDY that can be readily adapted for HTTP/2. At this point, it is difficult to assess how much more secure the web will become using binary data. My guess is not much, given the combination of vulnerabilities that exist and the incentives that hackers continue to have.

Developers of web server and web client software figure they have a leg up on implementing HTTP/2 based on their prior experience getting SPDY to work. Google Chrome began supporting SPDY in 2011, at the same time that Google began adding SPDY support to its web properties, including YouTube, a network bandwidth hog. Facebook, Twitter and Akamai are other early adopters of SPDY on the web server side. With the adoption of the HTTP/2 standard, Google has already announced that it plans to drop SPDY in favor of the new industry standard in the next version of Chrome. In addition, Google is supplying HTTP/2 support in Apache. Not to be outdone, Microsoft added SPDY/3 support to Internet Explorer 10 and has announced that both the IIS and IE 11 previews available with Windows 10 support HTTP/2.

In the final post in this series, I will look at the interaction between HTTP/2 multiplexing and the underlying TCP protocol.