HTTP/2 multiplexing.

This is a continuation of an article on the recently adopted HTTP/2 protocol changes that starts here.

At this point I want to drill deeper into the major features in the HTTP/2 revision and then try to evaluate their tangible impact on web application performance. The HTTP/2 revision of the protocol features the following:

  • multiplexing
  • priority
  • server push
  • header compression
  • streamlined SSL connections


Multiplexing, which interleaves streams over a single HTTP connection so that HTTP Requests can be processed in parallel, is the most important new change, and the one we understand best because of Google’s SPDY project.

Web pages are generally composed from multiple HTTP objects, but up until now, HTTP 1.x has been limited to the serial processing of individual HTTP GET Requests issued by the web client for objects as they are discovered in the HTML markup and added to the Document Object Model for rendering. This serial rendering process is depicted schematically in Figure 1.

HTTP 1.x schematic

Figure 1. The web client in HTTP/1.x issues GET Requests to a web server serially over a single TCP connection. A follow-up GET Request is delayed until the Response message from the previous Request is received. HTTP/1.x allows for multiple connections to the same domain in order to download content in parallel.

In the diagram in Figure 1, the Round Trip Time (RTT) is also indicated, the time for a message to be transmitted from one Host to the other and for a TCP packet acknowledging receipt of that request to be received back at the Sender. The network Round Trip Time also reflects the minimum amount of time that a client needs to wait for an HTTP Response message from the web server in response to an HTTP GET Request. Notice this minimum response time is 2 * the network latency, independent of the bandwidth of the transmission medium. Network bandwidth only becomes a factor when the HTTP Request and Response messages are very large compared to the size of the segments TCP transmits, which are usually limited to 1460 (or fewer, depending on the size of the IP and TCP headers) bytes, due to restrictions in the Ethernet protocol that limit the size of the Maximum Transmission Unit (MTU). TCP messages that are larger than the MTU are broken into multiple packets by the IP layer before they are handed off to the network hardware, or Media Access (MAC) layer.
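Some back-of-the-envelope arithmetic, sketched in JavaScript, makes the relationship concrete (the latency and payload figures here are illustrative assumptions, not measurements):

```javascript
// Minimum time to fetch one small object is one network round trip,
// i.e., twice the one-way latency, regardless of bandwidth.
const onewayLatencyMs = 40;        // assumed one-way latency, client to server
const minResponseMs = 2 * onewayLatencyMs;

// Bandwidth only matters once a message spans many TCP segments.
const mssBytes = 1460;             // 1500-byte Ethernet MTU minus 40 bytes of IP/TCP headers
const responseBytes = 550 * 1024;  // an illustrative large HTML Response message
const segments = Math.ceil(responseBytes / mssBytes);

console.log(minResponseMs, segments); // 80 386
```

Even a large Response costs only a few hundred segments on the wire; for small objects the round trip dominates everything else.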

Network RTT is mainly a function of the physical distance separating the two machines, plus additional latency for each physical connection along the route, network hops where packets require some minimal amount of processing by IP routers in order to be forwarded to the next stop on the way to their ultimate destination. Web clients accessing web servers over the public Internet can expect to encounter RTTs in the range of 30-100 milliseconds, which is mainly a function of the physical distance separating the web client from the web server. None of this, of course, changes under HTTP/2.

Note, however, that Figure 1 drastically simplifies what is actually a much more complex process at the web client. Whenever it discovers more resources that need to be downloaded from the same domain, the web browser can launch additional TCP sessions to the same web server for the purpose of issuing GET Requests in parallel. The official guideline in HTTP/1.1 is no more than six active concurrent connections at a time, but your mileage will vary across browsers, browser versions, and hardware. The potential for parallel processing in web page composition arises whenever an HTML, CSS, or JavaScript file contains external references to additional resources, including image files, videos, etc. When you request a particular HTML web page, the markup in the original Response message typically serves as an exoskeleton for the page, a scaffolding that contains slots for additional files, each of which can also reference additional links.

The ability to download content using multiple sessions is one form of parallel processing that is currently available to HTTP/1.x clients. Under HTTP/1.x, the Requests on each connection are issued serially, arriving one at a time at the web server, where they are processed in the order in which they are received. The issue with this approach is that each concurrent session under HTTP/1.x requires the establishment of a separate TCP connection, something that is even more time-consuming under HTTPS. Multiple sessions can also be wasteful when the individual connections are only used to transfer a single HTTP object or are accessed sporadically.

AJAX techniques[1], which manipulate the DOM using asynchronous HTTP Requests to web services, are another approach to adding parallel processing to the web page composition process. AJAX is implemented at the web client by executing JavaScript code that makes the web service requests. The XMLHttpRequest method that AJAX techniques utilize is carried forward unchanged in HTTP/2.

Of course, Figure 1 also greatly simplifies what the web infrastructure at a large web property looks like. The diagram depicts a single web server, which is how it appears to the web client. In actuality, there can be thousands of web servers configured in a single, co-located cluster that are each capable of responding to the Request. The HTTP/1.x protocol being both connectionless and sessionless means that each Request is an independent entity. The stateless character of the HTTP protocol is what makes it possible for any web server in the infrastructure to respond to any Request. This sessionless behavior is also the key factor that allows for applying parallel processing to web workloads on a massive scale. (See this link for more on this aspect of the HTTP protocol.) TCP, the underlying Transport layer, is connection-oriented, but HTTP/1.x is not.

However, there are many web applications that generate HTML Response messages dynamically based on session state, usually the identity of the customer and, often, the customer’s current location. HTTP allows the web application to store a cookie at the web client where data encapsulating the session state is encoded and made available to subsequent Requests. Cookie data is automatically appended to subsequent GET Requests issued for the same domain in one of the HTTP message header fields.

Microsoft’s ASP.NET technology provides an alternative mechanism for maintaining session state between HTTP Requests. ASP.NET supports an explicit Session object that the application can utilize to preserve the state of a connection between Requests at the web server, a facility that is more flexible, more reliable, and more secure than using cookies on the client. However, whenever routing is performed by the web infrastructure to assign an available web server to process an incoming Request, that routing needs to be session-oriented, so that the session state stored by the previous Request is available to the web server application responding to a subsequent Request. In ASP.NET, there is a configuration option that allows the Session objects to be stored in an instance of SQL Server where they can be accessed by all the front-end web servers in the cluster.

How HTTP/2 Multiplexing works.

In HTTP/2, the web browser is allowed to fire off multiple GET Requests to a web server, one after the other, without waiting for each individual Response message in reply. This multiplexing capability is illustrated in Figure 2.

HTTP 2 multiplexing

Figure 2. The web client in HTTP/2 issues multiple GET Requests to a web server in parallel over a single TCP connection. GET Requests for resources can be issued immediately upon discovery, with no requirement to wait until the Response message from the previous Request is received.

At its end, an HTTP/2 web server can return Response messages to the client in any sequence, without regard to the order in which they were requested. This allows the web server, for example, to sort a queue of outstanding Response messages for a connection based on which can be satisfied the quickest, a scheduling technique that improves the overall average response time.

Response messages can also be interleaved in HTTP/2 – interleaving the segments from multiple Response messages provides greater flexibility for the web server to utilize the TCP connection efficiently, with the goal of packing as many bytes into the TCP Send Window as possible.

With multiplexing under HTTP/2, the idea is to attain the same or even higher levels of concurrency over a single HTTP connection than HTTP/1.x attains using multiple connections.

An example: loading a Facebook page in SPDY.

To illustrate the difference between HTTP/1.x and SPDY, the immediate predecessor of HTTP/2, let’s look at a typical processing sequence for a GET Request that begins a web browser session with https://www.facebook.com, which we will discover is an excellent example of a monolithic web site. Using Internet Explorer 11, which supports SPDY/3, I simply issued a request to access my Facebook page. Note that due to the time-sensitive manner in which Facebook generates web pages, this is an experiment that is difficult to replicate precisely, because the pages that are built reflect the latest activity and postings of your Facebook Friends.

In HTTP/1.x, SPDY, and HTTP/2 alike, initiating a browser session with Facebook requires a DNS lookup and the exchange of a sequence of HTTPS session connection handshaking packets. Once the secure HTTP session is established, using a persistent TCP connection on Port 443, the web client can send a GET Request to the Facebook web server. This initial GET Request attaches a rather large cookie that contains variables identifying the issuer of the Request and other relevant status that the Facebook web application uses to generate a customized HTTP Response message. Up until this point, the processing steps taken in HTTP/1.x, SPDY, and HTTP/2 are identical, except that HTTP/2 exchanges two fewer packets to establish a secure connection.

In a test I made using the SPDY/3 protocol with Internet Explorer, the initial Facebook HTTP Response message was huge, about 550 KB, the first packet of which was received 328 ms after the GET Request was issued. It required a full 2 seconds for the Facebook SPDY/3 server to transmit the entire 550 KB Response message. When I examined this initial HTTP Response message, I found it contained markup referencing a large number of external resources that the browser needed to fetch. These included scripts, style sheets, image files, video, and some advertising content. To load the entire page required 216 separate GET Request:Response Message sequences, which involved transferring 7.24 MB of data over the wire. According to the Developer Tools in Internet Explorer that captured the network traffic that resulted from the GET Request to Facebook, I waited over 3.6 seconds until the DOM’s Load event fired, signaling that the page was available for user interaction. Meanwhile, JavaScript code continued to execute in the background for another 20 seconds to add various dynamic elements to the web page.

It is at this point, once the initial HTTP Response message from Facebook is received at the web client, that HTTP/1.x begins to diverge from SPDY and HTTP/2, because the newer versions of the protocol support multiplexing. As soon as the web browser started receiving the initial Response message and started to construct the DOM, it immediately encountered links referencing several style sheets:

<link type="text/css" rel="stylesheet" href="" />
<link type="text/css" rel="stylesheet" href="" />
<link type="text/css" rel="stylesheet" href="" />
<link type="text/css" rel="stylesheet" href="" />
<link type="text/css" rel="stylesheet" href="" />

Notice that the fbstatic domain these style sheets reference is different from the www.facebook.com domain, so the web browser must again perform a DNS Lookup and go through the secure connection handshaking to access this web server before it can request the download of the style sheets indicated. Here is where SPDY/3 and HTTP/1.x part company. In SPDY, the web browser can issue multiple GET Requests to the fbstatic web server in rapid succession over a single TCP connection. In HTTP/1.x, the web browser must initiate separate TCP connections to begin downloading the style sheet files in parallel.

Overall, the Facebook web application made most of its subsequent 215 GET Requests to just two domains: the fbstatic domain indicated above where common style sheets, image files, and scripts were located, and an fbcdn-profile domain where content specific to my Facebook profile and set of Friends was stored. With SPDY and with HTTP/2, two domains equals just two TCP connections. In HTTP/1.x, Internet Explorer would attempt to establish as many as twelve secure TCP connections to the two primary Facebook domains.

Monolithic and federated web publishing models.

Because ¾ of all the Facebook GET Requests were directed to just two domains, the Facebook web application is characterized as monolithic. Monolithic web sites like Facebook benefit the most from HTTP/2. Many other web properties, particularly media outlets, have a more federated structure, with content often spread across as many as 20-30 domains, frequently involving 3rd party web servers that are also directed to generate content dynamically based on the identity and location of the customer.

An example of how a monolithic web site like Facebook can be structured is illustrated in Figure 3.

massively parallel web server domain

Figure 3. A web server infrastructure that supports massive parallelism uses three layers of hardware: a hardware VLAN switching layer, a set of front-end proxy servers, and a set of back-end file servers that have access to shared, high-speed disk storage. In HTTP/2, web client access funnels through a single TCP connection, as illustrated.


The web server infrastructure shown in Figure 3 contains several layers of hardware: high speed network routing, a set of front-end proxy servers that route requests to back-end file servers, and a shared disk storage layer. HTTP GET Requests enter the data center through a VLAN network switching layer that uses session-oriented load balancing in HTTP/2 to direct Requests to one of the proxy servers in the second layer. GET Requests for static HTTP objects are then relayed and resolved by a layer of back-end file servers that cache frequently-referenced files in memory, but can also fetch less frequently referenced files from high speed, shared disk storage. The designated proxy server maintains the state of the HTTP/2 connection and consolidates the Response messages into a single set of interleaved streams that are transmitted back to the web client.

HTTP/1.x encouraged the use of the federated model because web application performance could often be enhanced by domain sharding, the practice of breaking a logical domain into multiple physical domains in order to take advantage of parallelism during Page Load. Under HTTP/1.x, it is common practice to distribute the content from a domain like Facebook’s fbstatic over 3-5 physical domains, allowing for as many as thirty concurrent TCP sessions. From a performance standpoint, a web application under HTTP/1.x that has been partitioned and distributed across multiple physical sites can attain a level of concurrency during web page composition that is easily comparable to an HTTP/2 consolidated web server that uses multiplexing.
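The concurrency arithmetic behind sharding is straightforward (six connections per domain is the common browser guideline cited earlier; the shard count is taken from the fbstatic example):

```javascript
const connectionsPerDomain = 6; // typical HTTP/1.x per-domain browser limit
const shards = 5;               // one logical domain split across 5 physical domains
const http1Concurrency = connectionsPerDomain * shards;
console.log(http1Concurrency);  // 30 concurrent TCP sessions during Page Load
```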
[1] The canonical example of AJAX – the acronym is short for Asynchronous JavaScript and XML – is an autocompletion textbox control on a web page. When you begin typing into an autocompletion textbox control, it triggers execution of a snippet of JavaScript code that makes an asynchronous XMLHttpRequest call to a web service to gather popular responses associated with the first few typed characters.

Another example: loading a YouTube page in SPDY.

Another example of a monolithic web page that benefits from HTTP/2 is YouTube, which, of course, is owned by Google. On a recent visit to the YouTube Home page using Internet Explorer from my desktop, a 4.4 MB landing page was generated, built from 99 individual HTTP objects. The YouTube Home page HTML is about 500 KB, mainly scaffolding that references the remaining HTTP objects. Those objects break down as follows, with the bulk of them – over 3 MB – served from a single domain:

  • Three style sheets, totaling about 300 KB.
  • A huge hunk of JavaScript, about 900 KB, for video playback.
  • The common.js library, about 350 KB.
  • About 50 of the HTTP objects on the page were jpeg images that serve as link buttons to the videos advertised, all loaded from a single domain.
  • In addition, ten smaller graphic sprites, ranging in size from 1500 bytes to about 15 KB, were loaded from a second domain.
  • Another ten objects, all JavaScript files, were loaded from a third YouTube domain, plus five additional JavaScript framework files that were all loaded from this domain.
  • Then, Google wraps about ten small ads, each about 500 bytes, from doubleclick, another Google web property, around the content.
  • Finally, there is a rich media (i.e., Flash) display ad, about 250 KB, served from another Google-owned domain.

For monolithic sites like Facebook and YouTube, the practice of domain sharding for performance reasons is no longer necessary under HTTP/2. To take advantage of HTTP/2’s multiplexing, you will want to undo any domain sharding that you have performed in the past and consolidate your content into fewer domains. This should make site administration more straightforward under HTTP/2, if not outright easier.


Request prioritization.

To help the web server differentiate among Requests being transmitted in parallel across the network, the HTTP/2 protocol introduces Request prioritization.

It is not yet clear how developers will indicate Request priority in standard HTML markup in HTTP/2, nor how servers will implement Request priority and handle potential issues that arise with priority scheduling, such as the possibility of starvation. Microsoft, for example, has been experimenting with a non-standard prioritization keyword, lazyload, beginning in Internet Explorer 10, but I am not sure how many people are using it; the trend in IE is away from the proprietary Microsoft HTML extensions that annoyed web developers for years.

Server Push.

Server Push will allow the web server to send multiple Response messages in reply to a single GET Request, anticipating that the web client is going to request these Responses as soon as it uncovers references to them in a previous Response message.
Potentially, as soon as the web server sends the initial Response message to an HTTP GET Request for an HTML page, it might also start to send CSS and JavaScript files that are referenced in the HTML. The aim is for pushed content to start arriving at the web client before the client is able to discover that it needs these files and to prepare Requests to fetch them.

The Server Push feature is designed to obviate the need to inline resources such as scripts and styles in HTML markup, and shrink the number of client GET Requests that are required. It remains to be seen, however, whether Server Push is a clear performance win when inlining resources is not involved. The efforts of the web server to push content in anticipation of future Requests can easily backfire when those resources are already resident in the browser cache or in the CDN.

The Server Push capability is associated with a new HTTP/2 frame called PUSH_PROMISE, used by the web server to notify the client that it intends to push content in a stream not yet requested by the client. Upon receiving the PUSH_PROMISE notification, the web client can choose to reject the stream, based on first checking the contents of the web browser cache to see if it already has access to a valid copy of the promised content. Depending on how aggressively the web server pushes content to the client in advance, Server Push runs the risk that a PUSH_PROMISE notification from the server and a RST_STREAM message from the client will cross in the mail and unnecessary data streams will be transmitted.

Header Compression.

HTTP header compression provides a big benefit on upload Requests from the web client, but won’t provide much notable improvement in shrinking the size of Response messages.
Web client HTTP GET Requests today contain a number of clear-text Header fields, like the Host name and the user-agent field that identifies the browser name and version. The same Header data for the connection must be sent with every Request because HTTP/1.x was originally conceived as a connectionless protocol. These mandatory header fields in HTTP/1.x are surprisingly bulky, often forcing GET Requests that also carry cookie data to span multiple packets. In HTTP/2, the web server retains these header fields and associates them with the state of the connection, which means the browser is only required on subsequent messages to send Header fields that have changed from the previous Request.
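The idea can be illustrated with a toy differential encoding in JavaScript. (This is only the concept; the actual HTTP/2 mechanism is HPACK, which maintains static and dynamic header tables at both endpoints and Huffman-codes the literals.)

```javascript
// Headers for a first and a second Request on the same connection.
const first = {
  ':method': 'GET',
  ':path': '/',
  ':authority': 'www.example.com',
  'user-agent': 'ExampleBrowser/1.0',
  'cookie': 'sid=abc123',
};
const second = { ...first, ':path': '/style.css' };

// Only fields that differ from the previous Request need to be re-encoded;
// the bulky user-agent and cookie fields ride along for free.
const changed = Object.entries(second).filter(([name, value]) => first[name] !== value);
console.log(changed); // only ':path' changed
```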

One interesting side effect of this change is to make the HTTP protocol more connection-oriented. Header compression requires web servers to save the initial set of HTTP headers as part of the connection state that is maintained for the duration of the session. The new HTTP/2 capability that allows interleaving of Response messages also requires maintenance of a durable connection-oriented session at the web server. Still, one can argue persuasively that dynamic HTML is session-oriented, too, in which case this is not really a burdensome new requirement, but a simple nod to the current reality.

Improved performance with Transport Layer Security.

Unlike Google’s grand experiment with the SPDY protocol, which influenced much of the new HTTP standard, HTTP/2 doesn’t absolutely require HTTPS, but it will encourage its use. HTTP/2 continues to plug into TCP Port 80, but the protocol is enhanced so that TLS can be requested at connection time. This fix saves a couple of packets and a Round Trip during initial session handshaking.

And another new feature…

Another new and noteworthy feature of HTTP/2, but one that did not make my Top 5 list, is that its framing is binary, in contrast to HTTP/1.x, which is exclusively text-based. User-friendly HTML-based text is both a blessing and a curse. The blessing is that HTML text encoding means that any feature implemented in anyone’s web site is discoverable, which has led to wide dissemination of industry Best Practices and patterns. But for those concerned about web security, sending HTTP messages in clear text just makes it that much easier for people to hack into today’s web sites and do harm. In theory, binary framing should make it possible for people to build more secure HTTP/2-based web sites. In practice, Wireshark already supports binary viewers for HTTP/2 streams, and Google’s Chrome provided a binary data plug-in for SPDY that can be readily adapted for HTTP/2. At this point, it is difficult to assess how much more secure the web will become using binary framing. My guess is not much, given the combination of vulnerabilities that exist and the incentives that hackers continue to have.

Developers of web server and web client software figure they have a leg up on implementing HTTP/2 based on their prior experience getting SPDY to work. Google Chrome began supporting SPDY in 2011 at the same time that Google began adding SPDY support to its web properties, including YouTube, a network bandwidth hog. Facebook, Twitter and Akamai are other early adopters of SPDY on the web server side. With the adoption of the HTTP/2 standard, Google has already announced that it plans to drop SPDY in favor of the new industry standard in the next version of Chrome. In addition, Google is supplying HTTP/2 support in Apache. Not to be outdone, Microsoft added SPDY3 support to Internet Explorer 10 and has announced that both the IIS and IE 11 previews available with Windows 10 support HTTP/2.

In the final post in this series, I will look at the interaction between HTTP/2 multiplexing and the underlying TCP protocol.
