#10: HTTP: the most abundant protocol in the internet
HTTP protocol is fundamental to the Internet. It’s a simple request-response protocol where the request is initiated by the client, typically a web browser
HTTP 1.0, 1.1, 2 and 3
However, HTTP is so pervasive that it’s also used to communicate between machines through APIs. Most important chatacteristics of HTTP are:
- both request and response contain a set of headers
- both request and response may contain a body
- body can be anything: HTML, JSON, binary image, MP3, etc.
- request contains the name of the resource we’d like to access
- accessing is described using a few verbs, like
GET
andPOST
- a response starts with code, most commonly used are
200 OK
and404 Not Found
But rather than explaining the basics, I’d like to discuss the differences between protocol versions. They are very important from performance perspective.
So the first major version of the protocol was 1.0.
Version 1.0 has one flaw: it opens a new TCP/IP connection for each request.
It’s a big deal: creating a new network connection requires a few round-trips.
Also, TCP/IP has a built-in mechanism called slow start.
Each new connection is initially rather slow but gets faster over time.
This mechanism adapts to varying network conditions dynamically.
Unfortunately, if you keep opening and closing new connections, they may never reach their full potential.
To overcome this issue a Keep-alive
request header was added as an afterthought.
It instructs the server to keep connection open even after finishing with the response.
Another request may come through the same connection.
This is called a persistent connection.
In HTTP 1.1 persistent connections are enabled by default and you have to disable them explicitly using Connection: close
header.
Another important addition was HTTP pipelining.
In HTTP 1.0, even with persistent connections, it wasn’t allowed to send second request before response to the first request was fully received.
Pipelining allows sending multiple requests at once and then simply waiting for responses.
This reduces network round-trips and overall waiting time.
However, keep in mind that responses must be delivered in the same order as requests.
So if your first response is really slow, server is not allowed to return subsequent faster responses.
This is called head-of-line-blocking.
HTTP/2 is a huge upgrade to HTTP protocol. It evolved from SPDY protocol. First of all HTTP is now a binary protocol. Secondly headers are compressed to avoid repetition. Thirdly, under single persistent connection there can be multiple logical streams, each representing a single interaction. All of the above are important, but I’ll focus on streams. Streams are independent of each other, even though they share the same connection. Therefor it’s easy to send multiple HTTP requests and receive responses from fastest to slowest. They can even interleave half-way. Head-of-line-blocking is somewhat reduced. Also, stream doesn’t have to be a request-response interaction. Server can even push data to the client without request. That sounds weird until you realize that a server can proactively push CSS and JavaScript to the browser even before it asks for it. Server push and greater parallelism make HTTP/2 a much faster protocol.
HTTP/3 is not yet released, but some browsers already support it. It is based on QUIC protocol and operates using UDP, rather than TCP/IP. It turned out that head-of-line-blocking emerged one more time on TCP/IP level, unaware of independent streams in HTTP/2. Moving on to UDP tries to solve this problem. Lost packets no longer cause the whole TCP/IP connection to recover and retry. Only a single stream is affected.