Around IT in 256 seconds

#39: DNS: one of the fundamental protocols of the Internet

May 11, 2021 | 3 Minute Read

Domain name system (DNS for short) is one of the fundamental protocols of the Internet. In the Internet all communication is routed through IP addresses. Traditionally, these addresses consist of four numbers, like 91.198.174.192. Each and every server, as well as your computer, is identified using such an address. But we no longer remember phone numbers, let alone IP addresses! Remembering that the aforementioned IP belongs to wikipedia.org is tedious. DNS is often compared to a global phone book. A phone book that maps easy to remember domains like wikipedia.org or gmail.com to IP addresses - usable by machines. Without DNS the Internet could technically work. Just like you could use your phone without contacts, memorizing all phone numbers. DNS servers not only free us from remembering IP addresses. They know all of them.

One of the most common recruiting questions is: “What happens when you type google.com into your browser?” Before your browser can establish a TCP/IP connection and send an HTTP request, it must figure out, what’s the current IP of Google? Our computer asks the nearest DNS server? The nearest DNS server can either be our router, our ISP, or some well-known server. Like 1.1.1.1 by Cloudflare.

OK, obviously, one DNS server can’t serve the whole Internet. Otherwise, it would be a massive single point of failure. But it’s even worse. Simple replication over multiple nodes is not sufficient. The Internet has roughly half a billion domains. A single database would quickly run out of disk space. Therefore, DNS records are sharded in a hierarchy. At the very top there are 13 root servers. Their IP addresses are well-known and these servers are replicated. But the scope of root servers is very limited. They barely know where to find DNS servers for .com, .org, .gov and other top-level domains. So, the root server replies with the IP address of the DNS server knowing .com domains.

Now, we recursively call that second server. That one doesn’t know where google.com is located either. But it knows where the DNS for google.com (and all subdomains, like docs.google.com) is. The third server, owned by Google, finally knows where google.com is. This last server returns so-called authoritative answer. As opposed to recursive answers that we got on the way. You can debug all of this using dig command with +trace option.

OK, that’s a lot of work! Especially if you take into account that, for example, Wikipedia’s IP changed only three of times in the last decade. Google, on the other hand, used almost 6 hundred different IPs since 2011. But even for Google we can safely cache the domain to IP association. DNS records actually have a time to live attribute. All servers, as well as our computers and even web browsers, do some aggressive caching.

DNS protocol is almost 40 years old. It wasn’t built with security in mind. Imagine a man-in-the-middle attack returning a malicious server’s IP for a legitimate domain. This is called DNS spoofing. Also, the DNS traffic is unencrypted, so intermediate servers know which websites you enter. You can leverage this feature for good. For example, Pi-hole is a home DNS server proxy that blocks DNS queries to ads and trackers. I was blown away to see about 10% of my Internet traffic blocked with no negative impact on functionality.

That’s it, thanks for listening, bye!

More materials

Tags: dig, dns, dns-spoofing, pi-hole

Be the first to listen to new episodes!

To get exclusive content: