#96: Border Gateway Protocol: the duct tape that makes the Internet work
Border Gateway Protocol, BGP for short, is probably one of the most important protocols you might have never heard of. Well, you probably did hear of it at least once, in October 2021, when Facebook, WhatsApp, Instagram and Messenger all went down because of a BGP misconfiguration. Or that one day back in 2008 when all YouTube traffic was accidentally routed to Pakistan. Because of a BGP… misconfiguration. So what’s the big deal with BGP? First we must understand how the Internet works.
The Internet is not a single network of computers. It’s a network of smaller networks. Your Internet Service Provider, like Comcast or Verizon, is a network of computers and routers. Life would be simple if all traffic stayed within one such network. But more often than not, Verizon must communicate with Comcast. Or even worse, it must route traffic through Comcast to eventually reach yet another network, like Facebook. Each such network is called an autonomous system, AS for short.
Again, life would be simple if every AS were directly connected to every other one. For example, let’s say there are three ASes: 1, 2 and 3. Each one has a direct connection to the other two. If you need to route traffic from AS 1 to AS 2, there’s a fiber cable for that. Also, each AS knows which IP addresses belong to which AS. But now a fourth AS appears. A new ISP, corporation, university, whatever. It’s no longer feasible to have a direct connection to all three existing ASes. So AS 4 connects to AS 1 and AS 2 only. There’s no direct connection from AS 4 to AS 3.
This is where it gets interesting. AS 4 announces which blocks of IP addresses (known as prefixes) it handles. This announcement happens via BGP (surprise!). AS 1 and AS 2 receive that message and keep it in memory. They also forward it to AS 3. Now listen carefully. AS 3 receives two somewhat conflicting messages. One says that AS 4 is reachable through AS 1. The other says that AS 4 is reachable through AS 2. Both messages are correct and it’s up to AS 3 to decide which route is better if it ever needs to connect to AS 4.
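To make the four-AS example concrete, here is a minimal Python sketch of how the announcement could spread. It is a toy model, not real BGP: the names `LINKS`, `best_path` and `announce` are made up for illustration, it tracks a single prefix, and it assumes every AS simply prefers the shortest AS path, whereas real routers apply local policy first.

```python
from collections import deque

# Hypothetical topology from the example: AS 4 connects to AS 1 and AS 2 only.
LINKS = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}


def best_path(routes):
    """Pick the shortest AS path (ties: the lexicographically smaller path)."""
    if not routes:
        return None
    return min(routes.values(), key=lambda path: (len(path), path))


def announce(origin, links):
    """Spread one announcement; return the paths each AS heard, per neighbor."""
    # received[asn][neighbor] = AS path that `neighbor` advertised to `asn`
    received = {asn: {} for asn in links}
    queue = deque((origin, n, [origin]) for n in sorted(links[origin]))
    while queue:
        sender, asn, path = queue.popleft()
        if asn in path:
            continue  # loop prevention: our own AS number is already in the path
        old_best = best_path(received[asn])
        received[asn][sender] = path
        new_best = best_path(received[asn])
        if new_best != old_best:
            # Re-advertise only the best path, with our own AS number prepended.
            for neighbor in sorted(links[asn]):
                queue.append((asn, neighbor, [asn] + new_best))
    return received


received = announce(origin=4, links=LINKS)
for neighbor, path in sorted(received[3].items()):
    print(f"AS 3 hears from AS {neighbor}: path {path}")
print("AS 3 prefers:", best_path(received[3]))
```

In this sketch AS 3 ends up with exactly two candidate paths, one via AS 1 and one via AS 2. Both are the same length, so the winner is decided by an arbitrary tie-break.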
The beauty of BGP is that it’s dynamic. If AS 4 loses the connection to AS 1, it can still communicate with it through AS 2. The network will heal itself, falling back to the second best route. Every time the network topology changes, a wave of BGP updates propagates from AS to AS. In some sense, it’s a highly distributed and decentralized shortest path algorithm, running on a global scale. In practice, we have tens of thousands of ASes worldwide. And each AS knows, as of today, more than 1 million prefixes. It means there are 1 million rules telling what’s the best path to a certain block of IP addresses.
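The self-healing behaviour can be sketched in a few lines of Python. This is a centralized stand-in, assuming every AS prefers the fewest AS hops: a plain breadth-first search computes the route that the distributed updates would roughly converge to. The helpers `shortest_as_path` and `without_link` are hypothetical names, not part of any BGP implementation.

```python
from collections import deque


def shortest_as_path(links, src, dst):
    """Breadth-first search for the path with the fewest AS hops, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbor in sorted(links[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def without_link(links, a, b):
    """Return a copy of the topology with the a-b link removed."""
    copy = {asn: set(neighbors) for asn, neighbors in links.items()}
    copy[a].discard(b)
    copy[b].discard(a)
    return copy


# Same hypothetical four-AS topology as in the example.
links = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}

before = shortest_as_path(links, 4, 1)                      # direct link
after = shortest_as_path(without_link(links, 4, 1), 4, 1)   # detour via AS 2
print("before the failure:", before)
print("after the failure: ", after)
```

Before the failure the path is `[4, 1]`. Once the link is gone, the search falls back to `[4, 2, 1]`, which is the second best route from the example.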
BGP can also be quite brittle. For example, if you accidentally announce that your IP prefixes are no longer reachable, your whole network is practically down. It’s still there, connected and operational. But the rest of the Internet doesn’t know how to get to you. It’s like you’ve erased your address from the map. This is what happened to Facebook in 2021. A similar thing happened when Pakistan tried to block YouTube within the country but accidentally advertised the blocking route globally, cutting off the entire world from YouTube videos.
The Turkish government went even further, routing the entirety of Internet traffic to Turkey. Basically they claimed that every single IP on the planet is handled by one of their ISPs. Needless to say, it caused massive disruptions.
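Both failure modes can be illustrated with a toy prefix table in Python. This is only a sketch under loud assumptions: `RoutingTable` is a hypothetical class, the prefixes and AS numbers come from ranges reserved for documentation, and the table stores just the origin AS instead of a full path. It shows a withdrawn prefix becoming unreachable, and one way an unintended announcement can capture traffic: routers prefer the most specific matching prefix.

```python
import ipaddress


class RoutingTable:
    """Toy prefix table: maps announced prefixes to the AS that originates them."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, origin_as):
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin AS of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = RoutingTable()

# Withdrawal: the network is still running, but nobody knows the way to it.
table.announce("203.0.113.0/24", 64500)
reachable = table.lookup("203.0.113.10")
table.withdraw("203.0.113.0/24")
unreachable = table.lookup("203.0.113.10")

# Hijack: a more specific prefix from another AS wins the lookup.
table.announce("198.51.100.0/24", 64501)
table.announce("198.51.100.0/25", 64502)
hijacked = table.lookup("198.51.100.10")

print(reachable, unreachable, hijacked)
```

Here the first lookup resolves to AS 64500, the second finds nothing after the withdrawal, and the third lands on AS 64502 even though AS 64501 still announces the covering prefix.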
That’s it, thanks for listening, bye!
More materials
- Border Gateway Protocol on Wikipedia
- BGP hijacking on Wikipedia
- How BGP Routing Really Works - very good explanation with diagrams
- What is BGP?
- SE Radio Episode 468: Iljitsch van Beijnum on Internet Routing and BGP
- Outages