#53: CDN: Content Delivery Network: global scale caching
CDN is a set of geographically distributed servers for fast content delivery. Without CDN all requests are routed to your own server, located somewhere in the world. For example, in San Francisco. If your visitor lives in Australia, the experience is rather poor. But now imagine the traffic to your website is proxied through a global caching layer. Your visitor in Australia downloads data from an edge server nearby. A different visitor in Cape Town, Africa, will be routed to a completely different CDN server. The routing is done by the CDN itself, typically via DNS. It’s transparent to your visitors. Of course, all CDN servers contain the same data. Moreover, pretty much no-one contacts your own server in San Francisco. Only the CDN network itself. Technically, visitors don’t even know the address of your origin server! They use domain name like example.com
and DNS routes to appropriate cache server.
CDN has several benefits. First of all, your server is under very little load. The CDN network handles all of the requests. Secondly, if someone tries to perform a DDoS attack, they will attack the humoungous CDN infrastructure, not you. CDN networks are built to handle insane traffic. Including DDoS. Last, but not least, even if your website is temporarily down, the CDN can still serve traffic. CDN can even add some extra features, like TLS and spam protection.
We keep talking about websites and static assets. It’s true, CDNs work great for HTML, images, but also videos. You can even use CDN for video streaming. Netflix, one of the biggest consumers of global network traffic, went one step further. They deploy their private CDN directly in Internet Service Providers’ data centers. I mean, the physical caching hardware. Chances are that the next season of The Witcher will be streamed to your device without even touching the global internet. The traffic will be served directly from your Internet Service Provider. Your whole neighbourhood will use that cached content. This reduces traffic to Netflix, but also improves your watching experience.
Content Delivery Network has some drawbacks. First of all, it costs. Either you use commercial providers like Cloudflare, Akamai or Fastly. Or you built your own infrastructure. The latter option pays of only when your scale is quite insane. Apart from costs, CDN is actually another point of failure. All aforementioned providers (and much more) had global outages. When a CDN provider goes down, thousands upon thousands of websites and services go dark. Typically you won’t switch your production traffic directly to your server. If you are paranoid, you may have a backup CDN. But that increases the costs even further.
Another typical problem with CDN is cache staleness. CDN downloads your assets once and distributes them globally to hundreds of servers. All requests to these assets are then served from a CDN. That’s the point! But when your assets change, CDN must notice that and refresh its caches. Otherwise, your visitors will see the old version of images and scripts! It’s crucial to understand HTTP caching headers to give proper hints to CDN. Or invalidate CDN cache manually. Cache invalidation is hard!
That’s it, thanks for listening, bye!