#40: Docker: more than a process, less than a VM
When two processes run on the same machine, they are somewhat isolated. For example, they cannot read each other’s memory. However, they still share the same file system, libraries, and network ports, as well as hardware: CPU and memory. Docker allows running processes, such as web servers, databases, or web applications, with greater isolation on a Linux machine. Traditionally, better isolation was achieved with virtual machines. A virtual machine is essentially an operating system started inside another operating system, for example Windows running inside Linux. Typically you run a few VMs on a single host. Unfortunately, a virtual machine comes with overhead: it takes several seconds to start and uses a significant amount of memory. Docker sits somewhere in between: better isolation than plain processes, but not quite a VM.
Many people claim that Docker is essentially a lightweight VM.
In reality, Docker runs an ordinary process, but wrapped in multiple Linux features.
For example, thanks to Linux namespaces, two processes can listen on the same network port, because each of them gets its own network stack.
Thanks to cgroups, Docker can put a hard limit on the amount of memory and CPU each process can get.
Thanks to chroot, each process has its own filesystem.
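Just to make this concrete, here is a very rough sketch of those primitives using plain Linux tools. Each command is shown in isolation, and the rootfs path and memory limit are made up:

```sh
# Namespaces: give a shell its own network stack and PID space,
# so it can bind port 80 even if the host already uses it.
sudo unshare --net --pid --fork --mount-proc bash

# cgroups (v2): cap a process group at 256 MB of RAM.
sudo mkdir /sys/fs/cgroup/demo
echo 268435456 | sudo tee /sys/fs/cgroup/demo/memory.max
echo $$       | sudo tee /sys/fs/cgroup/demo/cgroup.procs

# chroot: run a shell that sees /srv/rootfs (a made-up path) as its entire filesystem.
sudo chroot /srv/rootfs /bin/sh
```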
The last one, the private filesystem, is especially important.
Imagine you have two web applications.
Both need to listen on port 80.
And both require the same global library.
Unfortunately, in two different, incompatible versions.
Traditionally, this was really hard to solve.
And often led to weird deployment and runtime failures.
With Docker, each process has its own, independent copy of the filesystem.
Such a copy contains the exact set of libraries and tools it needs.
Not more, not less.
Only the Linux kernel is shared.
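For instance, with two hypothetical images web-app-a and web-app-b, each baking its own library version into its filesystem and listening on port 80:

```sh
# Each container binds port 80 inside its own network namespace;
# on the host we simply map them to two different ports.
docker run -d -p 8080:80 web-app-a   # ships version 1 of the library
docker run -d -p 8081:80 web-app-b   # ships the incompatible version 2

curl http://localhost:8080   # talks to web-app-a
curl http://localhost:8081   # talks to web-app-b
```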
All of these capabilities are neatly packaged inside a so-called container.
A container is really just a file system containing the OS distribution and everything needed for a process to run.
For example, Java runtime, Python libraries, node packages.
You don’t have to worry about conflicts like multiple Java versions or bloated node_modules.
Each application has its own copy of the entire file system.
To save space, the file system is organized in layers.
The first layer is typically the operating system, which rarely changes.
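A minimal sketch of how those layers come about; the base image and paths here are just examples. Roughly speaking, each Dockerfile instruction adds a layer on top of the previous ones:

```sh
cat > Dockerfile <<'EOF'
# Layer 1: the operating system base, shared with every image built on top of it
FROM debian:bookworm-slim
# Layer 2: our application files, rebuilt only when they change
COPY app/ /opt/app/
EOF

docker build -t my-app .
docker image history my-app   # lists the layers and their sizes
```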
We typically run a few, maybe 10-ish virtual machines on a single host.
However, it’s not uncommon to run tens, if not hundreds of containers.
Just to give a brief overview:
MySQL inside Docker starts in about 1 second.
The same applies to nginx or Django.
By containerizing them we make installation less error-prone.
We simply type docker run mysql and do not worry about:
- installing external dependencies
- creating users and configuration files
- permissions
- network ports
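In practice the one-liner usually carries a couple of flags, roughly like this (the password and container name are just examples):

```sh
# Start MySQL in the background; the image already contains its users,
# config files and data directory, we only pick a password and a host port.
docker run -d --name my-mysql \
  -e MYSQL_ROOT_PASSWORD=secret \
  -p 3306:3306 \
  mysql

docker logs my-mysql   # watch it initialize
```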
It’s also slightly safer because MySQL is isolated from other processes running on the same machine. However, keep in mind that Docker is not bullet-proof, especially compared to VMs. And even VMs aren’t bullet-proof.
Docker not only makes it easier to run any database or server locally. These days it’s also the de-facto standard for deployment: you ship containers rather than naked processes. So if your application runs locally inside Docker, it’s almost guaranteed to run in the cloud as well. Deployment via Docker is simply much more predictable.
Docker itself is quite low-level.
Typically your application consists of multiple containers that need to be orchestrated.
There are several tools, from docker-compose to Kubernetes, that can help.
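Just to give a taste, a minimal and entirely hypothetical setup of a web app plus its database might look like this:

```sh
# Describe both containers in one file and let docker compose wire them together.
cat > docker-compose.yml <<'EOF'
services:
  db:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: secret
  web:
    image: my-web-app        # hypothetical application image
    ports:
      - "8080:80"
    depends_on:
      - db
EOF

docker compose up   # starts both containers plus a private network for them
```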
But that’s a topic for another episode.
In the meantime, thanks for listening, bye!
More materials
- Docker on Wikipedia
- Linux namespaces
- cgroups
- chroot
- OverlayFS
- “containers aren’t magic”
- “what’s a container?”
- Bocker - Docker implemented in around 100 lines of bash.