Redis is quite a versatile NoSQL, key-value database. Or in-memory cache. Or pub/sub broker. With transactions, stored procedures and fast replication. It’s quite universal. Anyway, the main use-case for Redis is caching. Internally the whole dataset must fit in memory. Redis can optionally persist data on disk, but all online operations happen entirely in memory. This makes Redis extremely fast. It’s often used as an alternative to the widespread Memcached server.
Persistence options are quite interesting and straightforward in Redis. First of all, you can entirely turn off persistence and risk data loss when Redis restarts. That’s sometimes OK, for example when used as cache. Secondly, persistence can be synchronous or asynchronous.
Asynchronous persistence relies on RDB files. They hold a full memory snapshot taken at a point in time. These files are great for backup. But the fact that they are written asynchronously means they are often out of date, so a crash can lose everything written since the last snapshot.
Synchronous persistence, on the other hand, appends each and every write to an append-only file. AOF for short. Basically a commit log. How often that file is flushed to disk is configurable: on every write, once per second, or whenever the operating system decides. Redis performs compaction underneath, but in essence every write lands there. The file is also used to reconstruct the database after a restart. Replication works in a very similar way. Redis has one leader and optionally many followers. Only the leader accepts writes. Followers don't read the AOF itself. Instead, the leader sends them the same kind of write stream over the network, and they replay all writes locally. They remember their position in that stream, so after a reconnect they continue from where they left off. This mechanism is similar to subscribers in Kafka. Or commit log replication in relational databases.
The replication process is asynchronous, so each follower may have a slightly outdated version of the history. This is called eventual consistency. When a follower is very outdated, the leader may no longer hold the writes it missed. In such cases, the follower receives a full snapshot first and then continues receiving deltas.
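The catch-up logic described above can be sketched as a toy model in plain Python. This is only an illustration of the idea, not Redis's actual replication protocol: the `Leader` and `Follower` classes, the `max_log` limit and the `sync` method are all made up for this example.

```python
class Leader:
    """Toy leader: applies writes and keeps only the most recent ones in a log."""

    def __init__(self, max_log=3):
        self.data = {}
        self.log = []          # retained (key, value) writes, oldest first
        self.first_offset = 0  # global offset of self.log[0]
        self.max_log = max_log

    def set(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        if len(self.log) > self.max_log:
            # Old history is dropped, so very stale followers cannot replay it.
            self.log.pop(0)
            self.first_offset += 1

    @property
    def end_offset(self):
        return self.first_offset + len(self.log)


class Follower:
    """Toy follower: remembers how many leader writes it has applied."""

    def __init__(self):
        self.data = {}
        self.offset = 0

    def sync(self, leader):
        if self.offset < leader.first_offset:
            # Too far behind: copy the full snapshot, then resume from the end.
            self.data = dict(leader.data)
            self.offset = leader.end_offset
            return "full"
        # Close enough: replay only the writes that were missed.
        for key, value in leader.log[self.offset - leader.first_offset:]:
            self.data[key] = value
        self.offset = leader.end_offset
        return "partial"
```

A follower that syncs regularly only ever replays deltas. One that misses more than `max_log` writes falls back to the full snapshot and then carries on with deltas from there.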
Turns out you can use AOF, RDB, none, or both at the same time. All depends on your use case.
So, replication helps with spreading reads onto multiple machines. But what if your dataset doesn’t fit in memory? The most naive approach is called client-side partitioning. Each client connects to multiple, independent Redis nodes. Also, each client chooses on its own which node to use for reads and writes. Typically there is some hashing algorithm involved. This approach is quite error-prone:
- every client needs to use the same hashing algorithm
- it’s hard to add or remove nodes to scale the cluster dynamically, because changing the number of nodes changes where most keys map
- every client needs to connect to every node separately.
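To make the scaling problem concrete, here is a minimal sketch of client-side partitioning with a simple modulo hash. The node addresses and the `pick_node` helper are hypothetical, and real clients often use smarter schemes such as consistent hashing.

```python
import zlib


def pick_node(key, nodes):
    """Choose a node for a key; every client must use exactly this function."""
    # crc32 is stable across processes, unlike Python's built-in hash() on strings.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


# Hypothetical node addresses, for illustration only.
nodes3 = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]
nodes4 = nodes3 + ["redis-d:6379"]

keys = [f"user:{i}" for i in range(1000)]
moved = sum(pick_node(k, nodes3) != pick_node(k, nodes4) for k in keys)
print(f"{moved} of {len(keys)} keys map to a different node after adding one")
```

With modulo hashing, adding a single node sends a large share of keys to a different node, so the data has to be migrated or the cache starts cold.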
Maybe what you need is Redis Cluster? Essentially, every key is associated with one of 16,384 hash slots. When there are many Redis nodes in the cluster, each node holds a share of the hash slots, typically a roughly equal one. Adding or removing nodes is mostly transparent to clients, the cluster simply moves some hash slots to a different node. Redis Cluster also handles high availability on its own by promoting a follower when a leader fails. And then there is Redis Sentinel. Sentinel provides the same kind of high availability to a classic, non-clustered setup with one leader and its followers.
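The key-to-slot mapping can be sketched in a few lines of Python. Redis Cluster hashes the key with CRC16 (the XMODEM variant) and takes the result modulo 16,384. The `key_slot` function name is my own, and the sketch assumes Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 implementation. It also handles hash tags: if the key contains a non-empty `{...}` section, only that part is hashed, which lets related keys land in the same slot.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the hash slot (0..16383) for a key, honouring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is CRC16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


# Both keys share the tag "user1000", so they map to the same slot.
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Because each node owns a range of slots rather than individual keys, moving data between nodes is a matter of reassigning slots, not rehashing every key.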
Redis has way more features, yet remains simple and lean. For example, there are transactions that can span multiple keys. Also you can write stored procedures in Lua. Last but not least, Redis records can have a time-to-live (TTL). This works perfectly when Redis is used as a cache. In fact, Redis became so popular that you can use it natively in the cloud. For example, Amazon’s ElastiCache offers Redis as a managed service.
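To illustrate what a TTL means for a cache entry, here is a toy in-process sketch. It is not how Redis is implemented and it doesn't talk to a Redis server; the `TTLCache` class and its `clock` parameter are invented for this example. It only models lazy expiry, where a stale entry is dropped when someone tries to read it, whereas Redis also cleans up expired keys in the background.

```python
import time


class TTLCache:
    """Toy key-value store where each entry can expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the example can be tested without sleeping
        self._items = {}     # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazy expiry: remove on access
            return None
        return value
```

A cached value set with `ttl=10` is returned for ten seconds and then behaves as if it was never stored, which is exactly what you want from a cache that must not serve stale data forever.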
That’s it, thanks for listening, bye!