Around IT in 256 seconds

#70: CRDT: Conflict-free Replicated Data Type (guest: Martin Kleppmann)

April 12, 2022 | 4 Minute Read

Author: Martin Kleppmann

Hello everyone! My name is Martin Kleppmann. I’m a researcher at the University of Cambridge. And I would like to tell you briefly about the technology called CRDTs. So, CRDT stands for Conflict-free Replicated Data Type. It’s a type of data structure that you can use to build collaboration software. So think software like Google Docs for example. Or Figma. Or Trello. Or a TODO list that syncs between your computer and your phone. You can build this type of software using CRDTs.

And it works like this. Say you are using a CRDT-based app and you are editing some sort of file. Let’s say it’s your slides for a presentation, for example. Now, whenever you update the file, the CRDT records exactly what changes you made to the data. And then it packages up that information about what changed and it can send it over the network. So if you and your colleague are both working on the same file then any changes that you make to the file will be sent to your colleague over the network. And vice versa, any change that your colleague makes will be sent back to you. And like this, we can get real-time collaboration on the file.
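
To make the "record a change, package it up, send it" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `Replica`, `set_value`, `package` and `receive` are made up for this example and are not the API of Automerge or any other real library, and the "highest stamp wins" rule is just one simple CRDT strategy, chosen here because it is short.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One device's copy of a shared key-value document (hypothetical sketch)."""
    replica_id: str
    counter: int = 0                              # logical clock for this device
    values: dict = field(default_factory=dict)    # key -> (stamp, value)
    outbox: list = field(default_factory=list)    # recorded changes not yet sent

    def set_value(self, key, value):
        # Record exactly what changed, tagged with a unique, ordered stamp.
        self.counter += 1
        change = {"key": key, "value": value,
                  "stamp": [self.counter, self.replica_id]}
        self.apply(change)
        self.outbox.append(change)

    def apply(self, change):
        # Keep the value with the highest stamp, so that every replica
        # picks the same winner no matter when the change arrives.
        stamp = tuple(change["stamp"])
        current = self.values.get(change["key"])
        if current is None or stamp > current[0]:
            self.values[change["key"]] = (stamp, change["value"])
        # Advance the local clock past anything we have seen.
        self.counter = max(self.counter, stamp[0])

    def package(self):
        # Serialize the pending changes for the network and clear the outbox.
        message = json.dumps(self.outbox)
        self.outbox = []
        return message

    def receive(self, message):
        # Apply every change contained in a message from another replica.
        for change in json.loads(message):
            self.apply(change)

    def snapshot(self):
        # The document as the user sees it, without the bookkeeping.
        return {key: value for key, (_, value) in self.values.items()}


you = Replica("you")
colleague = Replica("colleague")

you.set_value("title", "Q3 results")
colleague.receive(you.package())          # your change travels to your colleague

colleague.set_value("title", "Q3 results (draft)")
you.receive(colleague.package())          # and their change travels back

print(you.snapshot() == colleague.snapshot())
```

In this sketch the string returned by `package` stands in for whatever actually goes over the network. Applying the same message twice is harmless, because a change only takes effect if its stamp is newer than what the replica already has.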

Now, the nice thing about CRDTs is they actually still work even if your device is offline. So if you are not connected to the Internet right now and you make changes to the file, that’s totally fine. Those changes are still recorded and stored up. Sometime later when you’re online again your changes are synced with your colleague’s device. So that your colleague gets them too.

Now it could happen that while you’re offline you make some changes to the document. And your colleague also changes the file independently. If that happens, we need to merge together the changes your colleague made and the changes that you made. The nice thing about CRDTs is that they ensure we can always perform a clean merge of that kind of change. It’s a bit like what you get doing version control with git for example. But without having to resolve the merge conflicts manually because the CRDT provides the conflict resolution built-in.
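
As a rough illustration of such a clean merge, here is a small Python sketch of a shared TODO list built as an observed-remove set with tombstones, which is one well-known CRDT design. The class name `TodoSet` and its methods are hypothetical names for this example, not something taken from a real library.

```python
import uuid


class TodoSet:
    """A shared TODO list as an observed-remove set (illustrative sketch)."""

    def __init__(self):
        self.added = set()     # (item, unique_tag) pairs, one per add
        self.removed = set()   # tags of adds that were later removed

    def add(self, item):
        # Every add gets a fresh tag, so two adds of the same text stay distinct.
        self.added.add((item, uuid.uuid4().hex))

    def remove(self, item):
        # Only the adds this replica has already seen are removed.
        self.removed |= {tag for (it, tag) in self.added if it == item}

    def merge(self, other):
        # Merging is a plain set union, so it never produces a conflict.
        self.added |= other.added
        self.removed |= other.removed

    def items(self):
        return {it for (it, tag) in self.added if tag not in self.removed}


you = TodoSet()
colleague = TodoSet()

you.add("buy milk")
colleague.merge(you)                 # both start from the same list

# Both go offline and edit independently.
you.add("write slides")
colleague.remove("buy milk")
colleague.add("book room")

# Back online: merge in both directions.
you.merge(colleague)
colleague.merge(you)

print(sorted(you.items()))           # same result on both devices
print(you.items() == colleague.items())
```

The design choice to notice is that `merge` is just a union of two sets. Nobody is ever asked to pick a side, which is the contrast with a manual git merge conflict.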

Now Google Docs has been offering this kind of real-time collaboration for a long time. But Google Docs is actually based on a different type of algorithm. It’s called operational transformation. Or OT for short. OT and CRDTs can do quite similar things. But there’s a big difference. The difference is that with OT any time you edit the document your edits have to be sent via a single server. And Google provides that server in the case of Google Docs. So all your communication, all your collaboration has to go via this one server.

And CRDTs are different because they are decentralized. They don’t require a single server to work. But instead, you can sync your devices via any kind of network that happens to be available. For example, a local WiFi if the devices are on the same WiFi. Or via a peer-to-peer network over the Internet.
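
One way to see why no central server is needed: if the merge function gives the same answer regardless of the order in which devices exchange state, and regardless of how often they do it, then any route between devices is good enough. The Python sketch below shows this with a grow-only counter, a classic textbook CRDT. The device names and the helper names `merge` and `value` are assumptions made for the example.

```python
def merge(a, b):
    """Merge two grow-only counters by taking the per-device maximum."""
    return {device: max(a.get(device, 0), b.get(device, 0))
            for device in a.keys() | b.keys()}


def value(counter):
    """The counter's total is the sum of every device's own count."""
    return sum(counter.values())


# Each device only ever increments its own entry.
laptop = {"laptop": 3}
phone = {"phone": 2}
tablet = {"tablet": 1}

# Route 1: laptop and phone sync over local WiFi, then the tablet joins.
route_1 = merge(merge(laptop, phone), tablet)

# Route 2: phone and tablet sync peer-to-peer first, the laptop joins later.
route_2 = merge(laptop, merge(tablet, phone))

print(route_1 == route_2)                    # order of syncing does not matter
print(merge(route_1, route_1) == route_1)    # syncing twice changes nothing
print(value(route_1))
```

Because merging is commutative, associative and idempotent here, it makes no difference which devices talk to each other first or whether a message is delivered more than once.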

So for this reason I believe CRDTs can allow us to build a whole new generation of collaboration software, which we call local-first software. We say that an app is local-first if it provides the same sort of real-time collaboration that we know and love from Google Docs, but also stores your files locally on your own computer. And that’s great because that way the app still works while it’s offline. And if you ever get kicked out of the cloud service or the cloud service shuts down for whatever reason you still have all the data on your computer. And nobody can take it away from you. And local-first software is made possible by CRDTs.

I work on an open-source CRDT called Automerge. If you want to learn more about it, check it out on GitHub. There’s also a lot of exciting academic research on CRDTs happening at the moment.

Tags: CRDT
