Around IT in 256 seconds

#82: MongoDB: the most popular NoSQL database

August 16, 2022 | 3 Minute Read

MongoDB is a NoSQL database. Precisely speaking, it’s a document-oriented database. It stores arbitrarily complex key-value objects. For example, in a single Car object you can store as much information as you want. Not only the license plate or manufacturing year. But also information about each individual part, the history of repairs, insurance and all owners. No matter how much information you want to keep, you just put it in a single, easily accessible document. Contrast that with relational databases, where each relationship has to be modelled as a separate table. So the same Car would have been spread across tens of tables. Imagine all these SQL JOINs! No wonder MongoDB is one of the most popular databases.
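To make that concrete, here is a rough sketch of what such a Car document could look like, written as a plain Python dictionary. Every field name and value below is made up for illustration, and using the license plate as the key is just an assumption.

```python
import json

# Hypothetical Car document; all field names and values are illustrative only.
car = {
    "_id": "WX 12345",  # assumption: the license plate serves as the primary key
    "manufacturingYear": 2017,
    "parts": {
        "tires": {
            "frontLeft": {"brand": "ExampleTire", "producedYear": 2019},
            "frontRight": {"brand": "ExampleTire", "producedYear": 2021},
        },
    },
    "repairs": [
        {"date": "2021-03-14", "description": "brake pads replaced"},
    ],
    "insurance": {"provider": "Example Insurance", "validUntil": "2023-01-01"},
    "owners": [
        {"name": "Alice", "since": 2017},
        {"name": "Bob", "since": 2020},
    ],
}

# The whole car, with parts, repairs and owners, is one JSON-like value.
encoded = json.dumps(car)
```

In a relational model, parts, repairs, insurance and owners would most likely each live in their own table.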

Speaking more broadly, in MongoDB you store an arbitrary JSON-like object under some key. MongoDB doesn’t care what you put inside the database. Primitive fields, maps and arrays, in any combination. Up to 16 megabytes per document. Being able to quickly save any document is a blessing and a curse. MongoDB is great for quickly prototyping and building applications. However, not enforcing any schema may become painful in the long run.

MongoDB offers horizontal scaling. It means your data is distributed between multiple nodes. This is quite useful when your dataset can no longer fit on a single machine. Data is sharded based on a user-defined field. For example, you can shard your car records by owner name. In that case, cars of the same owner will land on the same node. You can use that technique to keep related documents close to each other. When data is distributed, a special mongos router acts as a proxy. It decides which node or nodes to query.
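The routing idea can be sketched with a toy model. This is not how MongoDB is implemented internally: the real thing splits data into chunks and moves them between shards, while the sketch below simply hashes the owner name. The shard names and both functions are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical node names


def shard_for(owner):
    """Pick a shard from the shard key (owner name) using a stable hash."""
    digest = hashlib.sha256(owner.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(query):
    """Toy router: one shard if the query contains the shard key, else all."""
    if "owner" in query:
        return [shard_for(query["owner"])]
    return list(SHARDS)
```

All cars of the same owner map to the same shard, so a query that includes the owner touches one node. A query without the shard key has to be sent to every node.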

Speaking of querying. In MongoDB, querying by primary key is very fast. However, you are free to look for documents by any arbitrarily nested field. For example: find all cars where the front left tire was produced before 2020. You can create secondary indexes to avoid scanning every document.
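The tire example could be expressed as a filter document that uses dot notation and the $lt operator. The field names are assumptions, and the tiny matcher below is only a toy that mimics the semantics in plain Python. It handles equality and $lt, nothing more. With a real driver you would hand the filter to the database instead.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc, query):
    """Toy matcher: supports plain equality and the $lt operator only."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$lt" in condition:
            if value is None or not value < condition["$lt"]:
                return False
        elif value != condition:
            return False
    return True


# Hypothetical documents, trimmed down to the fields the query needs.
cars = [
    {"_id": "A", "parts": {"tires": {"frontLeft": {"producedYear": 2019}}}},
    {"_id": "B", "parts": {"tires": {"frontLeft": {"producedYear": 2021}}}},
]

# Filter document: front left tire produced before 2020.
query = {"parts.tires.frontLeft.producedYear": {"$lt": 2020}}
old_tires = [car["_id"] for car in cars if matches(car, query)]
```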

Such a query is pretty simple, whereas a similar SQL would have been quite complex to write. However, what SQL is good for is aggregation and statistics. In MongoDB, aggregating data, like finding sums and averages, is not that straightforward. You can either use a simplified Map/Reduce job or the so-called aggregation framework. Queries and aggregation pipelines are written as JSON-like documents, while Map/Reduce functions must be written in JavaScript.
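For a feel of the aggregation framework, here is a sketch of a pipeline that computes the average manufacturing year per owner, roughly what SELECT owner, AVG(manufacturing_year) ... GROUP BY owner would do in SQL. The field names are assumptions. The helper function is not part of MongoDB. It only reproduces the same result in plain Python so the example runs without a database.

```python
from collections import defaultdict

# Aggregation pipeline as a list of stage documents (field names are assumed).
pipeline = [
    {"$group": {"_id": "$owner", "averageYear": {"$avg": "$manufacturingYear"}}},
]

# Hypothetical input documents.
cars = [
    {"owner": "Alice", "manufacturingYear": 2016},
    {"owner": "Alice", "manufacturingYear": 2020},
    {"owner": "Bob", "manufacturingYear": 2019},
]


def average_year_by_owner(docs):
    """Plain-Python equivalent of the $group stage above."""
    years = defaultdict(list)
    for doc in docs:
        years[doc["owner"]].append(doc["manufacturingYear"])
    return {owner: sum(values) / len(values) for owner, values in years.items()}


averages = average_year_by_owner(cars)
```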

I mentioned that the lack of schema can be a curse. Why? Well, relational databases are quite strict when it comes to schema. Tables have a fixed set of columns with clearly defined types. In MongoDB enforcing your schema is up to you:

  • If you accidentally put a string rather than a number, the database is fine with that.
  • If, after upgrading your codebase, you begin storing timestamps in a different format, MongoDB is more than happy.
  • If you forget about some field that is required, MongoDB will not help you.
  • If you make a typo in a field name - you’re out of luck.

Moreover, documents are organized in collections - sort of like tables. There’s nothing preventing you from accidentally storing JSON representing, I don’t know, a Car in a person collection.

So MongoDB doesn’t enforce any schema by default. Which makes development really, really fast. But in reality, you still need to maintain and document some sort of schema. It’s just scattered around your application logic. And it’s fairly easy to put your data in an inconsistent state.
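Here is a minimal sketch of what such application-side schema checking might look like. The schema, the field names and the validate_car function are all hypothetical. This is one possible approach, not a recommended library or pattern.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"licensePlate": str, "manufacturingYear": int}


def validate_car(doc):
    """Return a list of problems found in the document (empty means OK)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], expected):
            actual = type(doc[field]).__name__
            errors.append(f"{field} should be {expected.__name__}, got {actual}")
    # Fields outside the schema are reported too, which catches typos.
    for field in doc:
        if field not in REQUIRED_FIELDS:
            errors.append(f"unknown field: {field}")
    return errors
```

Calling validate_car before every write catches the wrong type, the missing field and the typo in a field name. But nothing forces every code path to call it, which is exactly the problem.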

MongoDB has a track record of losing data or becoming inconsistent. Sometimes it was caused by bad configuration defaults that did not flush data immediately in order to improve throughput. These days MongoDB has multi-document transactions and is much more solid. Check out the Jepsen tests, which examine various databases for consistency and correctness.

That’s it, thanks for listening, bye!

Tags: Jepsen, MongoDB, NoSQL, mongos
