Struggling to understand the benefits of NoSQL databases? Confused by the CAP theorem? Everybody talks about MongoDB and you cannot keep up with the conversation? Then "NoSQL Distilled" is a must-read.

What it is all about

Every IT solution has to store its data somewhere. From the 1970s until now, the most common option has been a relational database such as PostgreSQL, MySQL or Oracle; there are also less famous but still viable alternatives. These databases follow the relational paradigm: each entity is placed in its own table, and tables are linked by relations. For example, a book can have one author (a one-to-one relation) or several authors (a one-to-many relation). Relational databases provide strong ACID (atomicity, consistency, isolation, durability) guarantees, which makes them very reliable.
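To make the relational paradigm concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical; a join table links books to authors (which also covers an author writing several books), and the second transaction shows atomicity: when one statement fails, the whole transaction is rolled back.

```python
import sqlite3

# Hypothetical schema: each entity lives in its own table, and the
# book_authors join table expresses which authors wrote which book.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE book_authors (
    book_id INTEGER NOT NULL REFERENCES books(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (book_id, author_id)
);
""")

# Used as a context manager, the connection commits on success and
# rolls back if an exception escapes the block.
with conn:
    conn.execute("INSERT INTO authors VALUES (1, 'Pramod J. Sadalage')")
    conn.execute("INSERT INTO authors VALUES (2, 'Martin Fowler')")
    conn.execute("INSERT INTO books VALUES (1, 'NoSQL Distilled')")
    conn.executemany("INSERT INTO book_authors VALUES (1, ?)", [(1,), (2,)])

# Atomicity: the second insert violates a foreign key, so the first
# insert of this transaction is rolled back as well.
try:
    with conn:
        conn.execute("INSERT INTO books VALUES (2, 'Orphan draft')")
        conn.execute("INSERT INTO book_authors VALUES (2, 99)")  # no author 99
except sqlite3.IntegrityError:
    pass

rows = conn.execute("""
SELECT b.title, a.name
FROM books b
JOIN book_authors ba ON ba.book_id = b.id
JOIN authors a ON a.id = ba.author_id
ORDER BY a.id
""").fetchall()
print(rows)
```

Reading a book together with its authors needs a join across three tables; that is exactly the cost aggregate-oriented databases try to avoid.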

From many perspectives, relational databases were close to perfect until the mid-2000s, when systems started working with ever larger amounts of data. The problem is that relational databases do not scale easily: vertical scaling (a bigger server) has hard limits and gets expensive, while horizontal scaling (more servers) is complex. This is where NoSQL comes in. These databases do not follow the relational model. Instead, they are built around aggregates: units of related data, such as an order with its line items, that are stored and retrieved together.

Figure: the same data represented as a document (left) and as a set of rows in tables (right).
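Since the figure itself is not reproduced here, a minimal Python sketch can show the same idea. The data is hypothetical, and plain lists and dicts stand in for the tables and the document store: the relational side needs a "join" to rebuild the order, while the aggregate side returns the whole order with one key lookup.

```python
import json

# Relational view (hypothetical data): the order and its line items live
# in separate tables, linked by order_id.
orders = [{"id": 99, "customer": "Ann"}]
order_items = [
    {"order_id": 99, "product": "book-a", "quantity": 1},
    {"order_id": 99, "product": "book-b", "quantity": 2},
]

def order_as_aggregate(order_id):
    """Build the order aggregate by 'joining' the two tables in code."""
    order = next(o for o in orders if o["id"] == order_id)
    items = [
        {"product": i["product"], "quantity": i["quantity"]}
        for i in order_items
        if i["order_id"] == order_id
    ]
    return {**order, "items": items}

# Aggregate view: a document store keeps the whole order as one unit,
# so a single key lookup returns everything the application needs.
document_store = {99: order_as_aggregate(99)}
print(json.dumps(document_store[99], indent=2))
```

Because an aggregate is read and written as a unit, it is also a natural unit for distributing data across servers.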

The main reason people adopt NoSQL databases is that they can scale horizontally relatively easily. There are two ways to distribute a database: sharding and replication. Sharding means splitting the data across different instances, for example by geographic location, so that you have one database for the U.S. and another for Europe. Replication means copying the same data to several database instances.
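The two techniques combine naturally. Here is a minimal Python sketch, assuming a hypothetical in-memory cluster where each shard owns one region and keeps its copies in plain dicts; real databases do this over the network and must also handle node failures, which this sketch ignores.

```python
class Cluster:
    """Toy cluster: each shard owns one region and holds several replicas.

    Hypothetical sketch: replicas are plain dicts, and routing and
    replication happen in-process with no failures or network delays.
    """

    def __init__(self, regions, replicas_per_shard=2):
        self.shards = {
            region: [dict() for _ in range(replicas_per_shard)] for region in regions
        }

    def _shard_for(self, record):
        # Sharding: the record's region decides which shard owns it.
        return self.shards[record["region"]]

    def write(self, key, record):
        # Replication: every replica of the owning shard gets a copy.
        for replica in self._shard_for(record):
            replica[key] = record

    def read(self, region, key, replica=0):
        return self.shards[region][replica].get(key)


cluster = Cluster(["us", "eu"])
cluster.write("user:1", {"region": "us", "name": "Alice"})
cluster.write("user:2", {"region": "eu", "name": "Bruno"})

print(cluster.read("eu", "user:2", replica=1))  # the second EU replica holds a copy
print(cluster.read("us", "user:2"))             # the US shard never stored this record
```

Sharding spreads the load (each shard stores only part of the data), while replication adds redundancy and read capacity (each record exists on several nodes).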

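Distributing data this way raises two further issues: consistency (does every reader see the latest write?) and availability (can you still read and write when some nodes are unreachable?). The CAP theorem captures the tension between them: when the network is partitioned, a distributed system has to choose between staying consistent and staying available.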
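The trade-off can be sketched in a few lines of Python. This is a hypothetical replica that has lost contact with its peers after the primary accepted a newer value; the function name and the `prefer` parameter are made up for illustration and do not correspond to any real database setting.

```python
def read_during_partition(local_value, prefer):
    """Hypothetical replica read while the replica cannot reach its peers.

    prefer="consistency": refuse, because the latest value cannot be confirmed.
    prefer="availability": answer with the local copy, which may be stale.
    """
    if prefer == "consistency":
        raise TimeoutError("unavailable: cannot confirm the latest value")
    if prefer == "availability":
        return local_value
    raise ValueError(f"unknown preference: {prefer!r}")


# The primary accepted v2, but the partition kept it from reaching this replica.
latest, local = "v2", "v1"
for prefer in ("consistency", "availability"):
    try:
        value = read_during_partition(local, prefer)
        print(prefer, "->", value, "(stale)" if value != latest else "(fresh)")
    except TimeoutError as err:
        print(prefer, "->", err)
```

Neither answer is wrong; which one is acceptable depends on the business case.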

The book takes you through all the steps required to understand the logic behind NoSQL databases. It also describes their four types: key-value stores (e.g. Amazon DynamoDB), document databases (MongoDB), column-family stores (Cassandra) and graph databases.
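To get a rough feel for the four families, here is a minimal Python sketch in which plain dicts and lists stand in for each kind of database. The data is hypothetical, and real products expose their own query APIs, none of which are modelled here; the point is only how each model shapes the same customer and how it is typically accessed.

```python
# Key-value store: the value is an opaque blob; the store only knows the key.
key_value = {"customer:1": b'{"name": "Ann", "city": "Berlin"}'}

# Document store: the value is a structured document the database can query.
documents = {"customer:1": {"name": "Ann", "city": "Berlin", "orders": [{"id": 99}]}}

# Column-family store: row key -> column family -> columns; different rows
# may have different columns.
column_families = {
    "customer:1": {
        "profile": {"name": "Ann", "city": "Berlin"},
        "orders": {"order:99": "shipped"},
    }
}

# Graph database: nodes plus explicit, traversable relationships (edges).
nodes = {"ann": {"label": "Customer"}, "bob": {"label": "Customer"}, "o99": {"label": "Order"}}
edges = [("ann", "FRIEND_OF", "bob"), ("ann", "PLACED", "o99")]


def neighbours(node, relation):
    """Follow outgoing edges of one relationship type."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


print(key_value["customer:1"])                                     # fetch by key only
print([k for k, d in documents.items() if d["city"] == "Berlin"])  # query inside documents
print(column_families["customer:1"]["profile"]["name"])            # read one column family
print(neighbours("ann", "FRIEND_OF"))                              # traverse relationships
```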

"NoSQL Distilled" gives you a practical understanding of how each type of database models its data, which kinds of sharding and replication each supports, and how it handles transactions. It guides you in choosing a database for your project, and even in mixing several databases in one solution (polyglot persistence). It also gives hints on schema and data migration.

What I liked and what not

I really liked the book: it is well written, and complex ideas are explained clearly. It is also highly practical: it would have helped me a lot a year ago, when I was starting a new project and struggled to choose a database. The book also gives real-life examples, such as designing a system for booking hotel rooms across the world.

There is room for improvement, though. For example, write quorums, read quorums and the replication factor could be explained in more detail.
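For readers who want a head start, here is a minimal Python sketch of the two quorum rules: with a replication factor N, a write quorum W and a read quorum R, reads are strongly consistent when R + W > N, and any two writes overlap on at least one replica when W > N/2. The function names are hypothetical, and the brute-force check only confirms the set-overlap argument behind the first rule; it does not model a real database.

```python
from itertools import combinations


def quorum_properties(n, w, r):
    """Check the quorum rules for replication factor n, write quorum w and
    read quorum r (each a number of replicas)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("quorums must be between 1 and the replication factor")
    return {
        # Every read set shares a replica with every write set,
        # so a read always reaches at least one copy of the latest write.
        "strongly_consistent_reads": r + w > n,
        # Any two write sets share a replica, so conflicting writes can be detected.
        "write_quorum_is_majority": w > n / 2,
    }


def always_overlap(n, w, r):
    """Brute force: does every w-sized write set intersect every r-sized read set?"""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1), (5, 3, 3)]:
    print(f"N={n} W={w} R={r}:", quorum_properties(n, w, r),
          "overlap:", always_overlap(n, w, r))
```

The W and R values are the knobs: W = N with R = 1 favours fast reads, while W = R = 2 with N = 3 balances reads and writes and still keeps both rules satisfied.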

Other than that, I enjoyed the book a lot.

Recommendations

Solution Architects. This book is an absolute must.

Software Engineers. The book will help you understand the limitations of the technologies you use and design the components of your system properly.

System Engineers. This book is an absolute must, as you may need to design how the data is replicated and make sure your data store supports it.

Rating

I am giving the book 5 out of 5 for its practical applicability and the clarity of its text.

If you liked the review, be sure to subscribe for future updates, as I plan to cover other books.