Relational databases have been very popular for the last 30 years, providing ACID guarantees. However, as the need for distributed data grew, it turned out that RDBMS solutions do not scale easily, so NoSQL databases emerged. While growing into a Solution Architect role I wanted to improve my understanding of the different NoSQL solutions, so I am comparing the top 5 NoSQL databases here. I hope it will be useful for you as well.
According to this document, the major 5 are Apache Cassandra, MongoDB, Neo4j, HBase and OrientDB. Let’s see what’s up with them.
Cassandra is an incredibly scalable solution, proven in production at Apple, Netflix and eBay, which run some of its biggest deployments. It is a column-based database with a masterless architecture: every node can serve requests, which avoids a single point of failure and makes it highly available. It also has rich clients for JVM-based languages. However, it has limited support for data aggregation. The reason is that Cassandra is, in a nutshell, a partitioned key-value store: CQL does offer SUM, AVG, MIN and MAX, but they are only efficient within a single partition, and aggregating over the whole dataset means touching every node.
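To illustrate why aggregation is awkward in a partitioned key-value store, here is a minimal sketch in plain Python. It is a toy model, not the Cassandra driver or CQL: the node count, the `node_for` hashing rule and all function names are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Tuple

# Toy model (NOT the Cassandra API): each "node" owns whole partitions,
# and a partition key is routed to exactly one node by hashing.
NODE_COUNT = 3
nodes = [defaultdict(list) for _ in range(NODE_COUNT)]

def node_for(partition_key: str) -> int:
    # Deterministic stand-in for a real partitioner (assumption).
    return sum(partition_key.encode()) % NODE_COUNT

def insert(partition_key: str, amount: int) -> None:
    nodes[node_for(partition_key)][partition_key].append(amount)

def sum_for_key(partition_key: str) -> int:
    # Cheap: one node, one partition.
    return sum(nodes[node_for(partition_key)].get(partition_key, []))

def sum_all() -> Tuple[int, int]:
    # Expensive: every node has to be contacted and its rows read.
    total, nodes_contacted = 0, 0
    for node in nodes:
        nodes_contacted += 1
        total += sum(sum(rows) for rows in node.values())
    return total, nodes_contacted

insert("alice", 10)
insert("alice", 5)
insert("bob", 7)
insert("carol", 3)
```

Here `sum_for_key("alice")` only reads one partition, while `sum_all()` has to visit all three toy nodes, which is the part that gets painful as the cluster grows.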
Mongo is a document-based database, which basically means it stores entities as JSON-like documents inside collections. Those documents are essentially schemaless, though you can leverage validation if required. Collections can be sharded (partitioned) across servers, which allows for scalability. Mongo is also performant, able to process hundreds of thousands of records within seconds. On the downside, joins at the database level are limited (there is a $lookup aggregation stage, but relating documents is often left to the application), and document nesting depth and size are capped.
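A rough sketch of what schemaless documents and an application-level join look like, using plain Python dicts and lists instead of the MongoDB driver. The collection contents and the `orders_with_user_names` helper are hypothetical.

```python
# Two "collections" modelled as lists of dicts (NOT the MongoDB driver).
users = [
    {"_id": 1, "name": "Ann", "address": {"city": "Kyiv"}},
    {"_id": 2, "name": "Bob"},  # no address: documents need not share a schema
]
orders = [
    {"_id": 100, "user_id": 1, "total": 25},
    {"_id": 101, "user_id": 1, "total": 40},
    {"_id": 102, "user_id": 2, "total": 15},
]

def orders_with_user_names(users, orders):
    # Application-level join: index one collection by id,
    # then resolve the reference for every order in code.
    names_by_id = {u["_id"]: u["name"] for u in users}
    return [{**o, "user_name": names_by_id.get(o["user_id"])} for o in orders]

joined = orders_with_user_names(users, orders)
```

The join logic lives in the application here, so it is the application that pays for the extra round trip and for keeping the references consistent.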
Another approach to NoSQL is graph databases, and Neo4j is one of them. The good news is that it provides ACID guarantees and is able to provide high availability at the same time. Unfortunately, you can’t easily shard it, since a densely connected graph does not split cleanly across machines. As a graph database, Neo4j is a recommended choice for social networks and recommendation engines.
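To give a feel for why graphs suit recommendation engines, here is a small friend-of-a-friend sketch. It uses a plain Python adjacency map rather than Neo4j or Cypher, and the people and the `recommend` function are invented for illustration.

```python
from collections import Counter

# Hypothetical social graph as an adjacency map (NOT Neo4j/Cypher).
friends = {
    "ann": {"bob", "cid"},
    "bob": {"ann", "dan"},
    "cid": {"ann", "dan", "eve"},
    "dan": {"bob", "cid"},
    "eve": {"cid"},
}

def recommend(person):
    # Walk two hops out and count how many mutual friends
    # lead to each person who is not already a friend.
    counts = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(counts, key=lambda c: (-counts[c], c))
```

For "ann" this suggests "dan" first (two mutual friends) and then "eve" (one). A graph database runs this kind of traversal natively instead of through repeated joins.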
HBase is best known as the database of Hadoop, the big data ecosystem. HBase is a distributed, column-based NoSQL database modeled after Bigtable (Google's storage system, also available as an offering of Google Cloud Platform). Hadoop is highly available thanks to its NameNode architecture: it moved away from a single NameNode, and version 3 allows multiple standby NameNodes. HBase is also very performant due to its distributed nature and incredibly scalable because it was architected with horizontal scalability in mind. The cons of HBase are the absence of multi-row transactions, no native SQL support, and indexing and sorting that are limited to the row key only.
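The "key only" limitation is easier to see in code. Below is a toy key-sorted table in plain Python. It is not the HBase client API, and the `SortedTable` class and the `user#date` row-key layout are assumptions for the example.

```python
import bisect

# Toy model of a key-sorted table (NOT the HBase client API).
class SortedTable:
    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> dict of columns

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def scan(self, start_key, stop_key):
        # Range scan: binary search to the start, then read keys in order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def find_by_value(self, column, value):
        # No secondary index: every row has to be examined.
        return [k for k in self._keys if self._rows[k].get(column) == value]

table = SortedTable()
table.put("user2#2020-01-03", {"event": "login"})
table.put("user1#2020-01-02", {"event": "logout"})
table.put("user1#2020-01-01", {"event": "login"})

# "$" sorts right after "#", so this range covers every "user1#..." key.
user1_events = table.scan("user1#", "user1$")
```

Reading all events of one user is a cheap range scan because the row key was designed for it, while `find_by_value("event", "login")` degrades into a full scan. That is why row-key design matters so much with this kind of store.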
OrientDB is a curious option because it combines both the graph and document NoSQL approaches. If you have complex, unstructured data and need both the flexibility of document storage and the relationship modeling of a graph one, OrientDB may fit your needs. Orient positions itself as a multi-model database, which is primarily NoSQL of course, but a) has its own SQL-like syntax with graph support and b) provides a JDBC driver for smooth integration with apps. As a graph database it doesn’t use joins but stores the relationships right inside the vertices. This makes it a great fit for recommendation engines, social media and fraud detection systems.
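The difference between resolving relationships with a join and storing them inside the vertices can be sketched like this. It is plain Python with made-up data and a hypothetical `Vertex` class, not the OrientDB API or its SQL dialect.

```python
# Hypothetical data; plain Python, NOT the OrientDB API.

# Relational style: relationships live in a separate table and are
# resolved by scanning (or indexing) that table at query time.
people = {1: "Ann", 2: "Bob", 3: "Cid"}
follows = [(1, 2), (1, 3), (2, 3)]

def followed_via_join(person_id):
    return [people[dst] for src, dst in follows if src == person_id]

# Graph/document style: each vertex is a free-form document that also
# carries its own outgoing links, so traversal is a direct hop.
class Vertex:
    def __init__(self, **fields):
        self.fields = fields  # arbitrary document fields
        self.out = []         # links stored right inside the vertex

ann = Vertex(name="Ann")
bob = Vertex(name="Bob", city="Oslo")  # extra field: no fixed schema
cid = Vertex(name="Cid")
ann.out += [bob, cid]
bob.out.append(cid)

def followed_via_links(vertex):
    return [v.fields["name"] for v in vertex.out]
```

Both functions return the same names for Ann, but the second one never looks at relationships that belong to other people, and that is what keeps traversals cheap as the graph grows.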
This is a brief description of only 5 NoSQL solutions; obviously there are many more out there, but that is material for another article :)