May 20, 2012

Apache Cassandra Database

Greetings.

Note: I like innovative products that solve customer problems. Cassandra is an innovative product, so I like it. The summary of this post is that Cassandra is "yet another database", built on different design principles.


Caution: If you are having data-scaling problems in your application, blindly moving to Cassandra is NOT going to solve them for you! It's a 1.0 product, and you should evaluate it carefully against your needs before you commit to it.

Database

The title of this post may confuse you if you have read too much about NoSQL. Let me tell you: Cassandra is a database. Period. There is nothing wrong with calling it a database. I would even say that calling Cassandra "NoSQL" is insulting to its capabilities. It offers rich, sophisticated data structures for applications.

What is a database?
A database is a piece of software that organizes data and makes it available to applications.

There are many types of databases, and the RDBMS is *just* one of them. For better or worse, the RDBMS was so successful for so long (three decades) that the name "database" was hijacked to refer only to RDBMSs.

So, what is Cassandra?
Oracle is a "row-oriented" database.
Cassandra is a "column-oriented" database.
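
To make the distinction concrete, here is a minimal Python sketch. It is not how Oracle or Cassandra store data internally; it assumes a simplified model in which a row-oriented table is a list of fixed-shape tuples, and a column family maps each row key to its own set of named columns. The names `row_store`, `column_family`, and `get_slice` are made up for illustration.

```python
# Row-oriented (simplified): every row has the same fixed set of fields.
row_store = [
    ("u1", "johndoe", "NY"),
    ("u2", "janedoe", None),   # an unused field still occupies a slot
]

# Column-family style (simplified): each row key owns its own columns,
# and rows do not have to share the same column names.
column_family = {
    "u1": {"city": "NY", "name": "johndoe"},
    "u2": {"name": "janedoe", "zip": "02134"},
}

def get_slice(cf, row_key, start, end):
    """Return the columns of one row whose names fall in [start, end], sorted by name."""
    columns = cf.get(row_key, {})
    return [(name, columns[name]) for name in sorted(columns) if start <= name <= end]

# Read only the columns of row "u2" whose names sort between "a" and "o".
slice_u2 = get_slice(column_family, "u2", "a", "o")
```

The point of the sketch is only that a row in the second model is a small sorted map of its own, which is what makes per-row column slices natural.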

All I am trying to say here is that Cassandra is a database, and there is nothing wrong with calling it a database!

NoSQL

To build any serious application, you need a query language. You can call it whatever you like, but you need it. You can't build an app against *just* a key-value store. Cassandra is not just a key-value store. By the same token, Redis is not just a key-value store either. I "think" MongoDB is also not just a key-value store: I have read that it is a document-oriented database, but I haven't had a chance to look at it closely.

You see, none of these three popular so-called NoSQL products is just a key-value store. Let's get that fact straight. Cassandra supports SQL! Oh yes, it's called CQL. I have no problem with it. I like Cassandra's SQL. Let me highlight three CQL statements here:


CREATE TABLE Fish (KEY blob PRIMARY KEY);
SELECT * FROM People;
INSERT INTO NerdMovies (KEY, 11924)
      VALUES ('cfd66ccc-d857-4e90-b1e5-df98a3d40cd6', 'johndoe')
    USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;


You can read the rest here:
http://www.datastax.com/docs/1.0/references/cql/index
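
The INSERT above asks for LOCAL_QUORUM consistency and a TTL of 86400 seconds. As a rough illustration of what those two options mean, here is a small Python sketch. It assumes the usual majority rule for a quorum (half the replication factor, rounded down, plus one, counted within the local data center for LOCAL_QUORUM) and a simple "written time plus TTL" expiry check; the function names are hypothetical, and this is not Cassandra code.

```python
def quorum(replication_factor):
    """Replicas that must acknowledge a request: a majority of the replicas."""
    return replication_factor // 2 + 1

def is_expired(written_at, ttl_seconds, now):
    """A column written with a TTL is treated as gone once the TTL has elapsed."""
    return now >= written_at + ttl_seconds

TTL_SECONDS = 86400               # the TTL used in the INSERT above
ttl_hours = TTL_SECONDS / 3600    # one day, expressed in hours

# With three replicas in the local data center, two must acknowledge the write.
acks_needed = quorum(3)
```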

All I am trying to say here is that Cassandra is a database, and there is nothing wrong with calling it a database!

Multi-Tenant

Can you use Cassandra as a primary database to build multi-tenant SaaS applications? 
IMO, the short answer is "no". 

The long answer is that you need to use the OPP (order-preserving partitioner) to support range queries for multi-tenant applications with composite keys. The DataStax documentation clearly explains why using the OPP is a bad idea. Sure, there may be workarounds, but I don't want to design an application around a workaround! Drop me an email if you want to know more about my research here. (uday.s@comcast.net)
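
The trade-off can be sketched in a few lines of Python. This is an illustration under assumptions, not Cassandra's implementation: the composite keys of the form "tenant:record" are hypothetical, the hashed token is only RandomPartitioner-like (an MD5 of the key), and the "ring" is just a sorted list.

```python
import bisect
import hashlib

# Hypothetical composite row keys of the form "tenant:record".
keys = ["zeta:002", "acme:002", "zeta:001", "acme:001", "acme:003"]

def random_token(key):
    """Hashed token: position on the ring is unrelated to key order."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def order_preserving_token(key):
    """Order-preserving token: the key itself, so ring order equals key order."""
    return key

def range_scan(sorted_keys, start, end):
    """Return the keys in [start, end) from a list already sorted in key order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

# With order-preserving tokens, one tenant's rows sit next to each other,
# so a single key-range scan returns all of them.
by_key_order = sorted(keys, key=order_preserving_token)
acme_rows = range_scan(by_key_order, "acme:", "acme;")  # ';' sorts right after ':'

# With hashed tokens the same rows are scattered around the ring, which
# balances load but means ring order generally no longer follows key order.
by_hash_order = sorted(keys, key=random_token)
```

The same contiguity that makes the tenant scan work is what concentrates one tenant's rows, and therefore its traffic, on a small part of the ring.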

Side notes
a) Cassandra is a 1.0 product.
     It is evolving rapidly, and you will see inconsistencies between pre-0.8 and post-0.8 versions.

b) Documentation is challenging. 
    Your friends are DataStax, Stack Overflow, and blogs. Be ready to spend time researching, and don't get frustrated easily.

c) Cassandra is written in Java! 
    I saw inconsistencies between the CLI and the server, and I had to restart it. I saw Java exceptions on the server. Even though I spent most of the last decade at Sun working with Java, I am not a fan of databases written in Java. You need to be close to the metal when it comes to databases.

Summary

Apache Cassandra is a "column-oriented" database written in Java, and it shows good promise.