Cassandra

The Apache Cassandra Project (cassandra.apache.org) develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.

CQL

Cassandra originally went with a Thrift RPC-based API as a way to provide a common denominator that more idiomatic clients could build upon independently. However, this worked poorly in practice: raw Thrift is too low-level to use productively, and keeping pace with new API methods to support (for example) indexes in 0.7 or distributed counters in 0.8 is too much for many maintainers.

CQL, the Cassandra Query Language, addresses this by pushing all implementation details to the server; all the client has to know for any operation is how to interpret “resultset” objects. So adding a feature like counters just requires teaching the CQL parser to understand “column + N” notation; no client-side changes are necessary.

(CQL Specification: github.com/apache/cassandra/blob/trunk/doc/cql/CQL.textile)

Quick Start

Establishing a connection

# Defaults to the system keyspace
db = CassandraCQL::Database.new('127.0.0.1:9160')

# Specifying a keyspace
db = CassandraCQL::Database.new('127.0.0.1:9160', {:keyspace => 'keyspace1'})

# Specifying more than one seed node
db = CassandraCQL::Database.new(['127.0.0.1:9160','127.0.0.2:9160'])

Creating a Keyspace

# Creating a simple keyspace with replication factor 1
db.execute("CREATE KEYSPACE keyspace1 WITH strategy_class='org.apache.cassandra.locator.SimpleStrategy' AND strategy_options:replication_factor=1")
db.execute("USE keyspace1")

Creating a Column Family

# Creating a column family with a single validated column
db.execute("CREATE COLUMNFAMILY users (id varchar PRIMARY KEY, email varchar)")

# Create an index on the name
db.execute("CREATE INDEX users_email_idx ON users (email)")

Inserting into a Column Family

# Insert without bound variables
db.execute("INSERT INTO users (id, email) VALUES ('kreynolds', '[email protected]')")

# Insert with bound variables
db.execute("INSERT INTO users (id, email) VALUES (?, ?)", 'kway', '[email protected]')

Updating a Column Family

# Update
db.execute("UPDATE users SET email=? WHERE id=?", '[email protected]', 'kreynolds')

Selecting from a Column Family

# Select all
db.execute("SELECT * FROM users").fetch { |row| puts row.to_hash.inspect }
  {"id"=>"kway", "email"=>"[email protected]"}
  {"id"=>"kreynolds", "email"=>"[email protected]"}

# Select just one user by id
db.execute("SELECT * FROM users WHERE id=?", 'kreynolds').fetch { |row| puts row.to_hash.inspect }
  {"id"=>"kreynolds", "email"=>"[email protected]"}

# Select just one user by indexed column
db.execute("SELECT * FROM users WHERE email=?", '[email protected]').fetch { |row| puts row.to_hash.inspect }
  {"id"=>"kreynolds", "email"=>"[email protected]"}

Deleting from a Column Family

# Delete the swarthy bastard Kevin
db.execute("DELETE FROM users WHERE id=?", 'kway')