Introduction to Apache Cassandra

YOUTUBE B_HTdrTgGNs Patrick McFadin presents his talk in 2016

Apache Cassandra is designed to maximise uptime and is based on a well-regarded white paper published by Cornell University.

McFadin argues that Cassandra is a blend of Amazon's Dynamo paper and Google's Bigtable. It took the distributed nature of Dynamo and built its data model on the storage engine portion of Bigtable, combining them into a highly available data store with a rich data model.

Cassandra writes sequentially and doesn't rearrange data until it runs a routine called compaction. When duplicates of the same record exist, it picks the latest write.
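The append-then-resolve idea can be sketched in a few lines of Python. This is only an illustration, not Cassandra's actual code: the `AppendOnlyLog` class, its method names, and the integer timestamps are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical toy store: writes are only ever appended, never updated in place."""

    entries: list = field(default_factory=list)

    def write(self, key, value, timestamp):
        # Sequential write: add to the end, leave any older entries untouched.
        self.entries.append((key, timestamp, value))

    def read(self, key):
        # Duplicates may exist for a key; the entry with the latest timestamp wins.
        matches = [(ts, value) for k, ts, value in self.entries if k == key]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0])[1]


log = AppendOnlyLog()
log.write("user:1", "alice", timestamp=1)
log.write("user:1", "alicia", timestamp=2)  # a later write for the same key
print(log.read("user:1"))
```

Both writes stay in the log until something cleans them up, which is the job the notes assign to compaction.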

Writes are first buffered in an in-memory memtable and flushed to disk in sorted order. Compaction then merge-sorts those sorted files and writes the result out again. So Cassandra aims for fast reads and writes now, and fixes inefficiencies later as part of this process.
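The merge step might look roughly like the sketch below. It assumes each on-disk run is a list of `(key, timestamp, value)` tuples already sorted by key and then timestamp. The `compact` function and the tuple layout are hypothetical and only illustrate the merge sort plus latest-write-wins idea, not Cassandra's real file format.

```python
import heapq


def compact(*sorted_runs):
    """Merge sorted runs of (key, timestamp, value), keeping the newest write per key."""
    # heapq.merge streams the runs together without re-sorting everything from scratch.
    merged = heapq.merge(*sorted_runs, key=lambda row: (row[0], row[1]))
    compacted = []
    for key, timestamp, value in merged:
        if compacted and compacted[-1][0] == key:
            # Same key seen again: rows arrive in timestamp order, so this one is newer.
            compacted[-1] = (key, timestamp, value)
        else:
            compacted.append((key, timestamp, value))
    return compacted


older_run = [("a", 1, "x"), ("b", 1, "y")]
newer_run = [("a", 2, "z"), ("c", 3, "w")]
print(compact(older_run, newer_run))
```

Because every input is already sorted, the merge is a single sequential pass, which fits the notes' point that the expensive tidying is deferred rather than done on each write.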