Observations I have made while scaling up Cassandra for time-series data.
- Cassandra 2.0 series is where to be right now.
- 2.0.11 was recently released with experimental DateTieredCompactionStrategy which works very well for time series data
On sizing in general:
- Never use less than 4 CPUs per node for production – compression, compaction and encryption consume many cycles.
- Move to 8 CPUs per node as soon as feasible, growing a loaded cluster with 4 CPUs takes great patience.
On Sizing in EC2:
- i2.2xlarge is the sweet spot – enough CPU and ephemeral storage to support rapid growth. Be sure to bump account limits!
- m3.2xlarge is a great starting place for production loads – scale wide fast, then scale up.
- i2.xlarge is underpowered both in CPU and network for the amount of storage it provides.
- m3.xlarge fits new and unknown projects nicely.
- Avoid c3.2xlarge – the CPU:Memory ratio is too high, and 8 concurrent compactions may consume the entire NewGen heap space.
On Compaction Strategies for time series data:
- DateTiered (DTCS) is experimental, but ideal. Experiments look good so far, but this is a very new feature.
- SizeTiered (STCS) is the default compaction strategy, but TTLs and tombstones accumulate in larger levels and may rarely be purged without manual compaction. Never let your storage usage go above 50% or you will have a bad week.
- Levelled Compaction (LCS) is a good option for sparsely updated TTL’d data, however for workloads where all partitions are updated frequently, the rewrite rate rapidly swamps I/O capacity.
On JVM heaps and Garbage Collection:
- Export JVM metrics early. Coda Hale’s metrics package will output to graphite.
- Choose instance sizes with enough memory for an 8GB (or near enough) heap. This means >30GB.
- Watch your ParNew and CMS times, anything over a few hundred milliseconds will impact queries. Over a second and you will start seeing hinted handoffs during the GC pauses.
- Be careful with over-tuning – increasing buffer sizes may put pressure on the default heap size ratios. For example, raising in_memory_compaction_limit_in_mb for larger rows may consume large amounts of NewGen space with concurrent compactions.
On EC2 specific implementations:
- If using AutoScale Groups, disable the AZRebalance process to avoid inadvertently terminating live instances due to AZ imbalance.
- Do not scale up by more than +1 desired_capacity every 2 minutes. Cassandra’s Gossip protocol requires time for a shadow round to complete before the next node can join.
- When using Ec2MultiRegionSnitch remember that the node must be able to reach all other nodes (and itself!) via its external, public IP address. Security group limitations apply.
These are my observations in my environment. Test, or adopt at your own risk.