
Cassandra Field Notes

December 2014

Observations I have made while scaling up Cassandra for time-series data.

On Versions:

  • The Cassandra 2.0 series is where to be right now.
  • 2.0.11 was recently released with the experimental DateTieredCompactionStrategy, which works very well for time-series data.

On sizing in general:

  • Never use fewer than 4 CPUs per node for production – compression, compaction, and encryption consume many cycles.
  • Move to 8 CPUs per node as soon as feasible; growing a loaded cluster on 4 CPUs takes great patience.

On Sizing in EC2:

  • i2.2xlarge is the sweet spot – enough CPU and ephemeral storage to support rapid growth.  Be sure to bump account limits!
  • m3.2xlarge is a great starting place for production loads – scale wide fast, then scale up.
  • i2.xlarge is underpowered both in CPU and network for the amount of storage it provides.
  • m3.xlarge fits new and unknown projects nicely.
  • Avoid c3.2xlarge – the CPU:memory ratio is too high, and 8 concurrent compactions may consume the entire NewGen heap space (a cassandra.yaml mitigation is sketched after this list).
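If you do end up on a high-core instance, cassandra.yaml lets you cap how many compactions run in parallel.  A hedged sketch – the value is illustrative, not a tuned recommendation:

# cassandra.yaml – limit parallel compactions so they cannot flood NewGen
concurrent_compactors: 4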

On Compaction Strategies for time series data:

  • DateTiered (DTCS) is experimental but looks ideal for this workload.  My experiments so far are promising, but it is a very new feature (a table sketch follows this list).
  • SizeTiered (STCS) is the default compaction strategy, but TTL’d data and tombstones accumulate in the larger SSTables and may rarely be purged without a manual compaction.  Never let your storage usage go above 50% or you will have a bad week.
  • Levelled Compaction (LCS) is a good option for sparsely updated TTL’d data; however, for workloads where all partitions are updated frequently, the rewrite rate rapidly swamps I/O capacity.
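For reference, a DTCS time-series table in CQL looks roughly like the sketch below.  The table name, columns, TTL and compaction values are illustrative only, not my production schema:

CREATE TABLE metrics (
    path  text,
    time  timestamp,
    value double,
    PRIMARY KEY (path, time)
) WITH CLUSTERING ORDER BY (time DESC)
  AND default_time_to_live = 7776000        -- expire points after 90 days
  AND compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '3600',            -- size of the initial time window
    'max_sstable_age_days': '90'            -- stop compacting SSTables older than this
  };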

On JVM heaps and Garbage Collection:

  • Export JVM metrics early.  Coda Hale’s metrics package will output to Graphite.
  • Choose instance sizes with enough memory for an 8GB (or near enough) heap.  This means instances with more than 30GB of RAM.
  • Watch your ParNew and CMS times; anything over a few hundred milliseconds will impact queries, and over a second you will start seeing hinted handoffs during the GC pauses (a cassandra-env.sh sketch follows this list).
  • Be careful with over-tuning – increasing buffer sizes may put pressure on the default heap size ratios.  For example, raising in_memory_compaction_limit_in_mb for larger rows may consume large amounts of NewGen space with concurrent compactions.
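The relevant knobs live in cassandra-env.sh.  A sketch, assuming an instance with more than 30GB of RAM – the sizes, paths and file names are placeholders:

# cassandra-env.sh – pin the heap rather than letting it auto-size
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"    # NewGen: too small and ParNew churns, too large and pauses grow

# log GC activity so ParNew/CMS pause times can be graphed and alerted on
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

# ship JVM and Cassandra metrics to Graphite; the file follows the
# metrics-reporter-config project's format (contents not shown here)
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporter-graphite.yaml"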

On EC2 specific implementations:

  • If using AutoScale Groups, disable the AZRebalance process to avoid inadvertently terminating live instances due to AZ imbalance.
  • Do not scale up by more than +1 desired_capacity every 2 minutes.  Cassandra’s Gossip protocol requires time for a shadow round to complete before the next node can join.
  • When using Ec2MultiRegionSnitch, remember that each node must be able to reach all other nodes (and itself!) via its external, public IP address – your security groups must allow this (see the cassandra.yaml sketch below).
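The cassandra.yaml arrangement that satisfies the public-IP requirement looks roughly like this – the addresses are documentation placeholders:

# cassandra.yaml – Ec2MultiRegionSnitch addressing
endpoint_snitch: Ec2MultiRegionSnitch
listen_address: 10.0.1.15              # this node's private IP
broadcast_address: 203.0.113.15        # this node's public/elastic IP
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "203.0.113.10,203.0.113.11"    # seeds listed by public IP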

These are my observations in my environment.  Test them yourself, or adopt at your own risk.


Graphite at scale with Cassandra

Once again, I find myself with a Graphite scaling problem to solve.  After a few iterations of the traditional approach – chained carbon-relay with replication and consistent hashing – I ran into the end of sanity, with cluster growth taking more than 6 days per added node to re-sync the consistent hash.

I’ve been in the weeds with this for a while, but finally have a design that works in production:

[Diagram: Cyanite Graphite architecture]

Components

Metric Submission

carbon-c-relay receives metrics from submitters using the graphite protocol.  The blackhole and rewrite features are useful for filtering metrics and fixing up metric names.

cluster cyanite any_of 192.0.2.1 192.0.2.2 ;
match ^servers\..*\.cpu\.cpu([0-9]+) send to blackhole stop ;
match * send to cyanite ;
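The rewrite feature uses the same grammar.  A hypothetical rule that collapses a domain suffix out of server names might look like:

rewrite ^servers\.([^.]+)\.example\.com\.(.*) into servers.\1.\2 ;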

The Cyanite cluster receives from carbon-c-relay and writes data points into Cassandra, using ElasticSearch as the metric-path store so that Cyanite can remain stateless and wildcard path searches still work on Cyanite hosts that have never seen a given metric.

Metric Retrieval

Cyanite provides an HTTP interface for searching paths (passed through to ElasticSearch) and retrieving metrics.  The graphite-api project has a plugin, graphite-cyanite, that allows the API host to read metrics via Cyanite (a sample configuration follows).
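The graphite-api side is a small YAML file.  The keys below are a sketch from memory of the graphite-cyanite README – verify them against the current documentation; the URL is a placeholder:

# /etc/graphite-api.yaml – read metrics through Cyanite
finders:
  - cyanite.CyaniteFinder
cyanite:
  urls:
    - http://cyanite.example.com:8080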

Grafana needs direct access to ElasticSearch, so if you expose ElasticSearch publicly you will need to put basic authentication in front of it, for example with an Nginx proxy (sketch below).  There’s an ElasticSearch article and a ServerFault question on the topic.
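A minimal Nginx sketch for that, assuming an htpasswd file already exists and ElasticSearch is listening locally – names and paths are placeholders:

# nginx – basic auth in front of ElasticSearch for Grafana
server {
    listen 80;                       # terminate TLS in front of this in practice
    server_name es.example.com;

    auth_basic           "ElasticSearch";
    auth_basic_user_file /etc/nginx/es.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:9200;    # local ElasticSearch
    }
}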

Maintenance

Cyanite is new, so it is still missing APIs for deletion and pruning of metrics.  I wrote cyanite-utils to work similarly to the carbonate utilities for Graphite.  For example, to prune all metrics that have not been updated in the last 3 days:

cyanite-list | cyanite-prune | cyanite-delete

Closing

I will follow up later with some performance numbers once I can release them.  For the foreseeable future I no longer have a Graphite scaling problem, just a Cassandra scaling one.