Graphite at scale with Cassandra

Once again, I find myself with a Graphite scaling problem to solve.  After a few iterations of the traditional chained carbon-relay with replication and consistent-hashing approach, I ran in to the end of sanity with cluster growth taking more than 6 days per node added to re-sync the consistent hash.

I’ve been in the weeds with this for a while, but finally have a design that works in production:

Cyanite Graphite

Components

Metric Submission

carbon-c-relay receives metrics from submitters using the graphite protocol.  The blackhole and rewrite features are useful for filtering metrics and fixing up metric names.

cluster cyanite any_of 192.0.2.1 19.2.0.2.2 ;
match ^servers\..*\.cpu\.cpu([0-9]+) send to blackhole ;
match * send to cyanite ;

The cyanite cluster receives from carbon-c-relay and writes data points into Cassandra, using ElasticSearch as the metric path store so that Cyanite can remain stateless and still search wildcard metric paths across Cyanite hosts that have not seen certain metrics.

Metric Retrieval

Cyanite provides an http interface for searching paths (passed through to ElasticSearch) and retrieving metrics.  The graphite-api project has a plugin graphite-cyanite that allows the API host to read metrics via Cyanite.

Grafana requires access to ElasticSearch directly, so if you expose it publicly you will need to add basic authentication to it, for example using an Nginx proxy.  There’s an ElasticSearch article and a ServerFault question on the topic.

Maintenance

Cyanite is new, so is still missing APIs for deletion and pruning of metrics.  I wrote cyanite-utils to work similarly to the carbonate utils for graphite.  For example, to prune all metrics that have not been updated in the last 3 days:

cyanite-list | cyanite-prune | cyanite-delete

Closing

Will follow up later with some performance numbers once I can release them.  For the foreseeable future I no longer have a graphite scaling problem, just a Cassandra scaling one.

3 thoughts on “Graphite at scale with Cassandra

  1. Pingback: graphite のバックエンド(carbon-cache と whisper)を Cassandra に置き換える cyanite を試したメモ(1) | cloudpack技術情報サイト

  2. Pingback: Talking about time series metrics with Cassandra | WrathOfChris

Leave a comment