Using Riemann for Metrics
In my first post I introduced you to Riemann, and in my second I discussed using Riemann for fault detection. In those posts we saw that Riemann aggregates events from distributed hosts and services. One of the cool outcomes of this aggregation is the ability to generate metrics from those events. We can then use a tool like Graphite to store the metric data and render graphs from it. In this post you’ll see how to:
- Install Graphite.
- Generate metrics.
- Integrate Riemann with Graphite.
Installing Graphite
The first step we’re going to take is to install Graphite. Graphite is an engine that stores time-series data and then renders graphs from that data.
On an Ubuntu 14.04 or later host Graphite is available from APT packages. It’s made up of three components:
- A web interface.
- A storage engine called Carbon.
- A database library called Whisper.
Graphite’s web interface also relies on a database backend for its own data, such as user accounts (the metrics themselves are stored in Whisper files). The default database is SQLite3 but you can specify PostgreSQL or MySQL/MariaDB if you wish (and I recommend one of these for a production environment - they are both far more robust than the default). We’re going to stick with the default right now as we’re just testing.
Installing Packages
Let’s install the packages we need.
We’ve first updated our APT package cache and then we’ve installed the graphite-web and graphite-carbon packages. The graphite-web package contains Graphite’s web interface and the graphite-carbon package contains the Carbon storage engine. We’ve also installed Apache to run the Graphite web interface.
You’ll be prompted during installation as to whether your graph database should be removed if you uninstall Graphite. Answer “No” to ensure your graph data is preserved.
Configuring Graphite
Next we need to configure Graphite. First we edit the /etc/graphite/local_settings.py configuration file.
We need to change two items in this file. The first, SECRET_KEY, is used to salt hashes for Graphite’s authentication and the second, TIME_ZONE, controls the time zone. The latter is important if you want your metrics to have the right time and date.
We want to uncomment SECRET_KEY and set it to a long random string. Let’s generate a string now.
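One way to generate such a string is with Python’s standard secrets module. This is only a sketch of one option, not necessarily the command the post originally used; the function name generate_secret_key is made up for illustration.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for pasting into SECRET_KEY."""
    # token_hex reads from the operating system's secure random source
    # and returns two hexadecimal characters per byte.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```

Any sufficiently long random string will do; the important part is that it is unique to your installation and kept private.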
Now let’s add this random string to SECRET_KEY and uncomment and update our TIME_ZONE setting inside /etc/graphite/local_settings.py.
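Since local_settings.py is an ordinary Python file, the two updated settings might end up looking something like the fragment below. Both values are placeholders I have chosen for illustration: substitute your own random string and your own time zone name.

```python
# Fragment of /etc/graphite/local_settings.py (a sketch; both values are placeholders).

# Replace with the long random string you generated; never reuse this example.
SECRET_KEY = "replace-with-your-own-long-random-string"

# An IANA time zone name; "America/New_York" is only an example.
TIME_ZONE = "America/New_York"
```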
Later in the same file you’ll find a dictionary of database settings.
For the default SQLite3 database you won’t need to change this, but this is where you’d make updates if you wanted to use PostgreSQL or MySQL. In the default configuration you’ll find the database stored in /var/lib/graphite/graphite.db.
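For orientation, the database block follows the shape of Django’s DATABASES setting. The sketch below assumes the SQLite defaults described above; treat the exact keys and values as an approximation and compare them with the file shipped on your own host.

```python
# Sketch of the database block in /etc/graphite/local_settings.py.
# Key names follow Django's DATABASES setting; check them against your own file.
DATABASES = {
    "default": {
        "NAME": "/var/lib/graphite/graphite.db",  # path to the SQLite file
        "ENGINE": "django.db.backends.sqlite3",   # Django's SQLite backend
        "USER": "",      # not used by SQLite
        "PASSWORD": "",  # not used by SQLite
        "HOST": "",      # not used by SQLite
        "PORT": "",      # not used by SQLite
    }
}
```

Switching to PostgreSQL or MySQL would mean changing ENGINE and filling in the connection details that SQLite leaves empty.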
Prepping our database
Next we need to prep our initial database using the syncdb option of the graphite-manage command. This populates our database with the required initial tables and structure.
We also define a super-user to use with our database. I accept the default username, root, and then supply an email address and a secure password.
Configuring Carbon
Next I want to tweak Carbon’s metric retention, essentially how long metrics should be stored and how detailed those metrics should be. This is configured in the /etc/carbon/storage-schemas.conf file. Let’s look at this file now.
Each schema entry matches specific metrics by name and specifies one or more retention periods. The first entry, [carbon], manages Carbon’s own metrics. A regular expression, pattern, is matched to find these, here any metric starting with carbon. The retentions are then set with the retentions entry. You can specify one or more retentions, separated by commas, each in the form frequency:history.
For the Carbon metrics a data point is created every 60 seconds and kept for 90 days: 60:90d. This means each data point represents 60 seconds and we want to keep enough data points for 90 days of data.
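To make the arithmetic concrete, here is a small Python sketch that turns a retention string into a step size and a number of data points. It is not Carbon’s own parser: the helper names are made up, it understands only the s, m, h and d suffixes, and it insists on a unit for the history part.

```python
import re
from typing import Tuple

# Seconds represented by each unit suffix supported in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value: str, bare_ok: bool = False) -> int:
    """Convert a value such as '10s' or '90d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None or (match.group(2) == "" and not bare_ok):
        raise ValueError(f"unsupported retention value: {value!r}")
    number, unit = match.groups()
    # A bare number (only allowed for the frequency) is taken as seconds.
    return int(number) * UNIT_SECONDS.get(unit, 1)


def parse_retention(retention: str) -> Tuple[int, int]:
    """Return (seconds per data point, number of data points kept)."""
    frequency, history = retention.split(":")
    step = to_seconds(frequency, bare_ok=True)
    points = to_seconds(history) // step
    return step, points


for retention in ("60:90d", "10s:1h", "1m:7d"):
    print(retention, parse_retention(retention))
```

For 60:90d this gives a 60 second step and 129,600 data points, which is 90 days of one-minute data.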
All other metrics use the default_1min_for_1day schema, whose pattern is .* and so matches all metrics. In this schema, Graphite creates data points every 60 seconds and keeps enough data to represent 1 day. That’s a pretty low resolution by most standards and Riemann processes events much more quickly. So we’re going to create a new schema and comment out the old one.
This new schema includes multiple retentions. Multiple retentions allow graceful downsampling of historical data, saving disk space and improving performance. Our first retention, 10s:1h, creates data points every 10 seconds and keeps enough data for 1 hour. Our next retention, 1m:7d, retains 1 minute data points for 7 days, and so on.
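The way a metric name is paired with a schema can be sketched in a few lines of Python. Carbon uses the first schema whose pattern matches, so order matters. The schema text below is an assumption for illustration: the section name default_10s_for_1h is made up, and only the two retentions named above are included.

```python
import configparser
import re

# Illustrative schema text; the second section name is hypothetical.
SCHEMAS = r"""
[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_10s_for_1h]
pattern = .*
retentions = 10s:1h, 1m:7d
"""


def load_schemas(text):
    """Parse schema text into an ordered list of (name, regex, retentions)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    schemas = []
    for name in parser.sections():  # sections keep their file order
        pattern = re.compile(parser[name]["pattern"])
        retentions = [r.strip() for r in parser[name]["retentions"].split(",")]
        schemas.append((name, pattern, retentions))
    return schemas


def retentions_for(metric, schemas):
    """Return the name and retentions of the first schema matching the metric."""
    for name, pattern, retentions in schemas:
        if pattern.search(metric):
            return name, retentions
    raise LookupError(f"no schema matches {metric!r}")


schemas = load_schemas(SCHEMAS)
print(retentions_for("carbon.agents.graphite.cpuUsage", schemas))
print(retentions_for("nginx.active", schemas))
```

Because the catch-all .* pattern comes last, Carbon’s own metrics keep their 90 day schema and everything else falls through to the new one.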
To downsample from 10s:1h to 1m:7d, Graphite gathers all of the data from the past minute (this should be six data points, one generated every 10 seconds). It then averages the data points to aggregate them and retains this new data point for 7 days. By default, each retention averages the data points as it downsamples, so you can recover a total by multiplying the average by the number of data points it covers.
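The averaging step is easy to picture with a short Python sketch. This is an illustration of the idea rather than Whisper’s actual code, and the sample values are invented.

```python
def downsample_average(points, factor):
    """Average each consecutive group of `factor` points into one point."""
    groups = [points[i:i + factor] for i in range(0, len(points), factor)]
    return [sum(group) / len(group) for group in groups]


# Two minutes of invented 10 second data points (six per minute).
ten_second_points = [12, 15, 9, 14, 10, 12, 20, 22, 18, 25, 21, 20]

one_minute_points = downsample_average(ten_second_points, 6)
print(one_minute_points)         # [12.0, 21.0]

# Reversing the average: multiply by the number of points it covers.
print(one_minute_points[0] * 6)  # 72.0, the total for the first minute
```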
You can also configure Graphite to use alternate methods to aggregate the data points, including min, max, sum and last. This is done by configuring a /etc/carbon/storage-aggregation.conf file. There’s a sample file in /usr/share/doc/graphite-carbon/examples/storage-aggregation.conf.example. We’re not going to do that right now, but when the file is missing an annoyingly frequent log message appears in your Carbon log, /var/log/carbon/console.log.
Creating an empty /etc/carbon/storage-aggregation.conf file stops the message so let’s do that now.
You can see a lot more about how Carbon is configured here.
Run Carbon at startup
Now let’s configure Carbon to run by default by editing the /etc/default/graphite-carbon file.
Change CARBON_CACHE_ENABLED=false to CARBON_CACHE_ENABLED=true.
Installing Graphite’s web interface
As our last setup step we’re going to set up Graphite’s web interface as Apache’s default website. First, disable the existing default site.
Now copy in Graphite’s Apache configuration.
And enable it.
And we’re done.
Starting Carbon and Graphite
Finally, let’s start or reload the required services.
First Carbon.
And then Apache.
You can then view the Graphite web interface in your browser.
Configuring Riemann for Graphite
Riemann uses a Clojure-based configuration file to specify how events are processed and handled. On an Ubuntu host we can find that file at /etc/riemann/riemann.config. We’re going to add a Graphite output to the configuration we used in the previous posts on Riemann. Let’s look at an updated configuration now.
You can see we’ve added a function called graph.
This defines a connection to our local Graphite server, here localhost. You could also specify the name of a remote Graphite server and you can use either TCP or UDP to send events.
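Riemann takes care of the wire format for us, but it can help to know what arrives at the other end. Carbon’s plaintext listener (port 2003 by default) accepts one data point per line: a metric path, a value and a Unix timestamp. The Python sketch below illustrates that format; it is not Riemann’s own code, and the metric path is invented.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one data point as a line of Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(line, host="localhost", port=2003):
    """Send one formatted line to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


# Formatting only; call send_to_carbon(...) when a Carbon server is running.
print(carbon_line("web1.nginx.active", 42, 1400000000), end="")
```

Sending a line by hand like this is also a quick way to check that Carbon is listening before pointing Riemann at it.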
Inside our streams block we can then use the graph function to send events through to Graphite. In our current configuration we’re graphing everything. This means every event sent to Riemann that carries a metric will get passed to Graphite and turned into a graph.
Alternatively, if we don’t want to send everything to Graphite we can be more selective. For example, we could select only metrics from specific services.
Here we’re only sending events from the heartbeat service through to Graphite.
Now let’s send some metrics through to Graphite.
Sending metrics to Riemann and Graphite
For our metrics we’re going to choose some Nginx metrics. We’ve got a host running Nginx and are going to use the riemann-nginx-status command provided by the riemann-tools gem to send the metrics.
The riemann-nginx-status command assumes the presence of an Nginx status page at http://localhost:8080/nginx_status. You can configure a page like that in your Nginx configuration. You can also override the default location with the --uri option.
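For reference, the status page is a short plain-text report. The Python sketch below parses one into a dictionary of metrics, which is roughly the information a tool has to extract before sending events. It is an illustration only, not how riemann-nginx-status is implemented, and the sample numbers are invented.

```python
import re

# Sample output in the format produced by Nginx's stub status page.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse stub status text into a dictionary of integer metrics."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("text does not look like a stub status page")

    metrics = {"active": int(active.group(1))}
    for name, value in zip(("accepts", "handled", "requests"), counters.groups()):
        metrics[name] = int(value)
    for name, value in zip(("reading", "writing", "waiting"), states.groups()):
        metrics[name] = int(value)
    return metrics


print(parse_stub_status(SAMPLE))
```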
Nginx’s stub status module provides connection and status metrics. You can also control which metrics get sent to Riemann and specify any required thresholds. Let’s run riemann-nginx-status now.
We’re sending our metrics from our Nginx host to riemann.example.com and we should start to see events hit Riemann shortly, including a health check and the active connections metric. These events should also now be passed through to Graphite. Let’s see the resulting graphs in the Graphite web console.
We can see several metrics in our graph but not our health event. This is because Riemann only forwards events that have metrics. As the health event has a metric value of nil it’s not forwarded along to Graphite.
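The rule is simple enough to sketch in Python. The events below are invented stand-ins for what Riemann receives (the host and service names are assumptions), with None playing the role of nil.

```python
def forwardable(events):
    """Keep only the events that carry a metric value."""
    return [event for event in events if event.get("metric") is not None]


# Invented example events: a health check without a metric and a gauge with one.
events = [
    {"host": "web1", "service": "nginx health", "state": "ok", "metric": None},
    {"host": "web1", "service": "nginx active", "state": "ok", "metric": 3},
]

for event in forwardable(events):
    print(event["service"], event["metric"])  # prints: nginx active 3
```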
Pretty simple eh? Instant graph gratification.
Summary
We’ve seen how to install Graphite and connect it to Riemann. We’ve also seen how easy it is to turn our metrics into useful graphs. Building on this we could easily add categorization, filtering and manipulation (you remember all those cool things Riemann can do to events and their contents). A good starting point is The Guardian’s Riemann configuration. There are lots of useful examples and ideas there. Enjoy!