Using Riemann for Metrics

In my first post I introduced you to Riemann and my second post discussed Riemann for fault detection. In those posts we’ve discovered that Riemann aggregates events from distributed hosts and services. One of the cool outcomes of this aggregation is the ability to generate metrics from the events. We can then use a tool like Graphite to store the metric data and render graphs from it. In this post you’ll see how to:

  1. Install Graphite.
  2. Generate metrics.
  3. Integrate Riemann with Graphite.

Installing Graphite

The first step we’re going to take is to install Graphite. Graphite is an engine that stores time-series data and then renders graphs from that data.

On an Ubuntu 14.04 or later host Graphite is available from APT packages. It’s made up of three components:

  • A web interface.
  • A storage engine called Carbon.
  • A database library called Whisper.

Graphite's web interface also relies on a database backend for its user and dashboard data. The default database is SQLite3 but you can specify PostgreSQL or MySQL/MariaDB if you wish (and I recommend one of these for a production environment - they are both far more robust than the default). We're going to stick with the default right now as we're just testing.

Installing Packages

Let’s install the packages we need.

$ sudo apt-get update
$ sudo apt-get -y install graphite-web graphite-carbon apache2 libapache2-mod-wsgi

We’ve first updated our APT package cache and then we’ve installed the graphite-web and graphite-carbon packages. The graphite-web package contains Graphite’s web interface and the graphite-carbon package contains the Carbon storage engine. We’ve also installed Apache and the mod_wsgi module to serve the Graphite web interface.

You’ll be prompted during installation as to whether your graph database should be removed if you uninstall Graphite. Answer “No” to ensure your graph data is preserved.

Configuring Graphite

Next we need to configure Graphite. First we edit the /etc/graphite/local_settings.py configuration file.

$ sudo vi /etc/graphite/local_settings.py

We need to change two items in this file. The first, SECRET_KEY, is used to salt hashes for Graphite’s authentication and the second, TIME_ZONE, controls the time zone. The latter is important if you want your metrics to have the right time and date.

We want to uncomment SECRET_KEY and set it to a long random string. Let’s generate a string now.

$ cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 256 | head -1
SyN1cmnVFCOvHhKJ4Jxrfc5osJx5HNmOc60LVEFahYM0dusIYmCRndd2mFEfHi6WAf9Sv8xBksmsmdQSh6PcoBKhA0MeX6DMNszKZEyGTBpx3kU5AArbcAtoeyTHz6ROk25DSKmjw7MlbmVVuM5Nbf5ewCIl6OVN3iXDhPLX0wvkE7nKJHKDcqelIOR0EyXDoa25Z88W374TXVNSucpxlyLDXWhHP6XShXCza4EQKCu6GePvFLHl1pjpYrb4sv7J

Now let’s add this random string to SECRET_KEY and uncomment and update our TIME_ZONE setting inside /etc/graphite/local_settings.py.

SECRET_KEY='SyN1cmnVFCOvHhKJ4Jxrfc5osJx5HNmOc60LVEFahYM0dusIYmCRndd2mFEfHi6WAf9Sv8xBksmsmdQSh6PcoBKhA0MeX6DMNszKZEyGTBpx3kU5AArbcAtoeyTHz6ROk25DSKmjw7MlbmVVuM5Nbf5ewCIl6OVN3iXDhPLX0wvkE7nKJHKDcqelIOR0EyXDoa25Z88W374TXVNSucpxlyLDXWhHP6XShXCza4EQKCu6GePvFLHl1pjpYrb4sv7J'
TIME_ZONE = 'America/New_York'

Later in the same file you’ll find a dictionary of database settings.

DATABASES = {
  'default': {
    'NAME': '/var/lib/graphite/graphite.db',
    'ENGINE': 'django.db.backends.sqlite3',
    'USER': '',
    'PASSWORD': '',
    'HOST': '',
    'PORT': ''
  }
}

For the default SQLite3 database you won’t need to change this, but this is where you’d make updates if you wanted to use PostgreSQL or MySQL. In the default configuration you’ll find your data stored in /var/lib/graphite/graphite.db.
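
If you did want PostgreSQL, a sketch of the settings might look something like this - the database name, user and password are placeholders for a database you’d create yourself, and you’d also need the PostgreSQL server and the python-psycopg2 bindings installed:

DATABASES = {
  'default': {
    'NAME': 'graphite',
    'ENGINE': 'django.db.backends.postgresql_psycopg2',
    'USER': 'graphite',
    'PASSWORD': 'changeme',
    'HOST': '127.0.0.1',
    'PORT': '5432'
  }
}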

Prepping our database

Next we need to prep our initial database using the syncdb option of the graphite-manage command. This populates our database with the required initial tables and structure.

$ sudo graphite-manage syncdb
Creating tables ...
Creating table account_profile
Creating table account_variable
Creating table account_view
Creating table account_window
Creating table account_mygraph
Creating table dashboard_dashboard_owners
Creating table dashboard_dashboard
Creating table events_event
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_groups
Creating table auth_user_user_permissions
Creating table auth_user
Creating table django_session
Creating table django_admin_log
Creating table django_content_type
Creating table tagging_tag
Creating table tagging_taggeditem

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (leave blank to use 'root'):
Email address: james@example.com
Password:
Password (again):
Superuser created successfully.
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

We also define a superuser for our database, accepting the default username of root and specifying an email address and a secure password.

Configuring Carbon

Next I want to tweak Carbon’s density of metric retention, essentially how long metrics should be stored and how detailed those metrics should be. This is configured in the /etc/carbon/storage-schemas.conf file. Let’s look at this file now.

# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
#  [name]
#  pattern = regex
#  retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...

# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

Each schema entry matches specific metrics by name and specifies one or more retention periods. The first entry, [carbon], manages Carbon’s own metrics. A regular expression pattern is matched to find these, here any metric starting with carbon. The retentions are then set with the retentions entry. You can specify one or more retentions in the form of:

sample_time:retention_period

For the Carbon metrics a data point is created every 60 seconds and kept for 90 days: 60:90d. This means each data point represents 60 seconds and we want to keep enough data points for 90 days of data.

All other metrics use the default_1min_for_1day schema, whose pattern of .* matches all events. In this schema, Graphite creates data points every 60 seconds and keeps enough data to represent 1 day. That’s a pretty low resolution by most standards, and Riemann will be sending us events much more frequently than once a minute. So we’re going to create a new schema and comment out the old one.

#[default_1min_for_1day]
#pattern = .*
#retentions = 60s:1d

[default]
pattern = .*
retentions = 10s:1h, 1m:7d, 15m:30d, 1h:2y

This new schema includes multiple retentions. Multiple retentions allow graceful downsampling of historical data, saving disk space and improving performance. Our first retention, 10s:1h, creates data points every 10 seconds and keeps enough data for 1 hour; our next retention, 1m:7d, retains 1-minute data points for 7 days; and so on.

To downsample from 10s:1h to 1m:7d Graphite gathers all of the data from the past minute (this should be six data points, one generated every 10 seconds). It then averages those data points into a single new data point and retains it for 7 days. By default each retention averages as it downsamples, so if you need a total you can approximate it by multiplying the average back up by the number of points it replaced.
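
For example, suppose the six 10-second data points collected in one minute were these (hypothetical values):

10-second points: 4, 6, 5, 7, 3, 5
1-minute point (average): (4 + 6 + 5 + 7 + 3 + 5) / 6 = 5
Approximate total for that minute: 5 * 6 = 30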

You can also configure Graphite to use alternate methods to aggregate the data points including min, max, sum and last. This is done by configuring a /etc/carbon/storage-aggregation.conf file. There’s a sample file in /usr/share/doc/graphite-carbon/examples/storage-aggregation.conf.example. We’re not going to do that right now but, without that file present, an annoyingly frequent log message appears in your Carbon logs, /var/log/carbon/console.log:

/etc/carbon/storage-aggregation.conf not found, ignoring.

Creating an empty /etc/carbon/storage-aggregation.conf file stops the message so let’s do that now.

$ touch /etc/carbon/storage-aggregation.conf
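
If you later decide to customise aggregation rather than leaving the file empty, entries follow the same pattern-matching style as the schema file. This sketch mirrors the shipped example file - aggregationMethod controls how points are combined and xFilesFactor sets the fraction of points that must be present for a downsampled point to be written:

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average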

You can read a lot more about how Carbon is configured in the Graphite documentation.

Run Carbon at startup

Now let’s configure Carbon to run by default by editing the /etc/default/graphite-carbon file.

$ sudo vi /etc/default/graphite-carbon

Change the value of CARBON_CACHE_ENABLED=false to CARBON_CACHE_ENABLED=true.
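
After the change the relevant line in /etc/default/graphite-carbon should read:

CARBON_CACHE_ENABLED=true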

Installing Graphite’s web interface

As our last setup step we’re going to enable Graphite’s web interface by serving it as Apache’s default website. First, disable the existing default site.

$ sudo a2dissite 000-default

Now copy in Graphite’s Apache configuration.

$ sudo cp /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available

And enable it.

$ sudo a2ensite apache2-graphite

And we’re done.

Starting Carbon and Graphite

Finally, let’s start or reload the required services.

First Carbon.

$ sudo service carbon-cache start

And then Apache.

$ sudo service apache2 reload

You can then view the Graphite web interface in your browser.

[Image: The Graphite web interface (/images/posts/2015/1/graphite_web.png)]

Configuring Riemann for Graphite

Riemann uses a Clojure-based configuration file to specify how events are processed and handled. On an Ubuntu host we can find that file at /etc/riemann/riemann.config. We’re going to add a Graphite output to the configuration we used in the last posts on Riemann. Let’s look at an updated configuration now.

; -*- mode: clojure; -*-
; vim: filetype=clojure

(logging/init {:file "/var/log/riemann/riemann.log"})

; Listen on all interfaces over TCP (5555), UDP (5555), and websockets
; (5556)
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

; Expire old events from the index every 10 seconds.
(periodically-expire 10 {:keep-keys [:host :service :tags :metric]})

(def graph (graphite {:host "localhost"}))

(let [index (index)]
  ; Inbound events will be passed to these streams:
  (streams
    (default :ttl 60
      ; Index all events immediately.
      index

      ; graph all
      graph)))

You can see we’ve added a function called graph.

(def graph (graphite {:host "localhost"}))

This defines a connection to our local Graphite server, here localhost. You could also specify the name of a remote Graphite server and you can use either TCP or UDP to send events.
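
For example, a sketch pointing at a hypothetical remote Graphite server over UDP might look like this - the :port and :protocol options shown are assumptions (2003 and TCP are the usual defaults), so check the riemann.graphite documentation for your Riemann version:

; Send metrics to a remote Graphite server over UDP rather than the default TCP.
(def graph (graphite {:host "graphite.example.com"
                      :port 2003
                      :protocol :udp}))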

Inside your streams block we can then use the graph function to send events through to Graphite. In our current configuration we’re graphing everything. This means every event sent to Riemann will get passed to Graphite and turned into a graph.

Alternatively, if we don’t want to send everything to Graphite we can be more selective: for example, we could only select metrics from specific services.

(streams
  (where (service "heartbeat")
    graph))

Here we’re only sending events from the heartbeat service through to Graphite.
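
As another sketch, assuming our collectors tag their events, we could graph only events carrying a particular tag - the "nginx" tag here is hypothetical:

; Only graph events tagged "nginx".
(streams
  (where (tagged "nginx")
    graph))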

Now let’s send some metrics through to Graphite.

Sending metrics to Riemann and Graphite

For our metrics we’re going to choose some Nginx metrics. We’ve got a host running Nginx and are going to use the riemann-nginx-status command provided by the riemann-tools gem to send the metrics.

$ sudo gem install riemann-tools

The riemann-nginx-status command assumes the presence of an Nginx status page at http://localhost:8080/nginx_status. You can configure a page like that in your Nginx configuration, as shown below, or override the default location with the --uri option.

location /nginx_status {
  stub_status on;
  access_log   off;
  allow 127.0.0.1;
  deny all;
}

The Nginx status stub provides connection and request metrics. You can also control which metrics get sent to Riemann and specify any required thresholds. Let’s run riemann-nginx-status now.

$ riemann-nginx-status --host riemann.example.com

We’re sending our metrics from our Nginx host to riemann.example.com and we should start to see events like these hit Riemann shortly:

{:host artemisia.example.com, :service nginx health, :state ok, :description Nginx status connection ok, :metric nil, :tags nil, :time 1421514112, :ttl 10.0}
{:host artemisia.example.com, :service nginx active, :state ok, :description nil, :metric 3, :tags nil, :time 1421514112, :ttl 10.0}

Here we have a health check and the active connections metric. We should also now be able to see whether these events have passed through to Graphite. Let’s look at the resulting graphs in the Graphite web console.

[Image: Nginx metrics graphed in the Graphite web interface (/images/posts/2015/1/graphite_web2.png)]

We can see several metrics in our graph but not our health event. This is because Riemann only forwards events that have metrics. As the health event has a metric value of nil it’s not forwarded along to Graphite.
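
If we did want to graph the health check we could assign it a numeric metric before handing it to graph. Here’s a minimal sketch (not part of the configuration above) that would sit inside the streams block, mapping an ok state to 1 and anything else to 0:

; Turn "nginx health" events into a graphable 0/1 metric based on state.
(where (service "nginx health")
  (smap (fn [event]
          (assoc event :metric (if (= (:state event) "ok") 1 0)))
    graph))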

Pretty simple eh? Instant graph gratification.

Summary

We’ve seen how to install Graphite and connect it to Riemann. We’ve also seen how easy it is to turn our metrics into useful graphs. Building on this we could easily add categorization, filtering and manipulation (you remember all those cool things Riemann can do to events and their contents). A good starting point is The Guardian’s Riemann configuration. There are lots of useful examples and ideas there. Enjoy!