Monitoring Survey 2015 - Metrics

In the last posts I talked about the tools people used in monitoring, the demographics, and what environments people monitor. In this post I am going to look at the questions around collecting metrics and what those metrics are used for by respondents.

As I’ve mentioned in previous posts, the survey got 1,116 responses of which 884 were complete.

This post will cover the questions:

7. Do you collect metrics on your infrastructure and applications?
8. What tools do you use to collect metrics?
9. What tools do you use to store your metrics?
10. What tools do you use to visualize your metrics?
11. If you collect metrics, what do you use the metrics you track for?

Collecting Metrics

Question 7 asked if the respondents collected metrics. It was a Yes/No question.

/images/posts/2015/7/metcoll.png

We can see that the overwhelming majority, 88% in fact, of respondents collect metrics (slightly down from 90% last year). That continues to be a pretty conclusive indication that metrics matter.

I also broke the responses down by organization size. I was curious to see which sizes of organization were least likely to collect metrics.

/images/posts/2015/7/metorg.png

We can see that there is a pretty even distribution of people who do not collect metrics across organization sizes.

Metric collection tools

I also asked respondents to tell me about the tools they used to collect metrics. There was a choice of potential tools and an Other option. The choice of tools included:

  • collectd
  • Cube
  • DataDog
  • Ganglia
  • Librato
  • Munin
  • New Relic
  • OpenTSDB
  • StatsD

/images/posts/2015/7/metcoltool.png

We can see that both collectd and StatsD are heavily used with New Relic coming in third, in keeping with the data revealed in the tool analysis results.

The results of the Other option were also interesting. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics collection tools - Other #
In-house 77
Diamond 26
Sensu 23
Zabbix 19
ELK 17
Cacti 16
Nagios 13
Check_MK 13
Centreon 11
pnp4nagios 9
Splunk 9
SolarWinds 8
AppDynamics 7
Prometheus 6
Icinga2 6
NetCrunch 6
Shinken 5
Zenoss 5
jmxtrans 5
DropWizard 4
Observium 4
Dataloop 4
OpenNMS 4
Riemann 3
Coda’s Metrics 3
Cloudwatch 2
OMD 2
Dynatrace 2
Smokeping 2
Graphite 2
Stackdriver 2
Xymon 2
CopperEgg 2
Ganglia 2
LogicMonitor 2
SignalFX 2

The high number of respondents building their own metrics collection tools (77 reported having in-house tooling) is interesting. It potentially suggests that there is still a segment of the market that isn’t happy with the available tooling out there.

Also interesting was the support for Diamond, a Python-based metrics collection tool originally written by the Brightcove team and now maintained as a separate open source project.

Metric storage tools

We also asked respondents to name the tools they used to store metrics. The options for the question included:

  • DataDog
  • Graphite
  • Hosted Graphite
  • InfluxDB
  • Librato
  • OpenTSDB
  • RRDtool

There was also an Other option we’ll report below.

/images/posts/2015/7/metstotool.png

The clear winner here is Graphite. As one of the longer-standing tools in the metrics space, it’s not overly surprising it is so well represented. Also present in large numbers is RRDtool, an even older tool in the metrics space. The newer generation of tools is represented by InfluxDB.

These are the responses to the Other option. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics storage tools - Other #
ELK 28
In-house 27
Splunk 14
Zabbix 14
New Relic 9
MySQL 8
Prometheus 8
Cacti 8
SignalFX 7
AppDynamics 6
NetCrunch 6
Dataloop 5
SolarWinds 5
Stackdriver 4
Zenoss 4
Cassandra 4
CopperEgg 3
MSSQL 3
Ganglia 3
postgreSQL 2
Circonus 2
LogicMonitor 2
Check_MK 2
pnp4nagios 2
SPM 2
OpenNMS 2
kairosdb 2
Xymon 2
Redis 2

Interesting to note here is the number of people using the ELK stack and in-house tools to store their metric data. I’ve been seeing a lot of tools and services converting data and metrics into Logstash’s JSON format, using Logstash as a filtering router and Elasticsearch as storage.
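To make that pattern a little more concrete, here is a minimal sketch of wrapping a single metric sample in a Logstash-style JSON event. The `@timestamp` and `@version` fields follow Logstash's usual event layout. The function name `metric_to_logstash_event` and the `host`, `metric`, and `value` field names are my own illustrative assumptions, not something taken from the survey responses or from any particular tool.

```python
import json
from datetime import datetime, timezone


def metric_to_logstash_event(name, value, host, when=None):
    """Wrap one metric sample in a Logstash-style JSON event (illustrative)."""
    # Normalize to UTC so the trailing "Z" is accurate.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    event = {
        # "@timestamp" and "@version" are the usual Logstash event fields.
        "@timestamp": when.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "@version": "1",
        # The remaining field names are assumptions made for this example.
        "host": host,
        "metric": name,
        "value": value,
    }
    return json.dumps(event, sort_keys=True)


# Hypothetical sample: a load average reading from a host called web01.
sample = metric_to_logstash_event(
    "cpu.load.shortterm",
    0.42,
    "web01",
    when=datetime(2015, 7, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(sample)
```

An event like this could then be shipped to Logstash for filtering and routing, with Elasticsearch as the eventual store. How it gets shipped depends on the inputs you have configured, so I've left that part out.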

Metric visualization tools

Our last tools question focussed on metrics visualization tools.

Respondents had a choice of the following tools:

  • D3
  • Grafana
  • Graphene
  • Graphite
  • Highcharts
  • Rickshaw
  • Tessera

Respondents could also select an Other option and specify other tools.

/images/posts/2015/7/metvistool.png

Here Grafana is a clear favorite, likely given its ability to sit on top of Graphite, InfluxDB and OpenTSDB. The next largest tool was Graphite itself and then, with a long drop-off, the D3 JavaScript framework.

These are the responses to the Other option. I’ve only included tools that occurred more than once to keep the list manageable.

Metrics Visualization tools - Other #
In-house 54
ELK 35
pnp4nagios 27
DataDog 24
Cacti 22
Zabbix 17
Splunk 13
Munin 13
New Relic 10
Ganglia 8
Observium 7
Librato 7
NetCrunch 7
Centreon 6
AppDynamics 6
SolarWinds 6
Dataloop 5
RRDTool 5
Dashing 5
OpenNMS 5
SignalFX 4
Stackdriver 4
Promdash 4
Check_MK 4
MRTG 3
pnp 3
Nagios 3
Circonus 3
Graphite 3
Tableau 3
CopperEgg 3
Xymon 3
Metrilyx 2
Riemann 2
Zenoss 2
LogicMonitor 2
SPM 2
Nagiosgraph 2
OpenTSDB 2
StatusWolf 2
Visage 2

Again present are a lot of in-house tools and the ELK stack in the form of Kibana. Given the presence of lots of Nagios users it’s also not a surprise to see pnp4nagios represented.

The purpose of metrics collection

I also asked respondents why they collected metrics. As with last year I was curious whether respondents were collecting data for performance analysis or as a fault detection tool. There’s a strong movement in more modern monitoring methodologies to consider metrics a fault detection tool in their own right. I was interested to see if this thinking had grown from last year.

Respondents were able to select one or more choices from the following list:

  • Performance analysis and trending
  • Fault and Anomaly detection
  • Capacity Planning
  • A/B Testing
  • We don’t do anything with collected metrics
  • Other

If respondents had selected “No” on the earlier question, indicating that they did not collect metrics, the survey logic skipped them past this question to the next one.

I’ve produced a summary table of respondents and their selections.

Metrics Purpose %
Performance analysis and trending 63%
Fault and Anomaly detection 53%
Capacity Planning 45%
A/B Testing 11%
We don’t do anything with collected metrics 3%

We can see that 63% of respondents specified performance analysis and trending as a reason for collecting metrics. Below that, 53% of respondents specified that they used metrics for fault and anomaly detection. This is 10% lower than last year’s survey. The next largest group, 45%, used metrics for capacity planning.

A very small group, 11%, used metrics for A/B testing.
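Because respondents could select more than one purpose, the percentages in the summary table add up to more than 100% (175% in total). The sketch below shows how shares like these are calculated for a multi-select question: each option is counted once per respondent and divided by the number of respondents. The `responses` data and the `selection_shares` function are hypothetical and for illustration only; they are not the survey's raw data or the exact method used to build the table.

```python
from collections import Counter

# Hypothetical answers: each inner list is one respondent's selections.
responses = [
    ["Performance analysis and trending", "Capacity Planning"],
    ["Performance analysis and trending", "Fault and Anomaly detection"],
    ["Fault and Anomaly detection"],
    ["Performance analysis and trending", "Fault and Anomaly detection", "A/B Testing"],
]


def selection_shares(responses):
    """Return {option: percent of respondents who selected that option}."""
    # set() guards against the same option being listed twice by one respondent.
    counts = Counter(option for answer in responses for option in set(answer))
    total = len(responses)
    return {option: round(100 * n / total) for option, n in counts.items()}


shares = selection_shares(responses)
for option, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{option}: {pct}%")

# The shares are per respondent, so they can sum to more than 100.
print("Sum of shares:", sum(shares.values()))
```

With this made-up data, two options are each selected by three of the four respondents, so the shares sum to well over 100, just as they do in the table above.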

I also summarized the Other responses as a table.

Metrics Purpose - Other #
Reporting 5
Dashboards 4
Alerting 3
Business KPIs 2
Slow call traces 1
Marketing 1
Retrospectives 1
Power management 1
Fault diagnosis 1
Incident response 1
Billing 1

P.S. I am also writing a book about monitoring.

The posts: