A bit of a Vector
I’m always looking at new monitoring and logging tools. I’ve written some books on logging and some that feature logging as part of broader monitoring frameworks.
Recently I’ve been playing around with Vector. Vector describes itself as an observability tool, specifically a “data router”. It has a heavy focus on logs as data sources, although it also supports metrics and the team has indicated tracing data is a future ambition.
The Vector team see it as a replacement for Logstash, Telegraf, Fluent et al but warn that it is not a distributed stream processing tool (or a replacement for Kafka! hellz) and that it should not be used with analytics-type data.
Vector is written in Rust, because of course it is, Rust being the trendy new language for infrastructure tools. It has been released under the Apache 2.0 license by the team at Timber.io as the open source end of their SaaS monitoring tool.
Vector architecture
Vector’s architecture is based around building pipelines. Pipelines take incoming events, perform any transformations you want on those events, and output them to some destination. Each pipeline is made up of three components, listed here in the order they are typically used.
- Sources - These are where Vector gets data from; incoming data is normalized into an event record. Sources include classics like Syslog, reading from files, or receiving data on a TCP connection.
- Transforms - These modify an event or a stream of events. This could include parsing, filtering, sampling, or aggregating events; essentially anything that modifies an event.
- Sinks - These are the final destinations of our events. Each sink sends to a different destination, for example streaming events to a TCP socket or sending them for storage in AWS S3.
If you’re coming from the Logstash world the easiest comparison here would be sources are inputs, transforms are filters, and sinks are outputs.
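To make that concrete, here’s a minimal sketch of what a pipeline looks like in Vector’s TOML configuration. The component names (`in`, `parse`, `out`) are placeholders of my own, and we’ll dig into the individual options below.

```toml
# A sketch of the source -> transform -> sink shape (names are placeholders)
[sources.in]
  type = "stdin"                 # read events from standard input

[transforms.parse]
  type = "grok_parser"           # parse the raw message into fields
  inputs = ["in"]                # consume events from the "in" source
  pattern = "%{GREEDYDATA:data}"

[sinks.out]
  type = "console"               # write events to the console
  inputs = ["parse"]             # consume events from the "parse" transform
  encoding = "text"
```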
Let’s take a look at Vector now.
Installing Vector
Vector is available as a binary and as package installers for a variety of Linux distributions, Windows, and OS X, in several flavors (tarball, DMG, DEB, RPM, etc.), and as a Docker image.
You can download the binary or package of your choice from here. I’m going to install a DEB package on an Ubuntu host for my local testing.
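The download and install looked something like this; the package URL and version here are placeholders, so grab the current ones from the downloads page.

```shell
# Download the DEB package (URL and version are placeholders - check the downloads page)
curl -LO https://packages.timber.io/vector/latest/vector-amd64.deb

# Install the package
sudo dpkg -i vector-amd64.deb
```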
This will install the Vector binary, `vector`, to `/usr/bin`, some sample configuration to `/etc/vector`, and service management setup to run Vector as a service.
Configuring Vector
Now we’ve got it installed, let’s run the `vector` binary.
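I ran it interactively, pointing at the default configuration file with the `--config` flag.

```shell
# Run Vector interactively with the default configuration
vector --config /etc/vector/vector.toml
```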
We’re running Vector interactively and its default configuration behavior is to capture `STDIN`. Let’s type something into `STDIN` and see what happens.
We’ve typed `This is a test` and seen it echoed in the console.
Let’s see how this was achieved in the default configuration shipped with the Vector package. Vector’s configuration is contained in `/etc/vector/vector.toml`. The `.toml` file extension indicates the project is using the TOML configuration format. The default configuration looks like this:
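The shipped file looks something like this (reproduced approximately; the exact comments and defaults may differ in your version):

```toml
# The Vector data directory, used for on-disk buffering and state
data_dir = "/var/lib/vector"

# A source that reads events from standard input
[sources.in]
  type = "stdin"

# A sink that writes events to the console
[sinks.out]
  type = "console"
  inputs = ["in"]      # consume events from the "in" source
  encoding = "text"    # emit events as plain text
```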
You’ll also find a full configuration specification in the `vector.spec.toml` file and examples of a variety of configurations in the `/etc/vector/examples` directory.

The configuration file specifies the location of the Vector data directory, using the `data_dir` variable. It also specifies a source and a sink. We name our sources and sinks, prefixing them with the component type and specifying a name after the period. The `type` inside our source and sink tells Vector what that component is, for example `stdin` for a source that consumes standard input.
Our sink, `sinks.out`, is of the type `console`, which outputs events to the console.
In our configuration, the `sources.in` input source is tied to an output via the `inputs` variable.
The variable is an array, so you can tie multiple sources to a sink; we could send events from the `sources.in` source and other sources to the `sinks.out` sink by specifying another entry in the `inputs` array, as the snippet below shows. This allows you to create pipelines in Vector where events flow from sources, through transforms, and out via sinks.
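For example, a sink consuming from two sources might look like this; `other_source` here is just a placeholder name for illustration.

```toml
[sinks.out]
  type = "console"
  # Consume events from both the "in" source and a hypothetical "other_source"
  inputs = ["in", "other_source"]
  encoding = "text"
```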
The `encoding` variable specifies how the event will be output; in our case the output will be plain text.
Let’s create a new source, monitoring a log file on our host:
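The addition to `/etc/vector/vector.toml` looked something like this; the source name `auth` is my own choice.

```toml
# A source that tails the Ubuntu authentication log
[sources.auth]
  type = "file"
  include = ["/var/log/auth.log"]   # an array of file paths to monitor

# Update the sink to consume the new source and emit JSON
[sinks.out]
  type = "console"
  inputs = ["auth"]
  encoding = "json"
```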
This adds a new `sources.auth` source with a `type` of `file`. It uses the `include` variable to specify an array of files to monitor. We’re monitoring the Ubuntu authentication log file: `/var/log/auth.log`.
We’ve added our new source to the `sinks.out` sink in the `inputs` variable. We’ve also changed the sink’s encoding to JSON-formatted events.
Let’s restart Vector, this time running it with `sudo` to give us read access to the `/var/log/auth.log` file.
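That’s just the same command run via `sudo`.

```shell
# Re-run Vector with root privileges so it can read /var/log/auth.log
sudo vector --config /etc/vector/vector.toml
```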
We can see that JSON-formatted events from the SSH daemon are being collected and emitted from Vector.
The JSON-encoded events contain the source and source type of the event, the host, and the message.
We can make use of the last Vector component, transforms, to parse or edit events. Here’s an example of a transform that uses Logstash’s `grok` parsing to process Syslog-style events.
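The transform looks something like this; the option names (`field`, `pattern`, `drop_field`) and the grok pattern here are approximate, so check `vector.spec.toml` for the exact settings.

```toml
# Parse Syslog-style lines from the auth source with a grok pattern
[transforms.syslog]
  type = "grok_parser"
  inputs = ["auth"]                            # only process events from the auth source
  field = "message"                            # parse the message field
  pattern = "%{SYSLOGBASE} %{GREEDYDATA:data}" # standard Syslog primitives, then grab the rest
  drop_field = true                            # drop the original message field after parsing

# Send the parsed events to the console sink
[sinks.out]
  type = "console"
  inputs = ["syslog"]
  encoding = "json"
```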
We can see our new transform, `transforms.syslog`, with a type of `grok_parser`. Like our sources and sink, we tie transforms together as part of a pipeline using the chained `inputs` variable. We use the `auth` source as the value of the `inputs` variable in the transform. This ensures only events from that source are processed. Our parser uses the regular expression in `pattern` on the `message` field; in our case it parses using the default grok Syslog primitives and then grabs the rest of the message. We then drop the original `message` field. Finally, we update our sink to receive the processed events from the transform.
When we restart Vector, we’ll see the processed events after they are parsed by our transform.
The transform has broken the original `message` into fields, extracting the useful information out of the standard Syslog message.
Conclusions
Vector is definitely interesting. It doesn’t quite have the scope of components/plug-ins that the ELK stack already has, but inclusions like the `grok_parser` provide a lot of power. I’m not a huge fan of TOML configuration at the best of times but it’s workable. I’m also impressed with the ability to assign, and coerce, types onto fields in configuration. The documentation is detailed, albeit a bit rough in places, and I stumbled on a couple of regressions and inconsistencies. Another of the interesting things Vector brings to the table is a unit test framework. I’m fascinated to try out some unit tests of logging and parsing configuration.
Overall, Vector is definitely worth taking for a spin. Testing a Rust-based processor for performance, especially against the somewhat heavier-weight JRuby of Logstash, would be an interesting exercise. I was able to throw a moderate load at Vector (10,000 events/sec), which it handled easily, but that’s definitely not the scale of some of my ELK implementations. I’m curious to see how it handles larger and more complex workloads.