Sonic the search engine
For reasons that should be abundantly clear, I’ve been poking at alternatives to Elasticsearch. I’m living in a mostly Rust-based ecosystem right now working on Vector, so I started looking within that world. I found Sonic and decided to give it a whirl.
Sonic is a “fast, lightweight, and schema-less search backend.” It’s written in Rust, licensed under MPL 2.0. It’s maintained by Valerian Saliou, who is one of the founders of Crisp.
Sonic is not Elasticsearch: it’s a lot lighter weight and much less fully featured. Its focus is on normalizing natural language search queries and providing results. Also, unlike Elasticsearch, Sonic is an identifier index rather than a document index: queries return IDs, which you then use to look up the matching documents in an external database. Search terms are stored in collections and organized in buckets; you can use buckets to segregate your data into separate indexes, for example, a bucket per user.
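To make that concrete, a raw exchange over Sonic’s TCP protocol looks roughly like the following; the collection, bucket, and identifiers here are purely illustrative.

    QUERY messages user:0dcde3a6 "valerian saliou" LIMIT(10)
    PENDING Bt2m2gYa
    EVENT QUERY Bt2m2gYa conversation:71f3d63b conversation:6501e83a

The conversation IDs in the EVENT line are what you’d then resolve against your own datastore.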
Another difference worth mentioning is that Sonic indexes at the word level and not at the sentence level. This approach makes for fast and compact storage. It’s worth taking a look at Sonic’s benchmarks to see just how fast, and at Sonic’s limitations to understand the trade-offs you’re making to achieve those results.
It’s also important to note that Sonic runs on a single node and lacks fault-tolerance capabilities like clustering and replication. Although Sonic is lightweight, its single-node nature means it is likely to hit hardware scaling limits at some point.
Installing and Configuring Sonic
Let’s see Sonic in action. We’re going to run Sonic, add some data to it, and then query that data. The fastest way to do this is to run Sonic from its Docker image. All we need is Docker installed, some quick scaffolding, and a sample configuration file.
Let’s create a directory to hold our Sonic test instance and data and change into that directory.
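Something like this will do; the directory name is arbitrary.

    mkdir sonic && cd sonic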
Now we’re going to grab the sample configuration file.
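Assuming you want the config.cfg that ships in the Sonic repository, something like this will fetch it:

    curl -O https://raw.githubusercontent.com/valeriansaliou/sonic/master/config.cfg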
Inside the file, you’ll find a default configuration for Sonic. We’re going to change a few things to make it work for our demo. Firstly, by default, Sonic binds to localhost on port 1491. To work inside a Docker container, we need to bind it to all interfaces. To do this, find this line in the config.cfg file:
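In the sample config it looks something like this, under the [channel] section:

    inet = "[::1]:1491"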
And change it to:
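    inet = "0.0.0.0:1491"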
Next, we want to tell Sonic where to store its indexes. Let’s create some local directories for that now.
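The names and layout here are my choice; anything works as long as the config paths and Docker mounts below line up.

    mkdir -p store/kv store/fst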
The kv directory contains the Key-Value index, and the fst directory contains a word graph of the data inside Sonic. We’ll be mounting these directories as volumes inside our Docker container, and we need to update our configuration to reference them. Find the two path settings inside config.cfg and update them to point at the locations we’ll mount inside the container.
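In the sample config both default to paths under ./data/store/. The kv path becomes:

    path = "/var/lib/sonic/store/kv/"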
And:
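    # the fst path, under [store.fst]
    path = "/var/lib/sonic/store/fst/"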
Lastly, let’s up Sonic’s logging to get some more feedback from it. To do this, change the log_level option to:
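    # under [server]; the sample config defaults to "error"
    log_level = "debug"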
All other defaults can stay the same.
Now let’s run Sonic.
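Using the mount points from Sonic’s Docker instructions (the image tag here is only an example; pin whichever recent release you prefer):

    docker run -p 1491:1491 \
      -v $(pwd)/config.cfg:/etc/sonic.cfg \
      -v $(pwd)/store/:/var/lib/sonic/store/ \
      valeriansaliou/sonic:v1.3.0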
We’ve mapped port 1491 outside the container and mounted our configuration file and store directories into it. We should see the Sonic server start up.
And we can then telnet into port 1491 to see if the server responds.
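On connecting, Sonic should greet us with a CONNECTED banner (the exact version string depends on the image you pulled):

    telnet localhost 1491
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    CONNECTED <sonic-server v1.3.0>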
And hey presto, we’re up and running. It’s not very exciting without adding some data, so let’s generate some.
Testing Sonic
Sonic comes with a collection of official and community-submitted client libraries for various languages and frameworks. As it’s Sunday and I am feeling particularly lazy, I will write two quick Ruby scripts: one to send data to Sonic for ingestion and a second to search it. Both will use the Ruby client for Sonic.
Let’s create a new directory to hold our test scripts:
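Again, the directory name is arbitrary.

    mkdir sonic-test && cd sonic-test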
Now we’ll start our scripts with a Gemfile:
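    source 'https://rubygems.org'

    gem 'sonic-ruby'
    gem 'faker'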
And use Bundler to install the sonic-ruby gem and the faker gem we’ll be using to generate some sample data.
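    bundle install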
Ingesting data
Now let’s write a quick script to ingest some sample data. We’ll call it ingest.rb:
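    touch ingest.rb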
And populate it like so:
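This is a sketch that assumes the sonic-ruby client exposes Sonic::Client.new(host, port, password), channel(:ingest), and push(collection, bucket, object, text), and that we kept the default SecretPassword from the sample configuration; adjust it if the gem’s API differs.

    require 'sonic-ruby'
    require 'faker'

    # Connect to our local Sonic server using the default password
    # from the sample configuration.
    client = Sonic::Client.new('localhost', 1491, 'SecretPassword')

    # Open the ingest channel.
    ingest = client.channel(:ingest)

    # Generate 10,000 fake names and push each one into the users
    # collection and the all bucket, keyed by a simple numeric ID.
    10_000.times do |i|
      ingest.push('users', 'all', "user:#{i}", Faker::Name.name)
    end

Run it with bundle exec ruby ingest.rb.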
Here we’re using Faker to generate an array of 10,000 names and pushing them into a collection called users and a bucket called all. We’ll see a flurry of activity from the Sonic server as it indexes all incoming data.
Searching data
We can then write another script to query this data.
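Let’s call this one search.rb:

    touch search.rb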
And populate it like so:
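Again, this sketch assumes the sonic-ruby interface described above, with query and suggest methods on the search channel.

    require 'sonic-ruby'

    # The name to search for is passed as the first argument.
    name = ARGV[0]

    # Connect to our local Sonic server and open the search channel.
    client = Sonic::Client.new('localhost', 1491, 'SecretPassword')
    search = client.channel(:search)

    # A straight search of the users collection in the all bucket,
    # returning any matching IDs.
    puts "Results:     #{search.query('users', 'all', name)}"

    # Ask Sonic to suggest words from its dictionary based on our input.
    puts "Suggestions: #{search.suggest('users', 'all', name)}"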
Our script takes a single name as input and performs two operations. The first is a straight search of the users collection in the all bucket; if the name matches one or more indexed IDs, it’ll return them on the command line. The second is a suggest query that returns one or more suggested names. Let’s give it a try now:
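    bundle exec ruby search.rb kate
    bundle exec ruby search.rb jim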
We should see Sonic return some matching IDs for kate and jim, along with some suggested variants.
I think this example shows Sonic’s simplicity and power, and how easy it would be to wire it into a search box to get suggestions and corrections. I can see Sonic covering a useful middle ground: plenty of folks who would previously have defaulted to Elasticsearch have search needs that what Sonic provides would satisfy. Naturally, Sonic’s single-node nature, the lack of fault tolerance, and the potential scaling challenges may be an issue for many folks. However, I still think it’s a cool project and worth a look.