Let’s dive a little deeper into Elasticsearch with Christine Oen!
What is Elasticsearch?
Elasticsearch is an open source search engine built on top of Apache Lucene, which is a software library for storing and retrieving data. Because of the way it stores data, searching it is typically much faster than retrieving the same data out of a traditional database.
Why would someone need to use Elasticsearch?
There are many use cases for Elasticsearch, but one of the most basic is providing fast search capabilities for end users. For example, if you have an eCommerce website and you want your customers to be able to search for items, using Elasticsearch would enable you to provide almost instantaneous results even if you have a very large inventory. If you tried to search your database for the items directly, it would likely take a long time, which results in a poor user experience.
Elasticsearch also allows you to easily segment your data and create queries that quickly find what you are looking for, which is useful for faceting. For example, on the same eCommerce site a user might click on ‘shoes’, ‘sandals’, and ‘size 8’, and have the relevant results show up instantly.
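To make the faceting idea concrete, here is a minimal Python sketch. The product list, the field names, and the helpers `filter_products` and `facet_counts` are all hypothetical and made up for illustration. Elasticsearch does this kind of work with filters and aggregations over its index rather than by scanning a list, so treat this only as a picture of the concept.

```python
from collections import Counter

# Hypothetical catalog; the fields and values are invented for this example.
products = [
    {"name": "Beach sandal", "category": "shoes", "type": "sandals", "size": 8},
    {"name": "Strappy sandal", "category": "shoes", "type": "sandals", "size": 9},
    {"name": "Trail runner", "category": "shoes", "type": "sneakers", "size": 8},
    {"name": "Sun hat", "category": "hats", "type": "sun hats", "size": 7},
]

def filter_products(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [p for p in items if all(p.get(k) == v for k, v in selected.items())]

def facet_counts(items, field):
    """Count how many items fall under each value of one field."""
    return Counter(p[field] for p in items)

# The user clicks 'shoes', then 'sandals', then 'size 8'.
matches = filter_products(products, category="shoes", type="sandals", size=8)
print([p["name"] for p in matches])

# Counts that a site could show next to each 'type' option within 'shoes'.
print(facet_counts(filter_products(products, category="shoes"), "type"))
```

The counts are what let a site show how many results sit behind each option before the user clicks it.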
Why would someone need a hosted search provider like Bonsai?
It is possible to set up Elasticsearch and run it on your own, but there’s a lot that goes into it, including provisioning your own servers to store data. Bonsai also provides monitoring and support, which means you wouldn’t have to hire a search engineer.
Due to our particular implementation of Elasticsearch, which takes advantage of multi-tenancy and allows us to offer shared plans, we are able to provide much more power per dollar than you would be able to get by yourself.
How does it work?
Elasticsearch makes use of what’s called an inverted index to search quickly, which is similar to an index you would find in a book. A book index has a list of terms you might be looking for, with the associated page numbers you would find those words on. If you had to search through the whole book yourself without that index it would take a long time to find the words, which is similar to how a regular database search works.
In an inverted index, each word is stored with the documents it is found in, making it much faster to grab the information you need. There are two sides to making this work: indexing the documents and running the search query. To index a document (that is, to get it into the index), its text is converted into individual terms (tokens) to be used in the inverted index. The text is usually normalized along the way, for example by lowercasing it, removing stop words like ‘the’ and ‘or’, and removing punctuation.
On the other end, the search query would need to be normalized in the same way. If you type “Hello, world” into the search field, it would first be turned into “hello” and “world” in order for Elasticsearch to find the terms in the inverted index and return the documents it finds them in.
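Here is a toy Python sketch of both sides, indexing and searching. The names `analyze`, `build_index`, `search`, and `STOP_WORDS` are made up for this example, and the stop word list is illustrative. It also assumes that a document must contain every query term, which is a simplification and not necessarily how Elasticsearch scores and matches results.

```python
import re
from collections import defaultdict

# Illustrative stop word list, not the one any real analyzer uses.
STOP_WORDS = {"the", "or", "a", "an", "and"}

def analyze(text):
    """Lowercase the text, drop punctuation, split into tokens, remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term of the normalized query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Hello, world!",
    2: "The world is large.",
    3: "Hello there.",
}
index = build_index(docs)
print(search(index, "Hello, world"))  # {1}
```

Because the query goes through the same `analyze` step as the documents, “Hello, world” becomes the terms “hello” and “world”, and each one is a direct lookup in the index instead of a scan through every document.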
What are some popular things that Elasticsearch allows you to do?
Elasticsearch allows for defining custom behaviors. Many people use this to build autocomplete features using n-grams. This works by breaking each word into smaller tokens before it goes into the index. For example, “hello” would be broken up into its leading fragments (edge n-grams): h, he, hel, hell, hello. If you type an “h” into the search bar, it matches the “h” token, which is associated with the full word “hello”.
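As a rough illustration, the prefix breakdown can be sketched in a few lines of Python. The helpers `edge_ngrams` and `build_prefix_index` are hypothetical names for this example. In Elasticsearch this behavior is configured through analyzers instead of being written by hand.

```python
from collections import defaultdict

def edge_ngrams(word, min_len=1):
    """Return every prefix of the word, from min_len characters up to the full word."""
    word = word.lower()
    return [word[:i] for i in range(min_len, len(word) + 1)]

def build_prefix_index(words):
    """Map each prefix token to the set of full words it came from."""
    index = defaultdict(set)
    for word in words:
        for gram in edge_ngrams(word):
            index[gram].add(word)
    return index

index = build_prefix_index(["hello", "help", "world"])
print(edge_ngrams("hello"))  # ['h', 'he', 'hel', 'hell', 'hello']
print(sorted(index["hel"]))  # ['hello', 'help']
```

The trade-off is a larger index in exchange for a cheap lookup on every keystroke, since each partial input is already stored as a token.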
Another thing you can do is make use of fuzzy queries, which are useful when someone makes a typo in their search. This works by calculating the number of single-character changes (the edit distance) needed to turn one string into another. The smaller the distance, the higher the relevancy of the result.
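The distance calculation can be sketched as a plain Levenshtein edit distance in Python. The function names `edit_distance` and `fuzzy_match`, the sample terms, and the `max_distance=2` cutoff are assumptions for illustration. A real engine may add refinements, such as counting a swap of two adjacent characters as a single edit, that this sketch leaves out.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]

def fuzzy_match(query, terms, max_distance=2):
    """Return the terms within max_distance edits of the query, closest first."""
    scored = sorted((edit_distance(query, t), t) for t in terms)
    return [t for d, t in scored if d <= max_distance]

print(fuzzy_match("sandels", ["sandals", "sneakers", "sandal"]))
# ['sandals', 'sandal']
```

Here the typo “sandels” is one edit away from “sandals” and two away from “sandal”, so both come back with the closer term first, while “sneakers” is too far away to match.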
What are some other benefits of Elasticsearch beyond search capabilities?
One of the coolest features of Elasticsearch is that it can function as an analytics engine. It is designed to work closely with a suite of tools by Elastic called the ELK stack, which is made up of Elasticsearch, Logstash, and Kibana and allows people to analyze large systems. Anything you want to monitor can be sent to and stored in Elasticsearch via Logstash, and a visual representation of that data can be easily viewed with Kibana.
Data can be set up to be sent automatically into Elasticsearch, so it effectively monitors your systems and you can easily notice when something needs to be looked into. For example, at Bonsai we send data from every request made to any of our customers’ Elasticsearch instances to a separate analytics index. We are alerted if there are issues with any of the requests, which might mean that a server is down and has to be reprovisioned. Without this, we wouldn’t know until a customer alerted us to the issue.
One of my coworkers wrote a great blog post about another use case for the ELK stack. Also, NASA’s Jet Propulsion Laboratory uses the ELK stack to monitor data from the Mars rover!
Wow – Thanks Christine! Christine is a developer at Bonsai in Austin, Texas where they provide hosting services & support for Elasticsearch.