Elasticshell

Elasticsearch is a wonderful database for performing full-text on rich documents at terabyte-scale.

It’s already pretty easy to talk to Elasticsearch. You can

  • use the HTTP-based, REST API via commmand-line tools like curl, your favorite HTTP library, or even your browser’s URL bar

  • use the interface built on Apache Thrift

  • use the native Java classes

What’s missing was a command-line shell that let you directly inspect Elasticsearch’s “filesystem” or “database schema”, run queries, and in general muck about. I got sick of writing things like

$ curl -s -X GET "http://localhost:9200/_status" | ruby -rjson -e 'puts JSON.parse($stdin.read)["indices"]["my_index"]["docs"]["num_docs"]'

How about

$ es /_status --only=indices.my_index.docs.num_docs

Installation

Install the gem:

$ sudo gem install elasticshell

which installs a program ‘es’ that you can run from the command line to start Elasticshell. Try

$ es --help

right now to see that everything is properly installed. You’ll also see a brief survey of Elasticshell’s startup options.

Usage

To start an Elasticshell session, just run

$ es

Elasticshell will automatically try to connect to a local Elasticsearch database running on the default port. You can modify this with the startup options. Type help at any time to get some contextual help from Elasticshell.

Within Elasticshell, there are three variables whose values affect behavior. These variables are reflected in the default prompt, for example:

GET /my_index/my_type$

This prompt tells us three things:

  1. The default HTTP verb we’re using for requests is GET.

  2. The default API “scope” we’re in is /my_index/my_type. If the shell is connected to an Elasticsearch server and the scope exists, it will be colored green. Otherwise it’s yellow.

  3. Elasticshell will print raw responses from the database – this is the $ at the end of the prompt. If we were in pretty-print mode, this would become a $$.

Connecting to the Database

Elasticshell will try to connect to the Elasticsearch hosts passed with the --servers option during startup. At any other time, you can connect to servers by issuing the connect command:

GET /$ connect http://192.168.1.10:9200 http://192.168.1.11:9200 http://192.168.1.12:9200 http://192.168.1.13:9200

Understanding Scope

Scopes are defined by the Elasticsearch REST API. Some scopes like /_cluster or /_nodes are static and present for all Elasticsearch clusters. Other scopes like /my_index/my_type depend upon the particular cluster.

Use the cd built-in to move between scopes:

GET /$ cd /blog/comments
GET /blog/comments$ cd ..
GET /blog$ cd /blog/entries
GET /blog/entries$ cd
GET /$

Different kinds of scopes

The ls command will show the contents of a given scope:

GET /$ ls
blog _cluster _nodes _status

but the ll command gives more output:

GET /$ ll
i      1/1/0      5  3.3kb blog
s                          _cluster
s                          _nodes
-                          _status

Here you see that blog

  • is an index (the i in the first column)

  • has 1 total shard, 1 successful shard, and 0 failed shards

  • has 5 documents

  • occupies 3.3kb of space on disk

And, because of the s in the first-column, _cluster and _nodes are scopes – you can cd into them.

Finally, _status is a request, you can’t cd into it, but you can issue it.

If you were to first cd into the index and run ll you’ll see different output:

GET /$ cd /blog/ 
GET /blog$ ll
m                          comments
m                          entries
-                          _aliases
-                          _search
-                          _stats
-                          _status

_aliases and so on are just more requests you can make but comments and entries are mappings (they have an m in the first-column).

Changing HTTP Verb

You can change Elasticsearch’s default HTTP verb by giving it one. Here’s the same thing in two steps:

GET /$ PUT
PUT /$ /my_new_index

You can also do this on a per-request basis

GET /$ PUT /my_new_index
GET /$

Changing Prettiness

Typing pretty at any time will toggle Elasticsearch’s pretty-printing format on or off.

GET /$ pretty
GET /$$

The extra $-sign means it’s pretty…

Making Requests

Scopes are fine for organizing the API but to get anything done you’ll have to send a request.

Named requests

Each scope has different fixed, named requests, as per the Elasticsearch API documentation. Within a scope, tab-complete on the first word to see a list of possible commands. Hit enter after a command to see output from Elasticsearch.

Here’s a command to get the status for the cluster:

GET /$ _status

Here’s a command to get the health of the cluster:

GET /$ cd _cluster
GET /_cluster$ health

which you could also have issued like this

GET /$ _cluster/health

Using query strings

Commands will also accept a query string, as in this example of a search through my_index:

GET /my_index$ _search?q=foo+AND+bar

Using a body

In the above example the query foo AND bar was passed via the query string part of a URL. Passing a more complex query requires we put the query in the body of the request. If you’re willing to forego using spaces you can do this right on the same line:

GET /my_index$ _search {"query":{"query_string":{"query":"foo"}}}

But if you want more expressiveness you can either name a file (with tab-completion) that contains the body you want:

# in /tmp/query.json
{
  "query": {
    "query_string: {
      "query": "foo AND bar"
    }
  }
}

followed by

GET /my_index$ _search /tmp/query.json

Or you can do cat-style, pasting the query into the shell, by using the - character:

GET /my_index$ _search -
{
  "query": {
    "query_string: {
      "query": "foo AND bar"
    }
  }
}

Don’t forget to use Ctrl-D to send an EOF to flush the input of the query.

Redirecting output

You can redirect the output from a request in a variety of ways.

Most simply is to redirect to a file:

GET /my_index$ _search /tmp/query_with_lots_of_output.json > /data/output_file.json

Or you can try sending it to Ruby itself:

GET /my_index$ _search /tmp/query_that_needs_filtering.json | puts response["hits"]["hits"].first["body"]

Everything to the right of the | is executing within a Ruby process that has the response and request variables in scope. You can be even more free by just piping without any Ruby code, which will leave you in a Ruby (Pry) shell with the same binding.

GET /my_index$ _search /tmp/query_that_needs_interaction.json | 
>> response
=> {"took"=>1, "timed_out"=>false, ... }
>> request
=> {:verb=>"GET", :path=>"/_search", :query_options=>{}, :body=>""}

Hit CTRL-D to get out of this new interactive Ruby shell and back to Elasticshell.

Requests from the command line

Instead of running Elasticshell interactively, you can exit after running only a single command by feeding the request path directly to the es script. For example

$ es /_cluster/health
$ es --scope=/_cluster health
$ es --verb=GET /_cluster/health

all work like you think they do.

The --only option can also be passed a .-separated hierarchical list of keys to slice into the resulting object. This is useful when trying to drill into a large amount of data returned by Elasticsearch. The example from the start of this file is relevant again here:

$ es /_status --only=indices.my_index.docs.num_docs