Slingshot
Slingshot aims to provide a rich and comfortable Ruby API and DSL for the ElasticSearch search engine/database.
ElasticSearch is a scalable, distributed, highly available, RESTful database that communicates via JSON over HTTP; it is based on Lucene and written in Java. It manages to be very simple and very powerful at the same time. You should seriously consider it to power search in your Ruby applications: it will deliver all the features you want, and many more you may not have imagined yet (native geo search? date histogram facets? percolator?).
Slingshot currently supports basic index operations and searching. More is planned.
Installation
First, you need a running ElasticSearch server. Thankfully, it's easy. Let's define easy:
$ curl -k -L -o elasticsearch-0.15.0.tar.gz http://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.15.0.tar.gz
$ tar -zxvf elasticsearch-0.15.0.tar.gz
$ ./elasticsearch-0.15.0/bin/elasticsearch -f
OK, easy. Now, install the gem via RubyGems:
$ gem install slingshot-rb
or from source:
$ git clone git://github.com/karmi/slingshot.git
$ cd slingshot
$ rake install
Usage
Currently, you can use Slingshot via the DSL (e.g. by extending your class with it).
Plans for full ActiveModel integration (and other convenience layers) are in progress (see the activemodel branch).
To kick the tires, require the gem in an IRB session or a Ruby script (note that you can just run the full example from examples/dsl.rb):
require 'rubygems'
require 'slingshot'
First, let's create an index named articles and store/index some documents:
Slingshot.index 'articles' do
  delete
  create

  store :title => 'One',   :tags => ['ruby']
  store :title => 'Two',   :tags => ['ruby', 'python']
  store :title => 'Three', :tags => ['java']
  store :title => 'Four',  :tags => ['ruby', 'php']

  refresh
end
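Slingshot's block syntax is the kind of DSL that Ruby usually builds with `instance_eval`. The sketch below shows how bare calls such as `delete`, `create` and `store` inside the block can resolve to methods on an index object. `MiniIndex` is a hypothetical, in-memory class that talks to no server, and it is not Slingshot's own implementation.

```ruby
# Hypothetical, in-memory sketch of a block DSL in the style of
# Slingshot.index. It is not Slingshot's code and makes no HTTP calls.
class MiniIndex
  attr_reader :name, :documents

  def initialize(name)
    @name = name
    @documents = []
  end

  # Evaluate the block with self set to the new index, so bare method
  # calls inside the block (store, delete, ...) hit the methods below.
  def self.index(name, &block)
    idx = new(name)
    idx.instance_eval(&block) if block
    idx
  end

  def delete
    @documents.clear
  end

  def create
    true # nothing to set up for an in-memory store
  end

  def store(document)
    @documents << document
  end

  def refresh
    true # a real engine would make new documents searchable here
  end
end

idx = MiniIndex.index 'articles' do
  delete
  create
  store :title => 'One', :tags => ['ruby']
  store :title => 'Two', :tags => ['ruby', 'python']
  refresh
end

puts idx.documents.size
```

With `instance_eval`, the block can call the index's methods without a receiver. The trade-off is that the block loses access to the caller's own instance methods.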
You can also create the index with specific mappings, such as:
Slingshot.index 'articles' do
  create :mappings => {
    :article => {
      :properties => {
        :id      => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
        :title   => { :type => 'string', :boost => 2.0, :analyzer => 'snowball' },
        :tags    => { :type => 'string', :analyzer => 'keyword' },
        :content => { :type => 'string', :analyzer => 'snowball' }
      }
    }
  }
end
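ElasticSearch's create-index API takes the mappings as JSON under a top-level `mappings` key. The sketch below assumes Slingshot sends the hash above in that shape, which has not been checked against Slingshot's source. It serializes the same mapping with Ruby's standard `json` library so you can inspect the request body.

```ruby
require 'json'

# Assumed request body for index creation: the mapping from the example
# above, nested under a top-level "mappings" key.
mappings = {
  :article => {
    :properties => {
      :id      => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
      :title   => { :type => 'string', :boost => 2.0, :analyzer => 'snowball' },
      :tags    => { :type => 'string', :analyzer => 'keyword' },
      :content => { :type => 'string', :analyzer => 'snowball' }
    }
  }
}

body = { :mappings => mappings }.to_json
puts body
```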
Now, let's query the database.
We are searching for articles whose title begins with the letter “T”, sorted by title in descending order, filtering them for ones tagged “ruby”, and also retrieving some facets from the database:
s = Slingshot.search 'articles' do
  query do
    string 'title:T*'
  end

  filter :terms, :tags => ['ruby']

  sort { title 'desc' }

  facet 'global-tags' do
    terms :tags, :global => true
  end

  facet 'current-tags' do
    terms :tags
  end
end
Let's display the results:
s.results.each do |document|
  puts "* #{ document.title } [tags: #{ document.tags.join(', ') }]"
end
# * Two [tags: ruby, python]
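To make the query's semantics concrete, here is a hypothetical, in-memory re-creation of the search above in plain Ruby. It only approximates ElasticSearch: the `title:T*` prefix match becomes `start_with?`, the terms filter becomes an array intersection, and the descending sort becomes a reversed `sort_by`. No index is involved.

```ruby
# In-memory approximation of the search above; ElasticSearch itself uses
# an inverted index rather than scanning an array.
articles = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

# query: title:T* (prefix match on the title)
matching = articles.select { |a| a[:title].start_with?('T') }

# filter :terms, :tags => ['ruby'] (keep documents having any listed tag)
filtered = matching.select { |a| (a[:tags] & ['ruby']).any? }

# sort { title 'desc' }
results = filtered.sort_by { |a| a[:title] }.reverse

results.each do |a|
  puts "* #{a[:title]} [tags: #{a[:tags].join(', ')}]"
end
# * Two [tags: ruby, python]
```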
Let's display the global facets (distribution of tags across the whole database):
s.results.facets['global-tags']['terms'].each do |f|
  puts "#{f['term'].ljust(10)} #{f['count']}"
end
# ruby 3
# python 1
# php 1
# java 1
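A terms facet with `:global => true` counts tag occurrences across every document in the index and ignores the query. The sketch below is a hypothetical plain-Ruby tally over the four sample articles that shows the same counting. Ties are ordered alphabetically here, whereas ElasticSearch may order tied terms differently, as the output above shows.

```ruby
# Plain-Ruby tally illustrating what a global terms facet counts.
articles = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

counts = Hash.new(0)
articles.each { |a| a[:tags].each { |tag| counts[tag] += 1 } }

# Highest count first; ties broken alphabetically (an arbitrary choice).
counts.sort_by { |term, count| [-count, term] }.each do |term, count|
  puts "#{term.ljust(10)} #{count}"
end
```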
Now, let's display the facets based on the current query. Notice that the count for articles tagged 'java' is included, even though they are not returned by our query, because the filter restricts only the returned results, not the facets. The count for articles tagged 'php' is excluded, since they don't match the current query:
s.results.facets['current-tags']['terms'].each do |f|
  puts "#{f['term'].ljust(10)} #{f['count']}"
end
# ruby 1
# python 1
# java 1
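The difference between the query and the filter is easiest to see side by side. In this hypothetical in-memory sketch, the facet is computed from the documents matching the query (`title:T*`), while the top-level filter narrows only the returned hits. That is why 'java' (from "Three") is counted although "Three" is not returned.

```ruby
# Query vs. filter in plain Ruby: the facet uses the query's matches,
# the filter only trims the hits that are returned.
articles = [
  { :title => 'One',   :tags => ['ruby'] },
  { :title => 'Two',   :tags => ['ruby', 'python'] },
  { :title => 'Three', :tags => ['java'] },
  { :title => 'Four',  :tags => ['ruby', 'php'] }
]

matching = articles.select { |a| a[:title].start_with?('T') } # query
hits     = matching.select { |a| a[:tags].include?('ruby') }  # filter

facet = Hash.new(0)
matching.each { |a| a[:tags].each { |tag| facet[tag] += 1 } }

puts hits.map { |a| a[:title] }.inspect
puts facet.inspect
```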
We can display the full query JSON:
puts s.to_json
# {"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}
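Because the request is plain JSON, the same body can be assembled from an ordinary Ruby Hash. The sketch below builds the Hash by hand and is not Slingshot's serializer, so its key order may differ from `s.to_json`. The parsed structure is the same.

```ruby
require 'json'

# Hand-built equivalent of the search request body shown above.
body = {
  :query  => { :query_string => { :query => 'title:T*' } },
  :filter => { :terms => { :tags => ['ruby'] } },
  :sort   => [{ :title => 'desc' }],
  :facets => {
    'global-tags'  => { :terms => { :field => 'tags' }, :global => true },
    'current-tags' => { :terms => { :field => 'tags' } }
  }
}

json = body.to_json
puts json
```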
Or, we can display the corresponding curl command for easy debugging:
puts s.to_curl
# curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{"facets":{"current-tags":{"terms":{"field":"tags"}},"global-tags":{"global":true,"terms":{"field":"tags"}}},"query":{"query_string":{"query":"title:T*"}},"filter":{"terms":{"tags":["ruby"]}},"sort":[{"title":"desc"}]}'
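A command in the same shape can be produced by a small helper. `curl_command` below is hypothetical and not part of Slingshot. Because the JSON is wrapped in single quotes for the shell, any single quote inside the JSON has to be closed, escaped and reopened as `'\''`.

```ruby
require 'json'

# Hypothetical helper (not Slingshot's to_curl): build a curl command for
# a search request body against the given index.
def curl_command(index, body, host = 'http://localhost:9200')
  json = body.to_json
  # Shell-escape single quotes: ' becomes '\''
  escaped = json.gsub("'") { %q('\\'') }
  %Q(curl -X POST "#{host}/#{index}/_search?pretty=true" -d '#{escaped}')
end

puts curl_command('articles', { :query => { :query_string => { :query => 'title:T*' } } })
```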
Features
Currently, Slingshot supports only a limited subset of the vast ElasticSearch Search API and its Query DSL:
- Creating, deleting and refreshing the index
- Creating the index with specific mapping
- Storing a document in the index
- Querying the index with the query_string, term and terms types of queries
- Sorting the results by fields
- Filtering the results
- Retrieving terms facets (other facet types are a high priority)
- Returning just specific fields from documents
- Paging with the from and size query options
See the examples/dsl.rb file.
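Paging with `from` and `size` means translating a page number into an offset: `from` is the 0-based position of the first hit and `size` is the number of hits per page. The helper `paging_options` below is a hypothetical sketch of that arithmetic, assuming 1-based page numbers.

```ruby
# Hypothetical helper: map a 1-based page number to from/size options.
def paging_options(page, per_page = 10)
  raise ArgumentError, 'page must be >= 1' if page < 1
  { :from => (page - 1) * per_page, :size => per_page }
end

opts = paging_options(3, 25)
puts "from=#{opts[:from]} size=#{opts[:size]}"
```

Page 3 with 25 hits per page starts at offset 50, because pages 1 and 2 cover hits 0 through 49.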
Slingshot wraps the results in an enumerable Results::Collection class, and every result in a Results::Item class, which behaves like a cross between Hash and OpenStruct.
You may wrap the result items in your own class by setting the Configuration.wrapper property. Check out the test/unit/results_collection_test.rb file to see how to do that.
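A custom wrapper is useful when you want domain methods on search results. The `Article` class below is a hypothetical example. It assumes, without checking Slingshot's source, that the wrapper is instantiated with the document's fields as a Hash; the test file mentioned above shows the actual contract. It mimics both Hash-style and OpenStruct-style access, as Results::Item does.

```ruby
# Hypothetical result wrapper; assumes it is built from a Hash of fields.
class Article
  def initialize(attributes = {})
    # Store keys as strings so both 'title' and :title work.
    @attributes = {}
    attributes.each { |key, value| @attributes[key.to_s] = value }
  end

  # Hash-style access: article[:title]
  def [](key)
    @attributes[key.to_s]
  end

  # OpenStruct-style access: article.title
  def method_missing(name, *args, &block)
    key = name.to_s
    if args.empty? && @attributes.key?(key)
      @attributes[key]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end

  # Domain-specific behaviour is the point of a custom wrapper.
  def ruby?
    Array(self['tags']).include?('ruby')
  end
end

article = Article.new({ 'title' => 'Two', 'tags' => ['ruby', 'python'] })
puts article.title
puts article[:tags].join(', ')
puts article.ruby?
```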
Todo & Plans
In order of importance:
- Seamless ActiveModel compatibility for easy usage in Rails applications (this also means nearly full ActiveRecord compatibility). See the activemodel branch.
- Seamless will_paginate compatibility for easy pagination
- Mapping definition for models
- Proper RDoc annotations for the source code
- Dual interface: allow simply passing queries/options for ElasticSearch as a Hash to any method
- Histogram facets
- Seamless support for an auto-updating river index for the CouchDB _changes feed
- Statistical facets
- Geo Distance facets
- Index aliases management
- Analyze API support
- Highlighting support
- Bulk API
- Embedded webserver to display statistics and to allow easy searches
Other Clients
Check out other ElasticSearch clients.
Feedback
You can send feedback via e-mail or via GitHub Issues.