DataMapper Sphinx Adapter

Description

A DataMapper Sphinx adapter.

Dependencies

Ruby
  • dm-core ~> 0.9.7

  • dm-is-searchable ~> 0.9.7 (optional)

I’d recommend using the dm-more plugin dm-is-searchable instead of fetching the document ids yourself.

Sphinx
  • 0.9.8-r871

  • 0.9.8-r909

  • 0.9.8-r985

  • 0.9.8-r1065

  • 0.9.8-r1112

  • 0.9.8-rc1 (gem version: 0.9.8.1198)

  • 0.9.8-rc2 (gem version: 0.9.8.1231)

  • 0.9.8 (gem version: 0.9.8.1371)

Internally the Riddle client library is used.

Install

  • Via git: git clone git://github.com/shanna/dm-sphinx-adapter.git

  • Via gem: gem install shanna-dm-sphinx-adapter -s gems.github.com

Synopsis

DataMapper uses a URI or a connection Hash to connect to your data-stores. In this case the data-store is the sphinx search daemon, searchd.

On its own this adapter will only return an array of document hashes when queried. The DataMapper library dm-is-searchable, however, provides a common interface to search one adapter and load documents from another. My preference is to use this adapter in tandem with dm-is-searchable. See the later examples in this synopsis for usage with dm-is-searchable.

As with all DataMapper adapters, you can connect with a Hash or a URI.

A URI:

DataMapper.setup(:search, 'sphinx://localhost')

The breakdown is:

"#{adapter}://#{host}:#{port}/#{config}"
- adapter Must be :sphinx
- host    Hostname (default: localhost)
- port    Optional port number (default: 3312)
- config  Optional path to your sphinx configuration file (recommended)
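
To make the breakdown concrete, here is a minimal sketch that splits such a URI into its parts using only Ruby's standard URI library. The helper name sphinx_options is hypothetical and not part of dm-sphinx-adapter, and returning the config path exactly as it appears in the URI is an assumption; the adapter's own parsing may differ.

```ruby
require 'uri'

# Hypothetical helper (not part of dm-sphinx-adapter): split a sphinx URI
# into the parts listed above, falling back to the documented defaults.
def sphinx_options(uri_string)
  uri  = URI.parse(uri_string)
  path = uri.path.to_s
  {
    :adapter => uri.scheme,
    :host    => uri.host || 'localhost',
    :port    => uri.port || 3312,
    # Assumption: the path is kept as-is; nil when the URI has no path.
    :config  => path.empty? ? nil : path
  }
end

options = sphinx_options('sphinx://localhost')
puts options.inspect
```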

Alternatively supply a Hash:

DataMapper.setup(:search, {
  :adapter  => 'sphinx',        # required
  :config   => './sphinx.conf', # optional. Recommended though.
  :host     => 'localhost',     # optional. Default: localhost
  :port     => 3312             # optional. Default: 3312
})

DataMapper

require 'rubygems'
require 'dm-sphinx-adapter'

DataMapper.setup(:default, 'sqlite3::memory:')
DataMapper.setup(:search, 'sphinx://localhost:3312')

class Item
  include DataMapper::Resource
  property :id, Serial
  property :name, String
end

# Fire up your sphinx search daemon and start searching.
docs  = repository(:search){ Item.all(:name => 'barney') } # Search 'items' index for '@name barney'
ids   = docs.map{|doc| doc[:id]}
items = Item.all(:id => ids) # Search :default for all the document ids returned by sphinx.

DataMapper and IsSearchable

IsSearchable is a DataMapper plugin that provides a common search interface when searching from one adapter and reading documents from another.

IsSearchable will read resources from your :default repository on behalf of a search adapter such as dm-sphinx-adapter or dm-ferret-adapter. This saves some of the grunt work (as shown in the previous example) by mapping the document ids returned by a search on your :search adapter into a suitable #first or #all query for your :default repository.

IsSearchable adds a single class method to your resource. The first argument is a Hash of DataMapper::Query conditions to pass to your search adapter (in this case dm-sphinx-adapter). An optional second Hash of DataMapper::Query conditions can also be passed and will be appended to the query on your :default database. This can be handy if you need to add extra exclusions that aren’t possible using dm-sphinx-adapter such as #gt or #lt conditions.

require 'rubygems'
require 'dm-core'
require 'dm-is-searchable'
require 'dm-sphinx-adapter'

# Connections.
DataMapper.setup(:default, 'sqlite3::memory:')
DataMapper.setup(:search, 'sphinx://localhost:3312')

class Item
  include DataMapper::Resource
  property :id, Serial
  property :name, String

  is :searchable # defaults to :search repository though you can be explicit:
  # is :searchable, :repository => :sphinx
end

# Fire up your sphinx search daemon and start searching.
items = Item.search(:name => 'barney') # Search 'items' index for '@name barney'
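
The two-step flow described above (search for document ids, then load from :default with any extra conditions appended) can be pictured with a plain Ruby sketch. Nothing here uses DataMapper or Sphinx: the helpers default_conditions and find_all and the sample rows are hypothetical stand-ins, and a Range takes the place of #gt and #lt conditions purely to keep the example self-contained.

```ruby
# Hypothetical helper: combine the ids returned by the search adapter with
# any extra conditions meant for the :default repository.
def default_conditions(docs, extra = {})
  extra.merge(:id => docs.map { |doc| doc[:id] })
end

# Hypothetical in-memory stand-in for a query on the :default repository.
# An Array or Range condition means "the value is included in it".
def find_all(rows, conditions)
  rows.select do |row|
    conditions.all? do |key, expected|
      case expected
      when Array, Range then expected.include?(row[key])
      else row[key] == expected
      end
    end
  end
end

# Pretend the search adapter returned these document hashes.
search_docs = [{ :id => 1 }, { :id => 3 }]

# Pretend these rows live in the :default repository.
stored_items = [
  { :id => 1, :name => 'barney', :price => 5 },
  { :id => 2, :name => 'fred',   :price => 7 },
  { :id => 3, :name => 'barney', :price => 12 }
]

conditions = default_conditions(search_docs, { :price => (10..20) })
items      = find_all(stored_items, conditions)
puts items.inspect
```

Only rows that both matched the search and satisfy the extra condition survive, which is the role the optional second Hash plays.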

Merb, DataMapper and IsSearchable

# config/init.rb
dependency 'dm-is-searchable'
dependency 'dm-sphinx-adapter'

# config/database.yml
---
development: &defaults
  repositories:
    search:
      adapter:  sphinx
      host:     localhost
      port:     3312

# app/models/item.rb
class Item
  include DataMapper::Resource
  property :id, Serial
  property :name, String

  is :searchable # defaults to :search repository though you can be explicit:
  # is :searchable, :repository => :sphinx
end # Item

# Fire up your sphinx search daemon and start searching.
Item.search(:name => 'barney') # Search 'items' index for '@name barney'

DataMapper, IsSearchable and DataMapper::SphinxResource

For finer grained control you can include DataMapper::SphinxResource. For instance you can search one or more indexes and sort, include or exclude by attributes defined in your sphinx configuration:

class Item
  include DataMapper::SphinxResource
  property :id, Serial
  property :name, String

  is :searchable
  repository(:search) do
    index :items
    index :items_delta, :delta => true

    # Sphinx attributes to sort, include or exclude by.
    attribute :updated_on, DateTime
  end

end # Item

# Search 'items, items_delta' indexes for '@name barney' updated in the last 30 minutes.
Item.search(:name => 'barney', :updated_on => (Time.now - 1800 .. Time.now))
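
Sphinx timestamp attributes hold integer UNIX timestamps, so a Time range like the one above ultimately has to become a pair of integer bounds. The sketch below shows one way to do that conversion in plain Ruby. The helper timestamp_bounds is hypothetical, and how the adapter itself performs this step is not covered here.

```ruby
# Hypothetical helper: convert a Range of Time objects into inclusive
# integer bounds of the kind a Sphinx range filter expects.
def timestamp_bounds(range)
  max = range.end.to_i
  max -= 1 if range.exclude_end? # a ... range excludes its last second
  [range.begin.to_i, max]
end

now      = Time.now
min, max = timestamp_bounds((now - 1800)..now)
puts "last 30 minutes: #{min}..#{max}"
```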

Sphinx Configuration

No limitations, restrictions or requirements are imposed on your sphinx configuration. The adapter will neither generate nor overwrite your finely crafted config file.

Searchd

To keep things simple, this adapter does not manage your sphinx server. Try one of these fine offerings:

Indexer and Live(ish) updates

As of 0.3 the indexer will no longer be fired on create/update, even if you have delta indexes defined. Sphinx indexing is blazing fast, but unless your resource sees very little activity you run the risk of lock errors on the temporary delta index files (.tmpl.sp1), in which case your delta index won’t be updated. Given this functionality is unreliable at best, I’ve chosen to remove it.

For reliable live(ish) updates in a main + delta scheme it’s probably best to schedule them outside of your ORM. Andrew (Shodan) Aksyonoff of Sphinx suggests a cronjob or, if you need even less lag, to “run indexer in an endless loop, with a few seconds of sleep in between to allow searchd some headroom to pick up the changes”.
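
As a rough illustration of the endless loop suggestion, here is a minimal Ruby sketch meant to run as its own process, outside of your application. The helper run_indexer_loop is hypothetical, and the config path and index name in the commented usage line are assumptions; adjust the indexer command to match your own setup.

```ruby
# Hypothetical helper: run the given indexer command over and over, pausing
# between runs so searchd has time to pick up the changes.
# iterations = nil means loop forever; the runner argument exists only so
# the loop can be exercised without a real indexer binary.
def run_indexer_loop(command, pause = 5, iterations = nil, runner = Kernel.method(:system))
  count = 0
  until iterations && count >= iterations
    ok = runner.call(*command)
    warn "indexer run failed: #{command.join(' ')}" unless ok
    count += 1
    sleep(pause) unless iterations && count >= iterations
  end
  count
end

# Usage (runs until the process is killed). The path and index name are
# assumptions:
# run_indexer_loop(%w[indexer --config ./sphinx.conf --rotate items_delta])
```

Keeping the loop in a separate script means a slow or failed indexer run never blocks a request in your application.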

Contributing

Go nuts. Just send me a pull request (github or otherwise) when you are happy with your code.