Class: ThinkingSphinx::Search

Inherits:
Object
  • Object
Extended by:
Facets
Defined in:
lib/thinking_sphinx/search.rb,
lib/thinking_sphinx/search/facets.rb

Overview

Once you’ve got those indexes in and built, this is the stuff that matters: how to search! This class provides a generic search interface, which you can use to search all your indexed models at once. Most times, you will just want a specific model’s results; the search and search_for_ids methods will do the job in exactly the same manner when called from a model.

Defined Under Namespace

Modules: Facets

Constant Summary

GlobalFacetOptions =
{
  :all_attributes => false,
  :class_facet    => true
}

Class Method Summary

Methods included from Facets

facets

Class Method Details

.count(*args) ⇒ Object



# File 'lib/thinking_sphinx/search.rb', line 405

def count(*args)
  results, client = search_results(*args.clone)
  results[:total_found] || 0
end

.retry_search_on_stale_index(query, options, &block) ⇒ Object



# File 'lib/thinking_sphinx/search.rb', line 373

def retry_search_on_stale_index(query, options, &block)
  stale_ids = []
  stale_retries_left = case options[:retry_stale]
                        when true
                          3  # default to three retries
                        when nil, false
                          0  # no retries
                        else
                          options[:retry_stale].to_i
                        end
  begin
    # Passing this in an option so Collection.create_from_results can see it.
    # It should only raise on stale records if there are any retries left.
    options[:raise_on_stale] = stale_retries_left > 0
    block.call
  # If ThinkingSphinx::Collection.create_from_results found records in Sphinx but not
  # in the DB and the :raise_on_stale option is set, this exception is raised. We retry
  # a limited number of times, excluding the stale ids from the search.
  rescue StaleIdsException => e
    stale_retries_left -= 1

    stale_ids |= e.ids  # For logging
    options[:without_ids] = Array(options[:without_ids]) | e.ids  # Actual exclusion

    tries = stale_retries_left
    ::ActiveRecord::Base.logger.debug("Sphinx Stale Ids (%s %s left): %s" % [
        tries, (tries==1 ? 'try' : 'tries'), stale_ids.join(', ')
    ])
    
    retry
  end
end

.search(*args) ⇒ Object

Searches through the Sphinx indexes for relevant matches. There are various ways to search, sort, group and filter, which are covered below.

Also, if you have WillPaginate installed, the search method can be used just like paginate. The same parameters - :page and :per_page - work as expected, and the returned result set can be used by the will_paginate helper.

Basic Searching

The simplest way of searching is straight text.

ThinkingSphinx::Search.search "pat"
ThinkingSphinx::Search.search "google"
User.search "pat", :page => (params[:page] || 1)
Article.search "relevant news issue of the day"

If you specify :include, as in a #find call, it will be respected when loading the relevant models from the search results.

User.search "pat", :include => :posts

Match Modes

Sphinx supports 5 different matching modes. By default Thinking Sphinx uses :all, which unsurprisingly requires all the supplied search terms to match a result.

Alternative modes include:

User.search "pat allan", :match_mode => :any
User.search "pat allan", :match_mode => :phrase
User.search "pat | allan", :match_mode => :boolean
User.search "@name pat | @username pat", :match_mode => :extended

Any will find results with any of the search terms. Phrase treats the search terms as a single phrase instead of individual words. Boolean and extended allow for more complex query syntax; refer to the Sphinx documentation for further details.

Weighting

Sphinx has support for weighting, where matches in one field can be considered more important than in another. Weights are integers, with 1 as the default. They can be set per-search like this:

User.search "pat allan", :field_weights => { :alias => 4, :aka => 2 }

If you’re searching multiple models, you can set per-index weights:

ThinkingSphinx::Search.search "pat", :index_weights => { User => 10 }

See sphinxsearch.com/doc.html#weighting for further details.

Searching by Fields

If you want to step it up a level, you can limit your search terms to specific fields:

User.search :conditions => {:name => "pat"}

This uses Sphinx’s extended match mode, unless you specify a different match mode explicitly (but then this way of searching won’t work). Also note that you don’t need to put in a search string.

Searching by Attributes

Also known as filters, you can limit your searches to documents that have specific values for their attributes. There are three ways to do this. The first two techniques work in all scenarios - using the :with or :with_all options.

ThinkingSphinx::Search.search :with => {:tag_ids => 10}
ThinkingSphinx::Search.search :with => {:tag_ids => [10,12]}
ThinkingSphinx::Search.search :with_all => {:tag_ids => [10,12]}

The first :with search will match records with a tag_ids attribute of 10. The second :with will match records with a tag_ids attribute of 10 OR 12. If you need to find records that are tagged with ids 10 AND 12, you will need to use the :with_all search parameter. This is particularly useful in conjunction with Multi Value Attributes (MVAs).
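The OR versus AND behaviour of :with and :with_all can be modelled in plain Ruby, without Sphinx. This is only a sketch of the matching rule described above; the docs array and both helper methods are hypothetical and are not part of Thinking Sphinx.

```ruby
# Hypothetical in-memory documents, each with a multi-value tag_ids attribute.
docs = [
  { :id => 1, :tag_ids => [10] },
  { :id => 2, :tag_ids => [12, 14] },
  { :id => 3, :tag_ids => [10, 12] }
]

# Models :with => {:tag_ids => [10, 12]}: any one of the values may match (OR).
def matches_with(doc, values)
  !(doc[:tag_ids] & Array(values)).empty?
end

# Models :with_all => {:tag_ids => [10, 12]}: every value must match (AND).
def matches_with_all(doc, values)
  Array(values).all? { |value| doc[:tag_ids].include?(value) }
end

with_ids     = docs.select { |doc| matches_with(doc, [10, 12]) }.map { |doc| doc[:id] }
with_all_ids = docs.select { |doc| matches_with_all(doc, [10, 12]) }.map { |doc| doc[:id] }
```

Here with_ids contains all three documents, while with_all_ids contains only the document tagged with both 10 and 12.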

The third filtering technique is only viable if you’re searching with a specific model (not multi-model searching). With a single model, Thinking Sphinx can figure out what attributes and fields are available, so you can put it all in the :conditions hash, and it will sort it out.

Node.search :conditions => {:parent_id => 10}

Filters can be single values, arrays of values, or ranges.

Article.search "East Timor", :conditions => {:rating => 3..5}
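As a rough illustration of how a :conditions hash could be sorted into field terms and attribute filters, here is a standalone sketch. The split_conditions method and the attribute_names list are hypothetical; Thinking Sphinx works out the real list from the model's index definition, and its own handling may differ in detail.

```ruby
# Split a :conditions hash into an extended-mode query string (for fields)
# and a filter hash (for attributes). `attribute_names` is a hypothetical
# list standing in for what the index definition would provide.
def split_conditions(conditions, attribute_names)
  field_parts = []
  filters     = {}

  conditions.each do |key, value|
    if attribute_names.include?(key)
      filters[key] = value               # attribute: becomes a filter
    else
      field_parts << "@#{key} #{value}"  # field: becomes "@field terms"
    end
  end

  [field_parts.join(" "), filters]
end

query, filters = split_conditions(
  { :name => "pat", :rating => 3..5 },
  [:rating, :parent_id]
)
```

With these inputs, query holds the extended-mode string for the name field and filters holds the rating range.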

Excluding by Attributes

Sphinx also supports negative filtering - where the filters are of attribute values to exclude. This is done with the :without option:

User.search :without => {:role_id => 1}

Excluding by Primary Key

There is a shortcut to exclude records by their ActiveRecord primary key:

User.search :without_ids => 1

Pass an array or a single value.

The primary key must be an integer as a negative filter is used. Note that for multi-model search, an id may occur in more than one model.

Infix (Star) Searching

By default, Sphinx uses English stemming, e.g. matching “shoes” if you search for “shoe”. It won’t find “Melbourne” if you search for “elbourn”, though.

Enable infix searching with something like this in config/sphinx.yml:

development:
  enable_star: 1
  min_infix_length: 2

Note that this will make indexing take longer.

With those settings (and after reindexing), wildcard asterisks can be used in queries:

Location.search "*elbourn*"

To automatically add asterisks around every token (but not operators), pass the :star option:

Location.search "elbourn -ustrali", :star => true, :match_mode => :boolean

This would become “*elbourn* -*ustrali*”. The :star option only adds the asterisks. You need to make the config/sphinx.yml changes yourself.

By default, the tokens are assumed to match the regular expression /\w+/u. If you’ve modified the charset_table, pass another regular expression, e.g.

User.search("oo@bar.c", :star => /[\w@.]+/u)

to search for “*oo@bar.c*” and not “*oo*@*bar*.*c*”.
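The effect of the :star option can be approximated with a single gsub. This is a minimal sketch, not the library's implementation: add_stars is a hypothetical helper, and it ignores special cases (such as field operators in extended queries) that the real code may handle.

```ruby
# Wrap every token in asterisks, leaving operators such as "-" and "|" alone.
# `token` defaults to /\w+/u, the default token pattern described above.
def add_stars(query, token = /\w+/u)
  query.gsub(token) { |match| "*#{match}*" }
end

starred       = add_stars("elbourn -ustrali")
starred_email = add_stars("oo@bar.c", /[\w@.]+/u)
split_email   = add_stars("oo@bar.c")
```

With the default pattern the address is broken into three separately starred tokens, which is why a custom pattern is needed once "@" and "." are part of the charset_table.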

Sorting

Sphinx can only sort by attributes, so generally you will need to avoid using field names in your :order option. However, if you’re searching on a single model, and have specified some fields as sortable, you can use those field names and Thinking Sphinx will interpret accordingly. Remember: this will only happen for single-model searches, and only through the :order option.

Location.search "Melbourne", :order => :state
User.search :conditions => {:role_id => 2}, :order => "name ASC"

Keep in mind that if you use a string, you must specify the direction (ASC or DESC), otherwise Sphinx won’t return any results. If you use a symbol then Thinking Sphinx assumes ASC, but if you wish to state otherwise, use the :sort_mode option:

Location.search "Melbourne", :order => :state, :sort_mode => :desc

Of course, there are other sort modes - check out the Sphinx documentation for that level of detail though.

If desired, you can sort by a column in your model instead of a sphinx field or attribute. This sort only applies to the current page, so is most useful when performing a search with a single page of results.

User.search("pat", :sql_order => "name")

Grouping

For this you can use the group_by, group_clause and group_function options - which are all directly linked to Sphinx’s expectations. No magic from Thinking Sphinx. It can get a little tricky, so make sure you read all the relevant documentation first.

Grouping is done via three parameters within the options hash

  • :group_function determines the way grouping is done

  • :group_by determines the field which is used for grouping

  • :group_clause determines the sorting order

group_function

Valid values for :group_function are

  • :day, :week, :month, :year - Grouping is done by the respective timeframes.

  • :attr, :attrpair - Grouping is done by the specified attribute(s)

group_by

This parameter denotes the field by which grouping is done. Note that the specified field must be a sphinx attribute or index.

group_clause

This determines the sorting order of the groups. In a grouping search, the matches within a group will be sorted by the :sort_mode and :order parameters. The group matches themselves, however, will be sorted by :group_clause.

The syntax for this is the same as an order parameter in extended sort mode. Namely, you can specify an SQL-like sort expression with up to 5 attributes (including internal attributes), eg: “@relevance DESC, price ASC, @id DESC”

Grouping by timestamp

Timestamp grouping groups items by the day, week, month or year of the attribute given. In order to do this you need to define a timestamp attribute, which pretty much looks like the standard definition for any attribute.

define_index do
  #
  # All your other stuff
  #
  has :created_at
end

When you need to fire off your search, it’ll go something to the tune of

Fruit.search "apricot", :group_function => :day, :group_by => 'created_at'

The @groupby special attribute will contain the date for that group. Depending on the :group_function parameter, the date format will be

  • :day - YYYYMMDD

  • :week - YYYYNNN (NNN is the first day of the week in question, counting from the start of the year)

  • :month - YYYYMM

  • :year - YYYY
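If you want a Ruby Date rather than the raw integer, the formats above can be decoded with a little arithmetic. The groupby_to_date helper below is hypothetical and only a sketch; it covers :day, :month and :year, and leaves :week out because decoding NNN depends on exactly how Sphinx counts days within the year.

```ruby
require "date"

# Turn a @groupby integer into the first Date of the period it names.
# Handles the :day (YYYYMMDD), :month (YYYYMM) and :year (YYYY) formats.
def groupby_to_date(value, group_function)
  number = value.to_i
  case group_function
  when :day   then Date.new(number / 10_000, number / 100 % 100, number % 100)
  when :month then Date.new(number / 100, number % 100, 1)
  when :year  then Date.new(number, 1, 1)
  else raise ArgumentError, "unsupported group function: #{group_function.inspect}"
  end
end

day   = groupby_to_date(20090315, :day)
month = groupby_to_date(200903, :month)
year  = groupby_to_date(2009, :year)
```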

Grouping by attribute

The syntax is the same as grouping by timestamp, except for the fact that the :group_function parameter is changed

Fruit.search "apricot", :group_function => :attr, :group_by => 'size'

Geo/Location Searching

Sphinx - and therefore Thinking Sphinx - has the facility to search around a geographical point, using a given latitude and longitude. To take advantage of this, you will need to have both of those values in attributes. To search with that point, you can then use one of the following syntax examples:

Address.search "Melbourne", :geo => [1.4, -2.217], :order => "@geodist asc"
Address.search "Australia", :geo => [-0.55, 3.108], :order => "@geodist asc",
  :latitude_attr => "latit", :longitude_attr => "longit"

The first example applies when your latitude and longitude attributes are named any of lat, latitude, lon, long or longitude. If that’s not the case, you will need to explicitly state them in your search, or you can do so in your model:

define_index do
  has :latit  # Float column, stored in radians
  has :longit # Float column, stored in radians

  set_property :latitude_attr   => "latit"
  set_property :longitude_attr  => "longit"
end

Now, geo-location searching really only has an effect if you have a filter, sort or grouping clause related to it. Without one it’s just a normal search, and _will not_ return a distance value. To make use of the positioning difference, use the special attribute “@geodist” in any of your filters or sorting or grouping clauses.

And don’t forget - both the latitude and longitude you use in your search, and the values in your indexes, need to be stored as a float in radians, not degrees. Keep in mind that if you do this conversion in SQL you will need to explicitly declare a column type of :float.

define_index do
  has 'RADIANS(lat)', :as => :lat,  :type => :float
  # ...
end
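On the search side, coordinates held in degrees need the same conversion before they go into the :geo option. A small sketch follows; to_radians is a hypothetical helper and the coordinates are illustrative values only.

```ruby
# Convert degrees to radians, the unit Sphinx expects for geo searches.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Illustrative coordinates in degrees (roughly Melbourne).
lat_deg, lng_deg = -37.8136, 144.9631
geo = [to_radians(lat_deg), to_radians(lng_deg)]

# geo can then be passed as the :geo option, for example:
#   Address.search "Melbourne", :geo => geo, :order => "@geodist asc"
```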

Once you’ve got your results set, you can access the distances as follows:

@results.each_with_geodist do |result, distance|
  # ...
end

The distance value is returned as a float, representing the distance in metres.

Handling a Stale Index

Especially if you don’t use delta indexing, you risk having records in the Sphinx index that are no longer in the database. By default, those will simply come back as nils:

>> pat_user.delete
>> User.search("pat")
Sphinx Result: [1,2]
=> [nil, <#User id: 2>]

(If you search across multiple models, you’ll get ActiveRecord::RecordNotFound.)

You can simply Array#compact these results or handle the nils in some other way, but Sphinx will still report two results, and the missing records may upset your layout.

If you pass :retry_stale => true to a single-model search, missing records will cause Thinking Sphinx to retry the query, excluding those records. Since search is paginated, the new search could potentially include missing records as well, so by default Thinking Sphinx will retry three times. Pass :retry_stale => 5 to retry five times, and so on. If there are still missing ids on the last retry, they are shown as nils.
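The retry behaviour can be tried out without Sphinx or a database. The sketch below mirrors the idea of retry_search_on_stale_index, but every name in it (StaleIds, stale_retries, with_stale_retries, and the fake search in the block) is a hypothetical stand-in, not the library's API.

```ruby
# Hypothetical stand-in for the exception raised when Sphinx returns ids
# that are missing from the database.
class StaleIds < StandardError
  attr_reader :ids

  def initialize(ids)
    @ids = ids
    super("stale ids: #{ids.join(', ')}")
  end
end

# Map the :retry_stale option to a retry count: true means three retries,
# nil or false means none, anything else is converted with to_i.
def stale_retries(option)
  case option
  when true       then 3
  when nil, false then 0
  else option.to_i
  end
end

# Run the block, retrying with stale ids excluded until retries run out.
def with_stale_retries(options)
  retries_left = stale_retries(options[:retry_stale])
  begin
    # Only raise on stale records while there are retries left.
    options[:raise_on_stale] = retries_left > 0
    yield options
  rescue StaleIds => e
    retries_left -= 1
    options[:without_ids] = Array(options[:without_ids]) | e.ids
    retry
  end
end

# Fake search: the index holds ids 1..3, but only 2 and 3 are in the database.
indexed, in_db = [1, 2, 3], [2, 3]
result = with_stale_retries(:retry_stale => true) do |opts|
  ids   = indexed - Array(opts[:without_ids])
  stale = ids - in_db
  raise StaleIds.new(stale) if opts[:raise_on_stale] && !stale.empty?
  ids.map { |id| in_db.include?(id) ? id : nil }
end
```

On the first pass id 1 is found to be stale, so the block is run again with 1 in :without_ids and the second pass returns only live ids. Without :retry_stale the same block would return a nil in place of the missing record.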



# File 'lib/thinking_sphinx/search.rb', line 355

def search(*args)
  query = args.clone  # an array
  options = query.extract_options!
  
  retry_search_on_stale_index(query, options) do
    results, client = search_results(*(query + [options]))
  
    ::ActiveRecord::Base.logger.error(
      "Sphinx Error: #{results[:error]}"
    ) if results[:error]
  
    klass   = options[:class]
    page    = options[:page] ? options[:page].to_i : 1
  
    ThinkingSphinx::Collection.create_from_results(results, page, client.limit, options)
  end
end

.search_for_id(*args) ⇒ Object

Checks if a document with the given id exists within a specific index. Expected parameters:

  • ID of the document

  • Index to check within

  • Options hash (defaults to {})

Example:

ThinkingSphinx::Search.search_for_id(10, "user_core", :class => User)


# File 'lib/thinking_sphinx/search.rb', line 421

def search_for_id(*args)
  options = args.extract_options!
  client  = client_from_options options
  
  query, filters    = search_conditions(
    options[:class], options[:conditions] || {}
  )
  client.filters   += filters
  client.match_mode = :extended unless query.empty?
  client.id_range   = args.first..args.first
  
  begin
    return client.query(query, args[1])[:matches].length > 0
  rescue Errno::ECONNREFUSED => err
    raise ThinkingSphinx::ConnectionError, "Connection to Sphinx Daemon (searchd) failed."
  end
end

.search_for_ids(*args) ⇒ Object

Searches for results that match the parameters provided. Will only return the ids for the matching objects. See #search for syntax examples.

Note that this only searches the Sphinx index, with no ActiveRecord queries. Thus, if your index is not in sync with the database, this method may return ids that no longer exist there.



# File 'lib/thinking_sphinx/search.rb', line 28

def search_for_ids(*args)
  results, client = search_results(*args.clone)
  
  options = args.extract_options!
  page    = options[:page] ? options[:page].to_i : 1
  
  ThinkingSphinx::Collection.ids_from_results(results, page, client.limit, options)
end