Class: Riddle::Client

Inherits:
Object
Defined in:
lib/riddle/client.rb,
lib/riddle/1.10/client.rb,
lib/riddle/0.9.9/client.rb,
lib/riddle/client/filter.rb,
lib/riddle/client/message.rb,
lib/riddle/client/response.rb

Overview

This class was heavily based on the existing Client API by Dmytro Shteflyuk and Alexy Kovyrin. Their code worked fine; I just wanted something a bit more Ruby-ish (i.e. lowercase and underscored method names). I have also used a few helper classes, just to neaten things up.

Feel free to use it wherever. Send bug reports, patches, comments and suggestions to pat at freelancing-gods dot com.

Most properties of the client are accessible through attribute accessors and, where relevant, use symbols instead of the long constants common in other clients. Some examples:

client.sort_mode  = :extended
client.sort_by    = "birthday DESC"
client.match_mode = :extended
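Each symbol stands in for one of the numeric codes listed under Constant Summary. A minimal sketch of that translation, using a hypothetical `wire_value` helper (not part of Riddle) and copies of the MatchModes and SortModes values. Using `fetch` means a misspelt symbol fails immediately instead of quietly producing nil:

```ruby
# Subsets of the MatchModes and SortModes constants listed under Constant Summary.
MATCH_MODES = { all: 0, any: 1, phrase: 2, boolean: 3, extended: 4, fullscan: 5, extended2: 6 }.freeze
SORT_MODES  = { relevance: 0, attr_desc: 1, attr_asc: 2, time_segments: 3, extended: 4, expr: 5 }.freeze

# Hypothetical helper: translate a symbol setting into its numeric protocol code,
# raising on an unknown symbol rather than returning nil.
def wire_value(table, setting)
  table.fetch(setting) do |key|
    raise ArgumentError, "unknown setting #{key.inspect}; expected one of #{table.keys.inspect}"
  end
end

wire_value(MATCH_MODES, :extended) # => 4
wire_value(SORT_MODES, :extended)  # => 4
```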

To add a filter, you will need to create a Filter object:

client.filters << Riddle::Client::Filter.new("birthday",
  Time.local(1975, 1, 1).to_i..Time.local(1985, 1, 1).to_i, false)
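Because the filter compares integer attributes, the dates have to become Unix timestamps first. A small sketch of building that range; the constant and helper names are illustrative, and computing in UTC is an assumption about how the birthday attribute was indexed. Sphinx treats both ends of a range filter as inclusive, which matches Ruby's two-dot range:

```ruby
# Inclusive Unix-timestamp bounds for a "born between 1975 and 1985" filter.
# Computed in UTC so the numbers do not depend on the machine's time zone;
# switch to Time.local if your source indexed local times.
BIRTHDAY_RANGE = Time.utc(1975, 1, 1).to_i..Time.utc(1985, 1, 1).to_i

BIRTHDAY_RANGE.first # => 157766400
BIRTHDAY_RANGE.last  # => 473385600

# Local check of which timestamps the range covers (both ends included).
def born_in_range?(timestamp, range = BIRTHDAY_RANGE)
  range.cover?(timestamp)
end

# With the riddle gem loaded, the range is handed over as in the example above:
#   client.filters << Riddle::Client::Filter.new("birthday", BIRTHDAY_RANGE, false)
```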

Defined Under Namespace

Classes: Filter, Message, Response

Constant Summary

Commands =
{
  :search     => 0, # SEARCHD_COMMAND_SEARCH
  :excerpt    => 1, # SEARCHD_COMMAND_EXCERPT
  :update     => 2, # SEARCHD_COMMAND_UPDATE
  :keywords   => 3, # SEARCHD_COMMAND_KEYWORDS
  :persist    => 4, # SEARCHD_COMMAND_PERSIST
  :status     => 5, # SEARCHD_COMMAND_STATUS
  :query      => 6, # SEARCHD_COMMAND_QUERY
  :flushattrs => 7  # SEARCHD_COMMAND_FLUSHATTRS
}
Versions =
{
  :search     => 0x113, # VER_COMMAND_SEARCH
  :excerpt    => 0x100, # VER_COMMAND_EXCERPT
  :update     => 0x101, # VER_COMMAND_UPDATE
  :keywords   => 0x100, # VER_COMMAND_KEYWORDS
  :status     => 0x100, # VER_COMMAND_STATUS
  :query      => 0x100, # VER_COMMAND_QUERY
  :flushattrs => 0x100  # VER_COMMAND_FLUSHATTRS
}
Statuses =
{
  :ok      => 0, # SEARCHD_OK
  :error   => 1, # SEARCHD_ERROR
  :retry   => 2, # SEARCHD_RETRY
  :warning => 3  # SEARCHD_WARNING
}
MatchModes =
{
  :all        => 0, # SPH_MATCH_ALL
  :any        => 1, # SPH_MATCH_ANY
  :phrase     => 2, # SPH_MATCH_PHRASE
  :boolean    => 3, # SPH_MATCH_BOOLEAN
  :extended   => 4, # SPH_MATCH_EXTENDED
  :fullscan   => 5, # SPH_MATCH_FULLSCAN
  :extended2  => 6  # SPH_MATCH_EXTENDED2
}
RankModes =
{
  :proximity_bm25 => 0, # SPH_RANK_PROXIMITY_BM25
  :bm25           => 1, # SPH_RANK_BM25
  :none           => 2, # SPH_RANK_NONE
  :wordcount      => 3, # SPH_RANK_WORDCOUNT
  :proximity      => 4, # SPH_RANK_PROXIMITY
  :match_any      => 5, # SPH_RANK_MATCHANY
  :fieldmask      => 6, # SPH_RANK_FIELDMASK
  :sph04          => 7, # SPH_RANK_SPH04
  :total          => 8  # SPH_RANK_TOTAL
}
SortModes =
{
  :relevance     => 0, # SPH_SORT_RELEVANCE
  :attr_desc     => 1, # SPH_SORT_ATTR_DESC
  :attr_asc      => 2, # SPH_SORT_ATTR_ASC
  :time_segments => 3, # SPH_SORT_TIME_SEGMENTS
  :extended      => 4, # SPH_SORT_EXTENDED
  :expr          => 5  # SPH_SORT_EXPR
}
AttributeTypes =
{
  :integer    => 1, # SPH_ATTR_INTEGER
  :timestamp  => 2, # SPH_ATTR_TIMESTAMP
  :ordinal    => 3, # SPH_ATTR_ORDINAL
  :bool       => 4, # SPH_ATTR_BOOL
  :float      => 5, # SPH_ATTR_FLOAT
  :bigint     => 6, # SPH_ATTR_BIGINT
  :string     => 7, # SPH_ATTR_STRING
  :multi      => 0x40000000 # SPH_ATTR_MULTI
}
GroupFunctions =
{
  :day      => 0, # SPH_GROUPBY_DAY
  :week     => 1, # SPH_GROUPBY_WEEK
  :month    => 2, # SPH_GROUPBY_MONTH
  :year     => 3, # SPH_GROUPBY_YEAR
  :attr     => 4, # SPH_GROUPBY_ATTR
  :attrpair => 5  # SPH_GROUPBY_ATTRPAIR
}
FilterTypes =
{
  :values       => 0, # SPH_FILTER_VALUES
  :range        => 1, # SPH_FILTER_RANGE
  :float_range  => 2  # SPH_FILTER_FLOATRANGE
}
AttributeHandlers =
{
  AttributeTypes[:integer]   => :next_int,
  AttributeTypes[:timestamp] => :next_int,
  AttributeTypes[:ordinal]   => :next_int,
  AttributeTypes[:bool]      => :next_int,
  AttributeTypes[:float]     => :next_float,
  AttributeTypes[:bigint]    => :next_64bit_int,
  AttributeTypes[:string]    => :next,
  AttributeTypes[:multi] + AttributeTypes[:integer] => :next_int_array
}
@@connection =
nil
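In this client, a multi-value integer attribute is identified by adding the :multi flag to the :integer code, which is why the last AttributeHandlers key is a sum. A sketch, using copies of the constants and a hypothetical `describe_type` helper, of taking such a code apart again:

```ruby
# Copies of AttributeTypes and a subset of AttributeHandlers from above.
ATTRIBUTE_TYPES = {
  integer: 1, timestamp: 2, ordinal: 3, bool: 4,
  float: 5, bigint: 6, string: 7, multi: 0x40000000
}.freeze

ATTRIBUTE_HANDLERS = {
  ATTRIBUTE_TYPES[:integer] => :next_int,
  ATTRIBUTE_TYPES[:float]   => :next_float,
  ATTRIBUTE_TYPES[:bigint]  => :next_64bit_int,
  ATTRIBUTE_TYPES[:string]  => :next,
  ATTRIBUTE_TYPES[:multi] + ATTRIBUTE_TYPES[:integer] => :next_int_array
}.freeze

# Hypothetical helper: split a type code into its base type and the multi flag.
def describe_type(code)
  multi_flag = ATTRIBUTE_TYPES[:multi]
  base = ATTRIBUTE_TYPES.key(code & ~multi_flag)
  (code & multi_flag).zero? ? base.to_s : "multi #{base}"
end

describe_type(0x40000001)              # => "multi integer"
describe_type(5)                       # => "float"
ATTRIBUTE_HANDLERS.fetch(0x40000001)   # => :next_int_array
```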


Constructor Details

- (Client) initialize(servers = nil, port = nil, key = nil)

You can instantiate the client with a specific server (or an array of servers) and port; otherwise it defaults to localhost and port 9312. All other settings can be accessed and changed via the attribute accessors.



# File 'lib/riddle/client.rb', line 141

def initialize(servers = nil, port = nil, key = nil)
  Riddle.version_warning

  @servers = Array(servers || "localhost")
  @port   = port || 9312
  @socket = nil
  @key    = key

  reset

  @queue = []
end
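The first two assignments in the constructor are what make a single host, an array of hosts or nothing at all acceptable. A sketch reproducing just that normalisation with a hypothetical `normalise_connection` helper; the host names in the comments are placeholders:

```ruby
# Reproduces the first two assignments in #initialize: nil, a single host
# or an array of hosts all end up as an array, and the port falls back to 9312.
def normalise_connection(servers = nil, port = nil)
  [Array(servers || "localhost"), port || 9312]
end

normalise_connection                       # => [["localhost"], 9312]
normalise_connection("search1")            # => [["search1"], 9312]
normalise_connection(%w[search1 search2])  # => [["search1", "search2"], 9312]

# The equivalent client calls:
#   Riddle::Client.new                              # localhost:9312
#   Riddle::Client.new(%w[search1 search2], 9312)   # two servers for fail-over
```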

Instance Attribute Details

- (Object) anchor

Returns the value of attribute anchor



# File 'lib/riddle/client.rb', line 118

def anchor
  @anchor
end

- (Object) connection

Returns the value of attribute connection



# File 'lib/riddle/client.rb', line 118

def connection
  @connection
end

- (Object) cut_off

Returns the value of attribute cut_off



# File 'lib/riddle/client.rb', line 118

def cut_off
  @cut_off
end

- (Object) field_weights

Returns the value of attribute field_weights



# File 'lib/riddle/client.rb', line 118

def field_weights
  @field_weights
end

- (Object) filters

Returns the value of attribute filters



# File 'lib/riddle/client.rb', line 118

def filters
  @filters
end

- (Object) group_by

Returns the value of attribute group_by



# File 'lib/riddle/client.rb', line 118

def group_by
  @group_by
end

- (Object) group_clause

Returns the value of attribute group_clause



# File 'lib/riddle/client.rb', line 118

def group_clause
  @group_clause
end

- (Object) group_distinct

Returns the value of attribute group_distinct



# File 'lib/riddle/client.rb', line 118

def group_distinct
  @group_distinct
end

- (Object) group_function

Returns the value of attribute group_function



# File 'lib/riddle/client.rb', line 118

def group_function
  @group_function
end

- (Object) id_range

Returns the value of attribute id_range



# File 'lib/riddle/client.rb', line 118

def id_range
  @id_range
end

- (Object) index_weights

Returns the value of attribute index_weights



# File 'lib/riddle/client.rb', line 118

def index_weights
  @index_weights
end

- (Object) key

Returns the value of attribute key



# File 'lib/riddle/client.rb', line 118

def key
  @key
end

- (Object) limit

Returns the value of attribute limit



# File 'lib/riddle/client.rb', line 118

def limit
  @limit
end

- (Object) match_mode

Returns the value of attribute match_mode



# File 'lib/riddle/client.rb', line 118

def match_mode
  @match_mode
end

- (Object) max_matches

Returns the value of attribute max_matches



# File 'lib/riddle/client.rb', line 118

def max_matches
  @max_matches
end

- (Object) max_query_time

Returns the value of attribute max_query_time



# File 'lib/riddle/client.rb', line 118

def max_query_time
  @max_query_time
end

- (Object) offset

Returns the value of attribute offset



# File 'lib/riddle/client.rb', line 118

def offset
  @offset
end

- (Object) overrides

Returns the value of attribute overrides



# File 'lib/riddle/client.rb', line 118

def overrides
  @overrides
end

- (Object) port

Returns the value of attribute port



# File 'lib/riddle/client.rb', line 118

def port
  @port
end

- (Object) queue (readonly)

Returns the value of attribute queue



# File 'lib/riddle/client.rb', line 124

def queue
  @queue
end

- (Object) rank_expr

Returns the value of attribute rank_expr



# File 'lib/riddle/client.rb', line 118

def rank_expr
  @rank_expr
end

- (Object) rank_mode

Returns the value of attribute rank_mode



# File 'lib/riddle/client.rb', line 118

def rank_mode
  @rank_mode
end

- (Object) retry_count

Returns the value of attribute retry_count



# File 'lib/riddle/client.rb', line 118

def retry_count
  @retry_count
end

- (Object) retry_delay

Returns the value of attribute retry_delay



# File 'lib/riddle/client.rb', line 118

def retry_delay
  @retry_delay
end

- (Object) select

Returns the value of attribute select



# File 'lib/riddle/client.rb', line 118

def select
  @select
end

- (Object) servers

Returns the value of attribute servers



# File 'lib/riddle/client.rb', line 118

def servers
  @servers
end

- (Object) sort_by

Returns the value of attribute sort_by



# File 'lib/riddle/client.rb', line 118

def sort_by
  @sort_by
end

- (Object) sort_mode

Returns the value of attribute sort_mode



# File 'lib/riddle/client.rb', line 118

def sort_mode
  @sort_mode
end

- (Object) timeout

Returns the value of attribute timeout



# File 'lib/riddle/client.rb', line 118

def timeout
  @timeout
end

- (Object) weights

Returns the value of attribute weights



# File 'lib/riddle/client.rb', line 118

def weights
  @weights
end

Class Method Details

+ (Object) connection



# File 'lib/riddle/client.rb', line 134

def self.connection
  @@connection
end

+ (Object) connection=(value)



# File 'lib/riddle/client.rb', line 128

def self.connection=(value)
  Riddle.mutex.synchronize do
    @@connection = value
  end
end

Instance Method Details

- (Object) add_override(attribute, type, values)



# File 'lib/riddle/client.rb', line 475

def add_override(attribute, type, values)
  @overrides[attribute] = {:type => type, :values => values}
end

- (Object) append_query(search, index = '*', comments = '')

Append a query to the queue. This uses the same parameters as the query method.



# File 'lib/riddle/client.rb', line 219

def append_query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
end

- (Object) close



# File 'lib/riddle/client.rb', line 489

def close
  close_socket
end

- (Object) excerpts(options = {})

Build excerpts from search terms (the words) and the text of documents. Excerpts are bodies of text with the matching words highlighted. They may also be abbreviated to fit within a length limit (see :limit).

As part of the options hash, you will need to define:

  • :docs

  • :words

  • :index

Optional settings include:

  • :before_match (defaults to <span class="match">)

  • :after_match (defaults to </span>)

  • :chunk_separator (defaults to ' &#8230; ' - which is an HTML ellipsis)

  • :limit (defaults to 256)

  • :around (defaults to 5)

  • :exact_phrase (defaults to false)

  • :single_passage (defaults to false)

The defaults differ from the official PHP client, as I've opted for semantic HTML markup.

Example:

client.excerpts(:docs => ["Pat Allan, Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, <span class=\"match\">Pat</span> Cash"]

lorem_ipsum = "Lorem ipsum dolor..."

client.excerpts(:docs => ["Pat Allan, #{lorem_ipsum} Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, Lorem ipsum dolor sit amet, consectetur adipisicing
       elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua &#8230; . Excepteur
       sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
       laborum. <span class=\"match\">Pat</span> Cash"]

Workflow:

Excerpt creation is completely isolated from searching the index. The nominated index is only used to discover encoding and charset information.

Therefore, the workflow goes:

  1. Run the Sphinx query.

  2. Fetch the documents found by Sphinx from their repositories.

  3. Pass the documents' text to excerpts for marking up of matched terms.



# File 'lib/riddle/client.rb', line 386

def excerpts(options = {})
  options[:index]                ||= '*'
  options[:before_match]         ||= '<span class="match">'
  options[:after_match]          ||= '</span>'
  options[:chunk_separator]      ||= ' &#8230; ' # ellipsis
  options[:limit]                ||= 256
  options[:limit_passages]       ||= 0
  options[:limit_words]          ||= 0
  options[:around]               ||= 5
  options[:exact_phrase]         ||= false
  options[:single_passage]       ||= false
  options[:query_mode]           ||= false
  options[:force_all_words]      ||= false
  options[:start_passage_id]     ||= 1
  options[:load_files]           ||= false
  options[:html_strip_mode]      ||= 'index'
  options[:allow_empty]          ||= false
  options[:passage_boundary]     ||= 'none'
  options[:emit_zones]           ||= false
  options[:load_files_scattered] ||= false

  response = Response.new request(:excerpt, excerpts_message(options))

  options[:docs].collect { response.next }
end
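The three workflow steps above can be sketched as a single method. `StubClient` is a hypothetical stand-in so the sketch runs without searchd (its excerpts only imitates the default markup); with the riddle gem, a real Riddle::Client could be passed in its place. The repository is assumed to be anything that answers `fetch(id)`, such as a Hash here or a model lookup in an application:

```ruby
# Hypothetical stand-in for Riddle::Client so the workflow runs without a
# searchd daemon. Its excerpts method only imitates the default markup.
StubClient = Struct.new(:doc_ids) do
  def query(_search, _index = '*', _comments = '')
    { matches: doc_ids.map { |id| { doc: id, weight: 1, attributes: {} } } }
  end

  def excerpts(options)
    marked = %(<span class="match">#{options[:words]}</span>)
    options[:docs].map { |text| text.gsub(options[:words], marked) }
  end
end

# The three workflow steps, in order.
def highlighted_results(client, search, index, repository)
  result = client.query(search, index)                                # 1. run the Sphinx query
  docs   = result[:matches].map { |m| repository.fetch(m[:doc]) }     # 2. load each document's text
  return [] if docs.empty?
  client.excerpts(:docs => docs, :words => search, :index => index)   # 3. mark up matched terms
end

repository = { 1 => "Pat Allan", 2 => "Pat Cash" }
highlighted_results(StubClient.new([2, 1]), "Pat", "pats", repository)
# => ["<span class=\"match\">Pat</span> Cash", "<span class=\"match\">Pat</span> Allan"]
```

Results come back in the order Sphinx ranked them, because the documents are fetched in match order before being passed to excerpts.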

- (Object) flush_attributes



# File 'lib/riddle/client.rb', line 467

def flush_attributes
  response = Response.new request(
    :flushattrs, Message.new
  )

  response.next_int
end

- (Object) keywords(query, index, return_hits = false)

Generates a keyword list for a given query. Each keyword is represented by a hash, with keys :tokenised and :normalised. If return_hits is set to true it will also report on the number of hits and documents for each keyword (see :hits and :docs keys respectively).



# File 'lib/riddle/client.rb', line 434

def keywords(query, index, return_hits = false)
  response = Response.new request(
    :keywords,
    keywords_message(query, index, return_hits)
  )

  (0...response.next_int).collect do
    hash = {}
    hash[:tokenised]  = response.next
    hash[:normalised] = response.next

    if return_hits
      hash[:docs] = response.next_int
      hash[:hits] = response.next_int
    end

    hash
  end
end
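Because keywords reports how searchd tokenises and normalises a query without running a search, its output is handy for spotting terms that match nothing. A sketch with a hypothetical `unmatched_keywords` helper, run here against made-up sample data in the shape described above (it needs the :docs key, so keywords must be called with return_hits set to true):

```ruby
# Made-up sample in the shape keywords(query, index, true) returns.
SAMPLE_KEYWORDS = [
  { tokenised: "running", normalised: "run",   docs: 12, hits: 30 },
  { tokenised: "shoez",   normalised: "shoez", docs: 0,  hits: 0 }
].freeze

# Hypothetical helper: the tokenised form of every keyword that matches no
# documents. fetch(:docs) raises KeyError if return_hits was false.
def unmatched_keywords(keywords)
  keywords.select { |kw| kw.fetch(:docs).zero? }.map { |kw| kw[:tokenised] }
end

unmatched_keywords(SAMPLE_KEYWORDS) # => ["shoez"]

# Against a live daemon (the index name is a placeholder):
#   unmatched_keywords(client.keywords("running shoez", "products", true))
```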

- (Object) open



# File 'lib/riddle/client.rb', line 479

def open
  open_socket

  return if Versions[:search] < 0x116

  @socket.send [
    Commands[:persist], 0, 4, 1
  ].pack("nnNN"), 0
end
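open writes a fixed 12-byte request: a 16-bit command code, a 16-bit version, a 32-bit body length and a 32-bit body, packed big-endian by "nnNN". Because of the guard, the packet is only sent when Versions[:search] is at least 0x116. A sketch that builds and decodes the same bytes, with hypothetical helper names:

```ruby
PERSIST_COMMAND = 4 # Commands[:persist]

# Builds the same bytes as #open: command, version 0, body length 4,
# and a body of 1 (requesting a persistent connection).
def persist_packet
  [PERSIST_COMMAND, 0, 4, 1].pack("nnNN")
end

# Splits the packet back into its four big-endian fields.
def decode_packet(packet)
  command, version, length, body = packet.unpack("nnNN")
  { command: command, version: version, length: length, body: body }
end

persist_packet.bytesize       # => 12
decode_packet(persist_packet) # => { command: 4, version: 0, length: 4, body: 1 }
```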

- (Object) query(search, index = '*', comments = '')

Query the Sphinx daemon, defaulting to all indices; you can name a specific index if you wish. The search parameter should be a query string in the syntax Sphinx expects for the current match_mode.

The object returned from this method is a hash with the following keys:

  • :matches

  • :fields

  • :attributes

  • :attribute_names

  • :words

  • :total

  • :total_found

  • :time

  • :status

  • :warning (if appropriate)

  • :error (if appropriate)

The key :matches returns an array of hashes, the actual search results. Each hash has the document id (:doc), the result weighting (:weight), the match's position within the results (:index), and a hash of the attributes for the document (:attributes).

The :fields and :attribute_names keys return lists of the fields and attributes for the documents. The key :attributes returns a hash mapping each attribute name to its numeric type code (see AttributeTypes), and :words returns a hash of hashes representing the words from the search, with the number of documents and hits for each, along the lines of:

results[:words]["Pat"] #=> {:docs => 12, :hits => 15}

:total is the number of matches available to retrieve, :total_found is the total number of matches found (which may be greater than :total, depending on max_matches and your Sphinx configuration), and :time is the time in seconds, rounded to three decimal places, that the query took to run.

:status is the error code for the query - and if there was a related warning, it will be under the :warning key. Fatal errors will be described under :error.



# File 'lib/riddle/client.rb', line 336

def query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
  self.run.first
end
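An error leaves :matches empty with only :error set, so callers usually check :status before using the results. A minimal sketch with a hypothetical `document_ids` helper, written against the hash layout described above and the Statuses codes; how to treat the retry status is left to the application:

```ruby
# Status codes copied from the Statuses constant above.
STATUSES = { ok: 0, error: 1, retry: 2, warning: 3 }.freeze

# Hypothetical helper: pull the document ids out of one results hash,
# raising on a fatal error and reporting any warning on stderr.
def document_ids(result)
  case result[:status]
  when STATUSES[:error]
    raise "Sphinx query failed: #{result[:error]}"
  when STATUSES[:warning]
    warn "Sphinx warning: #{result[:warning]}"
  end
  result[:matches].map { |match| match[:doc] }
end

# With the riddle gem loaded:
#   ids = document_ids(client.query("pat", "people"))
```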

- (Object) reset

Reset attributes and settings to defaults.



# File 'lib/riddle/client.rb', line 155

def reset
  # defaults
  @offset         = 0
  @limit          = 20
  @max_matches    = 1000
  @match_mode     = :all
  @sort_mode      = :relevance
  @sort_by        = ''
  @weights        = []
  @id_range       = 0..0
  @filters        = []
  @group_by       = ''
  @group_function = :day
  @group_clause   = '@group desc'
  @group_distinct = ''
  @cut_off        = 0
  @retry_count    = 0
  @retry_delay    = 0
  @anchor         = {}
  # string keys are index names, integer values are weightings
  @index_weights  = {}
  @rank_mode      = :proximity_bm25
  @rank_expr      = ''
  @max_query_time = 0
  # string keys are field names, integer values are weightings
  @field_weights  = {}
  @timeout        = 0
  @overrides      = {}
  @select         = "*"
end

- (Object) run

Run all the queries currently in the queue. This returns an array of result hashes, one per queued query, and empties the queue.



# File 'lib/riddle/client.rb', line 225

def run
  response = Response.new request(:search, @queue)

  results = @queue.collect do
    result = {
      :matches         => [],
      :fields          => [],
      :attributes      => {},
      :attribute_names => [],
      :words           => {}
    }

    result[:status] = response.next_int
    case result[:status]
    when Statuses[:warning]
      result[:warning] = response.next
    when Statuses[:error]
      result[:error] = response.next
      next result
    end

    result[:fields] = response.next_array

    attributes = response.next_int
    attributes.times do
      attribute_name = response.next
      type           = response.next_int

      result[:attributes][attribute_name] = type
      result[:attribute_names] << attribute_name
    end

    result_attribute_names_and_types = result[:attribute_names].
      inject([]) { |array, attr| array.push([ attr, result[:attributes][attr] ]) }

    matches   = response.next_int
    is_64_bit = response.next_int

    result[:matches] = (0...matches).map do |i|
      doc = is_64_bit > 0 ? response.next_64bit_int : response.next_int
      weight = response.next_int

      current_match_attributes = {}

      result_attribute_names_and_types.each do |attr, type|
        current_match_attributes[attr] = attribute_from_type(type, response)
      end

      {:doc => doc, :weight => weight, :index => i, :attributes => current_match_attributes}
    end

    result[:total] = response.next_int.to_i || 0
    result[:total_found] = response.next_int.to_i || 0
    result[:time] = ('%.3f' % (response.next_int / 1000.0)).to_f || 0.0

    words = response.next_int
    words.times do
      word = response.next
      docs = response.next_int
      hits = response.next_int
      result[:words][word] = {:docs => docs, :hits => hits}
    end

    result
  end

  @queue.clear
  results
end
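run consumes the reply strictly in order: every next_int, next or next_array call advances through the buffer, so the reads must follow the order in which searchd wrote the fields. Response is defined in lib/riddle/client/response.rb and is not shown here; the hypothetical `MiniResponse` below only sketches that cursor behaviour, assuming big-endian 32-bit integers and strings prefixed with a 32-bit length:

```ruby
# Hypothetical, simplified stand-in for Riddle::Client::Response.
class MiniResponse
  def initialize(bytes)
    @bytes  = bytes.b # treat the reply as raw bytes
    @offset = 0
  end

  # Big-endian unsigned 32-bit integer.
  def next_int
    take(4).unpack1("N")
  end

  # Length-prefixed string: a 32-bit length followed by that many bytes.
  def next
    take(next_int)
  end

  # A 32-bit count followed by that many length-prefixed strings.
  def next_array
    Array.new(next_int) { self.next }
  end

  private

  # Returns the next `count` bytes and advances the cursor past them.
  def take(count)
    chunk = @bytes.byteslice(@offset, count)
    raise ArgumentError, "response truncated" if chunk.nil? || chunk.bytesize < count
    @offset += count
    chunk
  end
end

reply = [2, 5, "title", 4, "body", 42].pack("NNa*Na*N")
response = MiniResponse.new(reply)
response.next_array # => ["title", "body"]
response.next_int   # => 42
```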

- (Object) server

The searchd server to query. Servers are removed from @servers after a Timeout::Error is hit, to allow for fail-over.



# File 'lib/riddle/client.rb', line 188

def server
  @servers.first
end
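The removal itself happens in parts of the client not shown on this page. The sketch below only illustrates the general fail-over idea with a hypothetical `with_failover` helper that tries each server in turn; it is not Riddle's implementation:

```ruby
require 'timeout'

# Hypothetical helper: yield each server in turn, dropping any that raise
# Timeout::Error, until one answers or none are left.
def with_failover(servers)
  until servers.empty?
    begin
      return yield(servers.first)
    rescue Timeout::Error
      servers.shift # forget the unresponsive server and try the next one
    end
  end
  raise "no searchd servers left to try"
end

servers = %w[search1 search2 search3]
with_failover(servers) do |host|
  raise Timeout::Error if host == "search1" # simulate an unresponsive first server
  "answered by #{host}"
end
# => "answered by search2"; servers is now ["search2", "search3"]
```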

- (Object) server=(server)

Backwards compatible writer to the @servers array.



# File 'lib/riddle/client.rb', line 193

def server=(server)
  @servers = Array(server) # String#to_a no longer exists; Array() wraps a single host
end

- (Object) set_anchor(lat_attr, lat, long_attr, long)

Set the geo-anchor point, with the names of the attributes that contain the latitude and longitude (in radians) and the reference position (also in radians). Note that for geo-distance searching to work properly, you must also set match_mode to :extended. To sort results by distance, you will need to set sort_by to '@geodist asc' and sort_mode to :extended (as an example). Sphinx expects latitude and longitude to be returned from your SQL source in radians.

Example:

client.set_anchor('lat', -0.6591741, 'long', 2.530770)


# File 'lib/riddle/client.rb', line 208

def set_anchor(lat_attr, lat, long_attr, long)
  @anchor = {
    :latitude_attribute   => lat_attr,
    :latitude             => lat,
    :longitude_attribute  => long_attr,
    :longitude            => long
  }
end
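If your coordinates are stored in degrees, they need converting before they reach set_anchor. A sketch with hypothetical `to_radians` and `to_degrees` helpers; the commented client calls follow the settings described above, and `lat_deg`/`long_deg` are placeholder variables:

```ruby
# Hypothetical helpers: Sphinx expects the anchor in radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

def to_degrees(radians)
  radians * 180.0 / Math::PI
end

# The example anchor above, expressed in degrees:
to_degrees(-0.6591741).round(1) # => -37.8
to_degrees(2.530770).round(1)   # => 145.0

# With the riddle gem loaded:
#   client.set_anchor('lat', to_radians(lat_deg), 'long', to_radians(long_deg))
#   client.match_mode = :extended
#   client.sort_mode  = :extended
#   client.sort_by    = '@geodist asc'
```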

- (Object) status



# File 'lib/riddle/client.rb', line 454

def status
  response = Response.new request(
    :status, Message.new
  )

  rows, cols = response.next_int, response.next_int

  (0...rows).inject({}) do |hash, row|
    hash[response.next.to_sym] = response.next
    hash
  end
end

- (Object) update(index, attributes, values_by_doc)

Update attributes. The first parameter is the relevant index, the second is an array of attributes to be updated, and the third is a hash where the keys are the document ids and the values are arrays of attribute values, in the same order as the second parameter. Returns the number of documents updated.

Example:

client.update('people', ['birthday'], {1 => [Time.local(1982, 8, 20).to_i]})


# File 'lib/riddle/client.rb', line 421

def update(index, attributes, values_by_doc)
  response = Response.new request(
    :update,
    update_message(index, attributes, values_by_doc)
  )

  response.next_int
end
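The values for each document must follow the order of the attribute list. A sketch of a hypothetical `values_by_doc` helper that builds update's third argument from plain record hashes, so the ordering cannot drift; the record layout (an :id key plus one key per attribute) is an assumption:

```ruby
# Hypothetical helper: build update's values_by_doc argument from record hashes,
# keyed by document id, with values in the same order as `attributes`.
def values_by_doc(records, attributes)
  records.each_with_object({}) do |record, hash|
    hash[record.fetch(:id)] = attributes.map { |attr| record.fetch(attr.to_sym) }
  end
end

people = [
  { id: 1, score: 40, birthday: 100 },
  { id: 2, score: 30, birthday: 200 }
]

values_by_doc(people, %w[score birthday]) # => {1=>[40, 100], 2=>[30, 200]}

# With the riddle gem loaded:
#   client.update('people', %w[score birthday], values_by_doc(people, %w[score birthday]))
```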