RSolr
A simple, extensible Ruby client for Apache Solr.
Documentation
The code docs www.rubydoc.info/gems/rsolr
Installation:
gem install rsolr
Example:
require 'rsolr'
# Direct connection
solr = RSolr.connect :url => 'http://solrserver.com'
# Connecting over a proxy server
solr = RSolr.connect :url => 'http://solrserver.com', :proxy=>'http://user:[email protected]:8080'
# Using an alternate Faraday adapter
solr = RSolr.connect :url => 'http://solrserver.com', :adapter => :em_http
# Using a custom Faraday connection
conn = Faraday.new do |faraday|
faraday.response :logger # log requests to STDOUT
faraday.adapter Faraday.default_adapter # make requests with Net::HTTP
end
solr = RSolr.connect conn, :url => 'http://solrserver.com'
# send a request to /select
response = solr.get 'select', :params => {:q => '*:*'}
# send a request to /catalog
response = solr.get 'catalog', :params => {:q => '*:*'}
When the Solr :wt
is :ruby
, then the response will be a Hash. This Hash is the same object returned by Solr, but evaluated as Ruby. If the :wt
is not :ruby
, then the response will be a String.
The response also exposes 2 attribute readers (for any :wt
value), :request
and :response
. Both are Hash objects with symbolized keys.
The :request
attribute contains the original request context. You can use this for debugging or logging. Some of the keys this object contains are :uri
, :query
, :method
etc..
The :response
attribute contains the original response. This object contains the :status
, :body
and :headers
keys.
Request formats
By default, RSolr uses the Solr JSON command format for all requests.
RSolr.connect :url => 'http://solrserver.com', update_format: :json # the default
# or
RSolr.connect :url => 'http://solrserver.com', update_format: :xml
Timeouts
The read and connect timeout settings can be set when creating a new instance of RSolr, and will be passed on to underlying Faraday instance:
solr = RSolr.connect(:timeout => 120, :open_timeout => 120)
Retry 503s
A 503 is usually a temporary error which RSolr may retry if requested. You may specify the number of retry attempts with the :retry_503
option.
Only requests which specify a Retry-After header will be retried, after waiting the indicated retry interval, otherwise RSolr will treat the request as a 500. You may specify a maximum Retry-After interval to wait with the :retry_after_limit
option (default: one second).
solr = RSolr.connect(:retry_503 => 1, :retry_after_limit => 1)
For additional control, consider using a custom Faraday connection (see above) using its ‘retry` middleware.
Querying
Use the #get / #post method to send search requests to the /select handler:
response = solr.get 'select', :params => {
:q=>'washington',
:start=>0,
:rows=>10
}
response["response"]["docs"].each{|doc| puts doc["id"] }
The :params
sent into the method are sent to Solr as-is, which is to say they are converted to Solr url style, but no special mapping is used. When an array is used, multiple parameters *with the same name* are generated for the Solr query. Example:
solr.get 'select', :params => {:q=>'roses', :fq=>['red', 'violet']}
The above statement generates this Solr query:
select?q=roses&fq=red&fq=violet
Pagination
To paginate through a set of Solr documents, use the paginate method:
solr.paginate 1, 10, "select", :params => {:q => "test"}
The first argument is the current page, the second is how many documents to return for each page. In other words, “page” is the “start” Solr param and “per-page” is the “rows” Solr param.
The paginate method returns WillPaginate ready “docs” objects, so for example in a Rails application, paginating is as simple as:
<%= will_paginate @solr_response["response"]["docs"] %>
Method Missing
The RSolr::Client
class also uses method_missing
for setting the request handler/path:
solr.paintings :params => {:q=>'roses', :fq=>['red', 'violet']}
This is sent to Solr as:
paintings?q=roses&fq=red&fq=violet
This works with pagination as well:
solr.paginate_paintings 1, 10, {:q=>'roses', :fq=>['red', 'violet']}
Using POST for Search Queries
There may be cases where the query string is too long for a GET request. RSolr solves this issue by converting hash objects into form-encoded strings:
response = solr.music :data => {:q => "*:*"}
The :data
hash is serialized as a form-encoded query string, and the correct content-type headers are sent along to Solr.
Sending HEAD Requests
There may be cases where you’d like to send a HEAD request to Solr:
solr.head("admin/ping").response[:status] == 200
Sending HTTP Headers
Solr responds to the request headers listed here: wiki.apache.org/solr/SolrAndHTTPCaches To send header information to Solr using RSolr, just use the :headers
option:
response = solr.head "admin/ping", :headers => {"Cache-Control" => "If-None-Match"}
Building a Request
RSolr::Client
provides a method for building a request context, which can be useful for debugging or logging etc.:
request_context = solr.build_request "select", :data => {:q => "*:*"}, :method => :post, :headers => {}
To build a paginated request use build_paginated_request:
request_context = solr.build_paginated_request 1, 10, "select", ...
Updating Solr
Updating is done using native Ruby objects. Hashes are used for single documents and arrays are used for a collection of documents (hashes). These objects get turned into simple XML “messages”. Raw XML strings can also be used.
Single document via #add
solr.add :id=>1, :price=>1.00
Multiple documents via #add
documents = [{:id=>1, :price=>1.00}, {:id=>2, :price=>10.50}]
solr.add documents
The optional :add_attributes
hash can also be used to set Solr “add” document attributes:
solr.add documents, :add_attributes => {:commitWithin => 10}
Raw commands via #update
solr.update data: '<commit/>', headers: { 'Content-Type' => 'text/xml' }
solr.update data: { optimize: true }.to_json, headers: { 'Content-Type' => 'application/json' }
When adding, you can also supply “add” xml element attributes and/or a block for manipulating other “add” related elements (docs and fields) by calling the xml
method directly:
doc = {:id=>1, :price=>1.00}
add_attributes = {:allowDups=>false, :commitWithin=>10}
add_xml = solr.xml.add(doc, add_attributes) do |doc|
# boost each document
doc.attrs[:boost] = 1.5
# boost the price field:
doc.field_by_name(:price).attrs[:boost] = 2.0
end
Now the “add_xml” object can be sent to Solr like:
solr.update :data => add_xml
Deleting
Delete by id
solr.delete_by_id 1
or an array of ids
solr.delete_by_id [1, 2, 3, 4]
Delete by query:
solr.delete_by_query 'price:1.00'
Delete by array of queries
solr.delete_by_query ['price:1.00', 'price:10.00']
Commit / Optimize
solr.commit, :commit_attributes => {}
solr.optimize, :optimize_attributes => {}
Response Formats
The default response format is Ruby. When the :wt
param is set to :ruby
, the response is eval’d resulting in a Hash. You can get a raw response by setting the :wt
to “ruby” - notice, the string – not a symbol. RSolr will eval the Ruby string ONLY if the :wt value is :ruby. All other response formats are available as expected, :wt=>‘xml’ etc..
Evaluated Ruby:
solr.get 'select', :params => {:wt => :ruby} # notice :ruby is a Symbol
Raw Ruby:
solr.get 'select', :params => {:wt => 'ruby'} # notice 'ruby' is a String
XML:
solr.get 'select', :params => {:wt => :xml}
JSON (default):
solr.get 'select', :params => {:wt => :json}
Related Resources & Projects
-
RSolr Google Group – The RSolr discussion group
-
rsolr-ext – An extension kit for RSolr
-
rsolr-direct – JRuby direct connection for RSolr
-
rsolr-nokogiri – Gives RSolr Nokogiri for XML generation.
-
SunSpot – An awesome Solr DSL, built with RSolr
-
Blacklight – A “next generation” Library OPAC, built with RSolr
-
java_bin – Provides javabin/binary parsing for RSolr
-
Solr – The Apache Solr project
-
solr-ruby – The original Solr Ruby Gem!
Note on Patches/Pull Requests
-
Fork the project.
-
Make your feature addition or bug fix.
-
Add tests for it. This is important so I don’t break it in a future version unintentionally.
-
Commit, do not mess with rakefile, version, or history (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
-
Send me a pull request. Bonus points for topic branches.
Contributors
-
Nathan Witmer
-
Magnus Bergmark
-
shima
-
Randy Souza
-
Mat Brown
-
Jeremy Hinegardner
-
Denis Goeury
-
shairon toledo
-
Rob Di Marco
-
Peter Kieltyka
-
Mike Perham
-
Lucas Souza
-
Dmitry Lihachev
-
Antoine Latter
-
Naomi Dushay
Author
Matt Mitchell <[email protected]>
Copyright
Copyright © 2008-2010 Matt Mitchell. See LICENSE for details.