Class: Solr::CursorStream
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/solr/cursorstream.rb, lib/solr/cursorstream/version.rb
Overview
Fetch results from a Solr filter query using Solr's cursor-based deep paging (cursorMark). See solr.apache.org/guide/8_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
Accessors for query, filters, etc. exist only for convenience during configuration. Changing any of them in the middle of a job invalidates the cursor and leaves the results undefined; create a new CursorStream object instead.
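The cursor protocol this class relies on can be sketched without a running Solr. Each request sends the previous `nextCursorMark` (starting from `"*"`), and the stream ends when Solr echoes back the same mark it was sent. Solr also requires the sort to include the uniqueKey field, which the default `sort: "id asc"` satisfies. `FakeSolr` and `fetch_all` are hypothetical illustrations, not part of this gem; in real Solr the mark is an opaque token, whereas here the last returned id stands in for it.

```ruby
# Hypothetical in-memory stand-in for a Solr core sorted by id asc.
class FakeSolr
  def initialize(ids)
    @ids = ids.sort # cursors need a stable total sort; here the id itself
  end

  # Returns up to `rows` docs after `cursor_mark` ("*" = from the start) plus a
  # nextCursorMark; like Solr, it echoes the same mark once nothing is left.
  def query(cursor_mark:, rows:)
    start = cursor_mark == "*" ? 0 : @ids.index(cursor_mark) + 1
    docs = @ids[start, rows] || []
    next_mark = docs.empty? ? cursor_mark : docs.last
    {"response" => {"docs" => docs.map { |id| {"id" => id} }}, "nextCursorMark" => next_mark}
  end
end

# Hypothetical driver using the same stopping rule as solr_has_more?:
# keep requesting until the mark stops changing.
def fetch_all(solr, rows:)
  docs = []
  last_mark = nil
  current_mark = "*"
  while last_mark != current_mark
    resp = solr.query(cursor_mark: current_mark, rows: rows)
    docs.concat(resp["response"]["docs"])
    last_mark = current_mark
    current_mark = resp["nextCursorMark"]
  end
  docs
end

p fetch_all(FakeSolr.new(%w[d a c b e]), rows: 2).map { |d| d["id"] }
# => ["a", "b", "c", "d", "e"]
```

Note that the final request returns no documents; it exists only to confirm that the mark did not advance.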
Defined Under Namespace
Constant Summary
- VERSION = "0.2.0"
Instance Attribute Summary
- #batch_size ⇒ Object
  Returns the value of attribute batch_size.
- #fields ⇒ Object
  Returns the value of attribute fields.
- #filters ⇒ Object
  Returns the value of attribute filters.
- #handler ⇒ Object
  Returns the value of attribute handler.
- #logger ⇒ Object
  Returns the value of attribute logger.
- #query ⇒ Object
  Returns the value of attribute query.
- #sort ⇒ Object
  Returns the value of attribute sort.
- #url ⇒ Object
  Returns the value of attribute url.
Class Method Summary
- .connection(adapter: :httpx) ⇒ Faraday::Connection
  Build a Faraday connection with the necessary adapter already set up.
Instance Method Summary
- #connection(adapter: @adapter) ⇒ Object
  Returns the memoized Faraday connection for this stream.
- #default_params ⇒ Hash
  Default Solr params derived from instance variables.
- #each ⇒ Object
  Iterate through the documents in the stream.
- #get_page ⇒ CursorResponse
  Get a single “page” (`batch_size` documents) from Solr.
- #http_request_retry_block ⇒ Object
  Lambda that runs every time the connection retries after an HTTP error.
- #initialize(url:, handler: "select", query: "*:*", filters: ["*:*"], sort: "id asc", batch_size: 100, fields: [], logger: nil, adapter: :httpx) {|_self| ... } ⇒ CursorStream (constructor)
  A new instance of CursorStream.
- #solr_has_more? ⇒ Boolean
  Determine whether Solr has another page of results.
- #solr_url ⇒ Object
  The Solr URL, built from the passed url and the handler.
- #verify_we_have_everything! ⇒ Object
  Make sure we have everything we need for a successful stream.
Constructor Details
#initialize(url:, handler: "select", query: "*:*", filters: ["*:*"], sort: "id asc", batch_size: 100, fields: [], logger: nil, adapter: :httpx) {|_self| ... } ⇒ CursorStream
Returns a new instance of CursorStream.
# File 'lib/solr/cursorstream.rb', line 31
def initialize(url:, handler: "select", query: "*:*", filters: ["*:*"], sort: "id asc", batch_size: 100, fields: [], logger: nil, adapter: :httpx)
  @url = url.gsub(/\/\Z/, "")
  @query = query
  @handler = handler
  @filters = filters
  @sort = sort
  @batch_size = batch_size
  @fields = fields
  @logger = logger
  @adapter = adapter
  @current_cursor = "*"
  yield self if block_given?
end
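Because the constructor ends with `yield self if block_given?`, a stream can be configured either with keywords or inside a block through the attribute writers. The sketch below mirrors that pattern with a hypothetical stand-in, `ConfigSketch`, so it runs without the gem or a Solr server; the URL and filter values are assumed examples. With the real class the equivalent call would be `Solr::CursorStream.new(url: ...) { |s| ... }`.

```ruby
# Hypothetical stand-in mirroring CursorStream#initialize's configuration style:
# keyword defaults, trailing-slash stripping, then `yield self` for block setup.
class ConfigSketch
  attr_accessor :url, :handler, :filters, :batch_size

  def initialize(url:, handler: "select", filters: ["*:*"], batch_size: 100)
    @url = url.gsub(/\/\Z/, "") # drop a trailing slash so url + "/" + handler stays clean
    @handler = handler
    @filters = filters
    @batch_size = batch_size
    yield self if block_given? # let the caller finish configuration in a block
  end
end

stream = ConfigSketch.new(url: "http://localhost:8983/solr/books/") do |s|
  s.filters = ["type:book"]
  s.batch_size = 500
end
puts stream.url        # http://localhost:8983/solr/books
puts stream.batch_size # 500
```

Keeping all setup inside the block also matches the Overview's advice: configure once, before iterating, and never mutate a stream mid-job.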
Instance Attribute Details
#batch_size ⇒ Object
Returns the value of attribute batch_size.
# File 'lib/solr/cursorstream.rb', line 20
def batch_size
  @batch_size
end
#fields ⇒ Object
Returns the value of attribute fields.
# File 'lib/solr/cursorstream.rb', line 20
def fields
  @fields
end
#filters ⇒ Object
Returns the value of attribute filters.
# File 'lib/solr/cursorstream.rb', line 20
def filters
  @filters
end
#handler ⇒ Object
Returns the value of attribute handler.
# File 'lib/solr/cursorstream.rb', line 20
def handler
  @handler
end
#logger ⇒ Object
Returns the value of attribute logger.
# File 'lib/solr/cursorstream.rb', line 20
def logger
  @logger
end
#query ⇒ Object
Returns the value of attribute query.
# File 'lib/solr/cursorstream.rb', line 20
def query
  @query
end
#sort ⇒ Object
Returns the value of attribute sort.
# File 'lib/solr/cursorstream.rb', line 20
def sort
  @sort
end
#url ⇒ Object
Returns the value of attribute url.
# File 'lib/solr/cursorstream.rb', line 20
def url
  @url
end
Class Method Details
.connection(adapter: :httpx) ⇒ Faraday::Connection
Build a Faraday connection with the necessary adapter already set up.
# File 'lib/solr/cursorstream.rb', line 67
def self.connection(adapter: :httpx)
  require "httpx/adapters/faraday" if adapter == :httpx
  Faraday.new(request: {params_encoder: Faraday::FlatParamsEncoder}) do |builder|
    builder.use Faraday::Response::RaiseError
    builder.request :url_encoded
    builder.request :retry
    builder.response :json
    # Use the keyword argument; in a class method, @adapter is a class-level
    # instance variable that is never set.
    builder.adapter adapter
  end
end
Instance Method Details
#connection(adapter: @adapter) ⇒ Object
# File 'lib/solr/cursorstream.rb', line 79
def connection(adapter: @adapter)
  # Memoized: the adapter argument only takes effect on the first call.
  @connection ||= self.class.connection(adapter: adapter)
end
#default_params ⇒ Hash
Returns default Solr params derived from instance variables.
# File 'lib/solr/cursorstream.rb', line 97
def default_params
  field_list = Array(fields).join(",")
  params = {q: @query, wt: :json, rows: batch_size, sort: @sort, fq: filters, fl: field_list}
  # Drop empty values. Hash#reject returns a new Hash, so return its result
  # rather than the unfiltered original.
  params.reject { |_k, v| [nil, "", []].include?(v) }
end
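A quick way to see what `default_params` produces, and why the connection uses `Faraday::FlatParamsEncoder`, is the standalone sketch below. `sketch_default_params` is a hypothetical free-function copy of the method above. Solr expects repeated filter keys (`fq=...&fq=...`) rather than Rack-style `fq[]=...`. Stdlib `URI.encode_www_form` also expands arrays into repeated keys, so it stands in for Faraday here; that substitution is an assumption made to keep the example dependency-free.

```ruby
require "uri"

# Hypothetical free-function copy of default_params, returning the filtered Hash.
def sketch_default_params(query:, sort:, batch_size:, filters:, fields:)
  params = {q: query, wt: :json, rows: batch_size, sort: sort, fq: filters,
            fl: Array(fields).join(",")}
  params.reject { |_k, v| [nil, "", []].include?(v) } # drop nil, "" and []
end

params = sketch_default_params(query: "*:*", sort: "id asc", batch_size: 100,
                               filters: ["type:book", "year:2020"], fields: [])
p params.key?(:fl) # => false (an empty field list is dropped entirely)

# Arrays become repeated keys, the flat form Solr reads for multiple fq filters.
query_string = URI.encode_www_form(params.merge(cursorMark: "*"))
puts query_string # contains fq=type%3Abook&fq=year%3A2020
```

Dropping an empty `fl` matters because sending `fl=` would be a different request from omitting it; omitting it lets Solr fall back to its default field list.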
#each ⇒ Object
Iterate through the documents in the stream. Behind the scenes, documents are fetched in batches of `batch_size` for efficiency.
# File 'lib/solr/cursorstream.rb', line 54
def each
  return enum_for(:each) unless block_given?
  verify_we_have_everything!
  while solr_has_more?
    cursor_response = get_page
    cursor_response.docs.each { |d| yield d }
  end
end
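Because the class includes `Enumerable` and `#each` returns an enumerator when no block is given, methods that stop early (`first`, `find`, `take`) also stop requesting pages. The sketch below demonstrates this with a hypothetical `PagedStream` whose pre-baked pages stand in for Solr responses; it counts fetches to show that the third page is never requested.

```ruby
# Hypothetical stand-in for CursorStream#each: pages are fetched on demand,
# so Enumerable methods that break out early also stop fetching.
class PagedStream
  include Enumerable
  attr_reader :fetches

  def initialize(pages)
    @pages = pages # pre-baked "Solr pages"; an empty page ends the stream
    @fetches = 0
  end

  def each
    return enum_for(:each) unless block_given?
    loop do
      page = @pages[@fetches] || []
      @fetches += 1 # one simulated HTTP request per page
      break if page.empty?
      page.each { |doc| yield doc }
    end
  end
end

stream = PagedStream.new([[1, 2], [3, 4], [5]])
p stream.first(3) # => [1, 2, 3]
p stream.fetches  # => 2
```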
#get_page ⇒ CursorResponse
Get a single “page” (`batch_size` documents) from Solr. Used by #each.
# File 'lib/solr/cursorstream.rb', line 87
def get_page
  params = {cursorMark: @current_cursor}.merge default_params
  r = connection.get(solr_url, params)
  resp = Response.new(r)
  @last_cursor = @current_cursor
  @current_cursor = resp.cursor
  resp
end
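`get_page` only needs two things from each response: the documents and the next cursor mark. The gem's own `Response` class is not shown on this page, so the wrapper below is a hypothetical sketch, `SketchCursorResponse`. It assumes only Solr's documented JSON shape, in which documents live under `response.docs` and `nextCursorMark` sits at the top level. The cursor value in the sample body is a made-up placeholder, not a real Solr token.

```ruby
require "json"

# Hypothetical minimal response wrapper exposing the two fields get_page uses.
class SketchCursorResponse
  def initialize(body)
    @body = body # parsed JSON Hash, e.g. as produced by Faraday's :json middleware
  end

  def docs
    @body.dig("response", "docs") || []
  end

  # Solr returns the mark for the next request at the top level of the body.
  def cursor
    @body["nextCursorMark"]
  end
end

raw = '{"responseHeader":{"status":0},' \
      '"response":{"numFound":2,"start":0,"docs":[{"id":"a"},{"id":"b"}]},' \
      '"nextCursorMark":"sample-next-mark"}'
resp = SketchCursorResponse.new(JSON.parse(raw))
p resp.docs.map { |d| d["id"] } # => ["a", "b"]
p resp.cursor                   # => "sample-next-mark"
```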
#http_request_retry_block ⇒ Object
Returns a lambda that runs every time the connection retries after an HTTP error.
# File 'lib/solr/cursorstream.rb', line 120
def http_request_retry_block
  ->(env:, options:, retries_remaining:, exception:, will_retry_in:) do
    # TODO: log that a retry happened
  end
end
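One way the TODO might be filled in is sketched below: log each retry through the stream's optional logger. `logging_retry_block` is hypothetical. The sketch assumes the keyword signature declared above; the exact keywords Faraday's retry middleware passes depend on the faraday-retry version, so check them before wiring this in. Safe navigation (`&.`) keeps it a no-op when no logger is configured, matching the `logger: nil` default.

```ruby
require "logger"
require "stringio"

# Hypothetical retry block that logs each retry; assumes the keyword
# signature declared by http_request_retry_block.
def logging_retry_block(logger)
  ->(env:, options:, retries_remaining:, exception:, will_retry_in:) do
    # With a nil logger, &. skips the call entirely.
    logger&.warn("Solr request retry: #{exception.class}: #{exception.message}; " \
                 "#{retries_remaining} retries left, next in #{will_retry_in}s")
  end
end

out = StringIO.new
block = logging_retry_block(Logger.new(out))
block.call(env: nil, options: nil, retries_remaining: 2,
           exception: StandardError.new("timeout"), will_retry_in: 0.5)
puts out.string
```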
#solr_has_more? ⇒ Boolean
Determine whether Solr has another page of results: true until Solr returns the same cursor mark that was sent.
# File 'lib/solr/cursorstream.rb', line 114
def solr_has_more?
  @last_cursor != @current_cursor
end
#solr_url ⇒ Object
Returns the Solr URL built from the passed url and the handler.
# File 'lib/solr/cursorstream.rb', line 47
def solr_url
  url + "/" + handler
end
#verify_we_have_everything! ⇒ Object
Make sure we have everything we need for a successful stream.
# File 'lib/solr/cursorstream.rb', line 106
def verify_we_have_everything!
  missing = {handler: @handler, filters: @filters, batch_size: @batch_size}.select { |_k, v| v.nil? }.keys
  raise Error.new("Solr::CursorStream missing value for #{missing.join(", ")}") unless missing.empty?
end