Module: Kithe::SolrUtil
- Defined in:
- app/indexing/kithe/solr_util.rb
Overview
Convenience utilities for dealing with your Solr index, including issuing a delete-all query, and finding and deleting “orphaned” Kithe::Indexable Solr objects that no longer exist in the RDBMS. This is all somewhat hacky code, but it gets the job done.
Unlike other parts of Kithe’s indexing support, this code is very Solr-specific, and is generally implemented with [rsolr](github.com/rsolr/rsolr).
Class Method Summary
- .delete_all(solr_url: Kithe.indexable_settings.solr_url, commit: :hard) ⇒ Object
  Just a utility method to delete everything from Solr, and then issue a commit, using RSolr.
- .delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
  Finds any Solr objects that have a `model_name_ssi` field (or `Kithe.indexable_settings.model_name_solr_field` if non-default), but don’t exist in the RDBMS, and deletes them from Solr, then issues a commit.
- .solr_orphan_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
  Based on Sunspot; does not depend on Blacklight.
Class Method Details
.delete_all(solr_url: Kithe.indexable_settings.solr_url, commit: :hard) ⇒ Object
Just a utility method to delete everything from Solr, and then issue a commit, using Rsolr. Pretty trivial.
Intended for dev/test instances, not really production.
# File 'app/indexing/kithe/solr_util.rb', line 89

def self.delete_all(solr_url: Kithe.indexable_settings.solr_url, commit: :hard)
  rsolr = RSolr.connect :url => solr_url

  # RSolr is a bit under-doc'd, but this SEEMS to work to send a commit
  # or softCommit instruction with the delete request.
  params = {}
  if commit == :hard
    params[:commit] = true
  elsif commit == :soft
    params[:softCommit] = true
  end

  rsolr.delete_by_query("*:*", params: params)
end
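The way the commit: argument turns into Solr update parameters can be isolated as a small pure function. This is only a sketch: `commit_params` is a hypothetical helper name and is not part of Kithe::SolrUtil. It assumes that any value other than :hard or :soft sends no commit instruction at all, which is what the if/elsif in the listing implies since it has no else branch.

```ruby
# Hypothetical helper, not part of Kithe: builds the params hash that
# delete_all would hand to RSolr for a given commit mode.
def commit_params(commit)
  case commit
  when :hard then { commit: true }      # ask Solr for a hard commit
  when :soft then { softCommit: true }  # ask Solr for a soft commit
  else {}                               # assumption: no commit instruction
  end
end

hard_params = commit_params(:hard)
soft_params = commit_params(:soft)
```

Under that assumption, passing something like `commit: false` would leave the deletion uncommitted until Solr commits for some other reason.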
.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
Finds any Solr objects that have a `model_name_ssi` field (or `Kithe.indexable_settings.model_name_solr_field` if non-default), but don’t exist in the RDBMS, and deletes them from Solr, then issues a commit.
Under normal use, you shouldn’t have to do this, but you can if your Solr index has gotten out of sync and you don’t want to delete it and reindex from scratch.
Implemented in terms of .solr_orphan_ids.
The implementation is a bit hacky. A progress bar would be nice, but there is none at present.
Returns an array of the IDs that were deleted.
# File 'app/indexing/kithe/solr_util.rb', line 70

def self.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
  rsolr = RSolr.connect :url => solr_url
  deleted_ids = []

  solr_orphan_ids(batch_size: batch_size, solr_url: solr_url) do |orphan_id|
    deleted_ids << orphan_id
    rsolr.delete_by_id(orphan_id)
  end

  rsolr.commit

  return deleted_ids
end
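The control flow above (delete each orphan, commit once at the end, return the deleted IDs) can be exercised without a running Solr. The following is a sketch under stated assumptions: `FakeSolr` and `delete_orphans` are hypothetical names made up for this illustration, and `FakeSolr` only records calls rather than behaving like a real RSolr client.

```ruby
# Hypothetical stand-in for an RSolr client. It records the calls it
# receives instead of talking to Solr.
class FakeSolr
  attr_reader :deleted, :commits

  def initialize
    @deleted = []
    @commits = 0
  end

  def delete_by_id(id)
    @deleted << id
  end

  def commit
    @commits += 1
  end
end

# Mirrors the shape of delete_solr_orphans: every orphan ID is deleted
# individually, a single commit follows, and the deleted IDs are returned.
def delete_orphans(solr, orphan_ids)
  deleted_ids = []
  orphan_ids.each do |orphan_id|
    deleted_ids << orphan_id
    solr.delete_by_id(orphan_id)
  end
  solr.commit
  deleted_ids
end

solr = FakeSolr.new
deleted = delete_orphans(solr, %w[a1 b2])
```

Because the commit is issued only once, after the loop, deletions made during the scan are not yet visible to the paged select queries that produce the orphan IDs.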
.solr_orphan_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
Based on Sunspot; does not depend on Blacklight. See github.com/sunspot/sunspot/blob/3328212da79178319e98699d408f14513855d3c0/sunspot_rails/lib/sunspot/rails/searchable.rb#L332
solr_orphan_ids do |orphaned_id|
  delete(orphaned_id)
end
It searches for any Solr object with a `Kithe.indexable_settings.model_name_solr_field` field (default `model_name_ssi`). Then it takes each ID and checks that it exists in the database using Kithe::Model. At the moment we assume everything is in Kithe::Model, rather than trying to use the `model_name_ssi` value to fetch from different tables. This could perhaps be enhanced to drop that assumption.
This is intended mostly for use by .delete_solr_orphans.
The implementation is a bit hacky. Progress bar support would be nice, but there is none at present.
# File 'app/indexing/kithe/solr_util.rb', line 29

def self.solr_orphan_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
  return enum_for(:solr_orphan_ids, batch_size: batch_size, solr_url: solr_url) unless block_given?

  model_name_solr_field = Kithe.indexable_settings.model_name_solr_field
  model_solr_id_attr = Kithe.indexable_settings.solr_id_value_attribute

  solr_page = -1

  rsolr = RSolr.connect :url => solr_url

  while (solr_page = solr_page.next)
    response = rsolr.get 'select', params: {
      rows: batch_size,
      start: (batch_size * solr_page),
      fl: "id",
      q: "#{model_name_solr_field}:[* TO *]"
    }
    solr_ids = response["response"]["docs"].collect { |h| h["id"] }

    break if solr_ids.empty?

    (solr_ids - Kithe::Model.where(model_solr_id_attr => solr_ids).pluck(model_solr_id_attr)).each do |orphaned_id|
      yield orphaned_id
    end
  end
end
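The batch-and-subtract idea in this method can be tried out with plain arrays. The sketch below is an illustration only: `OrphanSketch` and its `orphan_ids` method are hypothetical names, `indexed_ids` stands in for the Solr index, and `db_ids` stands in for the database table. It uses `each_slice` in place of the start/rows paging against Solr, and an array intersection in place of the `Kithe::Model.where(...).pluck(...)` query.

```ruby
# Hypothetical module, for illustration only. Nothing here touches Solr
# or ActiveRecord.
module OrphanSketch
  def self.orphan_ids(indexed_ids, db_ids, batch_size: 100)
    # Without a block, return an Enumerator over this same method,
    # forwarding the arguments it was called with.
    unless block_given?
      return enum_for(:orphan_ids, indexed_ids, db_ids, batch_size: batch_size)
    end

    indexed_ids.each_slice(batch_size) do |page|
      # Stand-in for the one-query-per-page database lookup.
      existing = page & db_ids
      # Whatever is indexed but not found in the database is an orphan.
      (page - existing).each { |orphaned_id| yield orphaned_id }
    end
  end
end

orphans = OrphanSketch.orphan_ids(%w[a b c d e], %w[a c e], batch_size: 2).to_a
```

Looking up one page of IDs at a time keeps memory use and query size bounded by `batch_size`, at the cost of one lookup per page.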