Module: GeoblacklightAdmin::SolrUtils
- Defined in:
- app/models/geoblacklight_admin/solr_utils.rb
Overview
Convenience utilities for working with your Solr index, including issuing a delete_all query and finding and deleting “orphaned” Kithe::Indexable Solr objects that no longer exist in the RDBMS. The code is somewhat hacky, but it gets the job done.
Unlike other parts of Kithe’s indexing support, this code is very Solr-specific, and is generally implemented with [rsolr](github.com/rsolr/rsolr).
Class Method Summary
- .delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
  Finds any Solr objects that have a `model_name_ssi` field (or `Kithe.indexable_settings.model_name_solr_field` if non-default) but don’t exist in the RDBMS, deletes them from Solr, then issues a commit.
- .solr_orphan_geomg_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
  Based on Sunspot; does not depend on Blacklight.
Class Method Details
.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
Finds any Solr objects that have a `model_name_ssi` field (or `Kithe.indexable_settings.model_name_solr_field` if non-default) but don’t exist in the RDBMS, deletes them from Solr, then issues a commit.
Under normal use you shouldn’t have to do this, but you can if your Solr index has gotten out of sync and you don’t want to delete it and reindex from scratch.
Implemented in terms of .solr_orphan_geomg_ids.
The implementation is a bit hacky; a progress bar would be nice, but there isn’t one currently.
Returns an array of the IDs that were deleted.
    # File 'app/models/geoblacklight_admin/solr_utils.rb', line 73

    def self.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
      rsolr = RSolr.connect url: solr_url
      deleted_geomg_ids = []

      solr_orphan_geomg_ids(batch_size: batch_size, solr_url: solr_url) do |orphan_geomg_id|
        deleted_geomg_ids << orphan_geomg_id
        rsolr.delete_by_query("geomg_id_s:#{orphan_geomg_id}")
      end

      rsolr.commit

      deleted_geomg_ids
    end
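The delete flow above (one `delete_by_query` per orphan, a single `commit` at the end, and the deleted IDs returned) can be tried without a running Solr. The following is a minimal sketch, not the library's implementation: `FakeSolrClient` and `DeleteSketch` are hypothetical names introduced here, and the fake client only records the calls it receives in place of a real RSolr connection.

```ruby
# Hypothetical stand-in for an RSolr client; it only records the calls it receives.
class FakeSolrClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def delete_by_query(query)
    @calls << [:delete_by_query, query]
  end

  def commit
    @calls << [:commit]
  end
end

# Hypothetical module mirroring the documented flow with an injected client.
module DeleteSketch
  # Issues one delete per orphan ID, commits once, and returns the deleted IDs.
  def self.delete_orphans(client, orphan_ids)
    deleted = []

    orphan_ids.each do |orphan_id|
      deleted << orphan_id
      client.delete_by_query("geomg_id_s:#{orphan_id}")
    end

    client.commit
    deleted
  end
end

client = FakeSolrClient.new
deleted = DeleteSketch.delete_orphans(client, %w[a2 a5])
puts deleted.inspect
puts client.calls.inspect
```

Because the commit is issued only once, after all deletes, the recorded calls end with a single `[:commit]` entry regardless of how many orphans were found.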
.solr_orphan_geomg_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object
Based on Sunspot; does not depend on Blacklight. See github.com/sunspot/sunspot/blob/3328212da79178319e98699d408f14513855d3c0/sunspot_rails/lib/sunspot/rails/searchable.rb#L332
    solr_index_orphans do |orphaned_id|
      delete(orphaned_id)
    end
It searches for any Solr object with a `Kithe.indexable_settings.model_name_solr_field` field (default `model_name_ssi`). It then takes each ID and checks that it exists in the database using Kithe::Model. At the moment we assume everything is in Kithe::Model, rather than using `model_name_ssi` to fetch from different tables; this could perhaps be enhanced later.
This is intended mostly for use by .delete_solr_orphans.
The implementation is a bit hacky; support for a progress bar would be nice, but there isn’t any currently.
    # File 'app/models/geoblacklight_admin/solr_utils.rb', line 33

    def self.solr_orphan_geomg_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
      return enum_for(:solr_index_orphan_ids) unless block_given?

      model_solr_id_attr = Kithe.indexable_settings.solr_id_value_attribute

      solr_page = -1

      rsolr = RSolr.connect url: solr_url

      while (solr_page = solr_page.next)
        response = rsolr.get "select", params: {
          rows: batch_size,
          start: (batch_size * solr_page),
          fl: "geomg_id_s",
          q: "* TO *"
        }

        solr_geomg_ids = response["response"]["docs"].collect { |h| h["geomg_id_s"] }

        break if solr_geomg_ids.empty?

        (solr_geomg_ids - Kithe::Model.where(model_solr_id_attr => solr_geomg_ids).pluck(model_solr_id_attr)).each do |orphaned_geomg_id|
          yield orphaned_geomg_id
        end
      end
    end
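The paging and set-difference logic described above can be sketched without Solr or a database. This is an illustrative sketch under stated assumptions, not the library's code: `OrphanSketch`, `SOLR_IDS`, `DB_IDS`, and `fetch_solr_page` are hypothetical names, with plain arrays standing in for the Solr index and for the `Kithe::Model` lookup. The block-less form relies on `enum_for` naming the enclosing method and forwarding its keyword arguments.

```ruby
# Hypothetical in-memory stand-ins for a Solr index and a database table.
module OrphanSketch
  SOLR_IDS = %w[a1 a2 a3 a4 a5 a6 a7].freeze
  DB_IDS = %w[a1 a3 a4 a6].freeze

  # Mimics a paged Solr select using rows/start; returns [] past the end.
  def self.fetch_solr_page(start:, rows:)
    SOLR_IDS[start, rows] || []
  end

  # Yields each ID present in the fake index but missing from the fake database.
  def self.orphan_ids(batch_size: 3)
    # Name this method and forward its arguments so the enumerator form works.
    return enum_for(:orphan_ids, batch_size: batch_size) unless block_given?

    page = 0
    loop do
      solr_ids = fetch_solr_page(start: batch_size * page, rows: batch_size)
      break if solr_ids.empty?

      # Stand-in for the database lookup: IDs from this page that exist in the DB.
      existing_ids = DB_IDS & solr_ids

      (solr_ids - existing_ids).each { |orphan_id| yield orphan_id }
      page += 1
    end
  end
end

puts OrphanSketch.orphan_ids.to_a.inspect
```

On this sample data the result is `["a2", "a5", "a7"]`. Checking one page of IDs at a time keeps the database lookup bounded by `batch_size`, which is the same trade-off the `batch_size:` argument controls in the method above.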