Class: Bulkrax::ApplicationParser Abstract
Inherits: Object
- Object
  - Bulkrax::ApplicationParser
Defined in: app/parsers/bulkrax/application_parser.rb
Overview
Subclass the Bulkrax::ApplicationParser to create a parser that handles a specific format (e.g. CSV, Bagit, XML, etc).
An abstract class that establishes the API for Bulkrax’s import and export parsing.
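The subclassing pattern described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `SketchApplicationParser`, `SketchPipeParser`, `SketchEntry`, and `SketchSource` are hypothetical stand-ins so the example runs without Rails or Bulkrax, and the pipe-delimited format is invented for the demo.

```ruby
# Stand-in for Bulkrax::ApplicationParser so this sketch runs without Rails.
# Only the abstract-method pattern is reproduced here.
class SketchApplicationParser
  attr_reader :importerexporter, :headers

  def initialize(importerexporter)
    @importerexporter = importerexporter
    @headers = []
  end

  # Abstract: subclasses must say which entry class they build.
  def entry_class
    raise NotImplementedError, 'must be defined'
  end

  # Abstract: subclasses must return the parsed records.
  def records(_opts = {})
    raise NotImplementedError, 'must be defined'
  end

  def total
    0
  end
end

# Hypothetical entry class; a real parser would return a Bulkrax entry model.
SketchEntry = Struct.new(:identifier)

# Hypothetical format-specific parser that reads pipe-delimited lines.
class SketchPipeParser < SketchApplicationParser
  def entry_class
    SketchEntry
  end

  # The importerexporter is assumed (hypothetically) to respond to #lines.
  def records(_opts = {})
    header, *rows = importerexporter.lines.map { |line| line.strip.split('|') }
    @headers = header
    rows.map { |row| header.zip(row).to_h }
  end

  def total
    records.size
  end
end

SketchSource = Struct.new(:lines)
parser = SketchPipeParser.new(
  SketchSource.new(['source_identifier|title', 'abc1|First', 'abc2|Second'])
)
puts parser.total
```

Calling `records` on the stand-in base class raises `NotImplementedError`, which is the same signal the real abstract methods give when a subclass forgets to override them.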
Instance Attribute Summary
- #headers ⇒ Object
- #importerexporter ⇒ Object (also: #importer, #exporter)
Class Method Summary
- .export_supported? ⇒ TrueClass, FalseClass
  Whether this parser supports exports.
- .import_supported? ⇒ TrueClass, FalseClass
  Whether this parser supports imports.
- .parser_fields ⇒ Object
Instance Method Summary
- #base_path(type = 'import') ⇒ String
  Base path for imported and exported files.
- #calculate_type_delay(type) ⇒ Object
- #collection_entry_class ⇒ Object abstract
- #collections_total ⇒ Object
- #create_collections ⇒ Object
- #create_entry_and_job(current_record, type, identifier = nil) ⇒ Object
- #create_file_sets ⇒ Object
- #create_objects(types_array = nil) ⇒ Object
- #create_relationships ⇒ Object
- #create_works ⇒ Object
- #entry_class ⇒ Object abstract
- #exporter? ⇒ TrueClass, FalseClass
- #file_set_entry_class ⇒ Object abstract
- #file_sets_total ⇒ Object
- #find_or_create_entry(entryclass, identifier, type, raw_metadata = nil) ⇒ Object
- #generated_metadata_mapping ⇒ String
- #get_field_mapping_hash_for(key) ⇒ Object private
- #import_file_path ⇒ String
  Path for the import.
- #importer? ⇒ TrueClass, FalseClass
- #initialize(importerexporter) ⇒ ApplicationParser constructor
  A new instance of ApplicationParser.
- #invalid_record(message) ⇒ Object
- #limit_reached?(limit, index) ⇒ TrueClass, FalseClass
- #model_field_mappings ⇒ Array<String>
- #new_entry(entryclass, type) ⇒ Object
- #path_for_import ⇒ String
  Path where the import metadata and files are stored. This is used for uploaded and cloud files.
- #perform_method ⇒ String
- #rebuild_entries(types_array = nil) ⇒ Object
- #rebuild_entry_query(type, statuses) ⇒ Object
- #record(identifier, _opts = {}) ⇒ Object
- #record_deleted?(record) ⇒ Boolean
- #record_has_source_identifier(record, index) ⇒ TrueClass, FalseClass
- #record_raw_metadata(record) ⇒ Object
- #record_remove_and_rerun?(record) ⇒ Boolean
- #records(_opts = {}) ⇒ Object abstract
- #related_children_parsed_mapping ⇒ String
- #related_children_raw_mapping ⇒ String, NilClass
- #related_parents_parsed_mapping ⇒ String
- #related_parents_raw_mapping ⇒ String, NilClass
- #required_elements ⇒ Array<String>
- #retrieve_cloud_files(_files, _importer) ⇒ Object
  Optional; define this if using Browse Everything for file upload.
- #setup_export_file ⇒ Object abstract
- #source_identifier ⇒ Symbol
  The name of the identifying property in the source system from which we’re importing (i.e. not the application that mounts this Bulkrax engine).
- #total ⇒ Object
- #untar(file_to_untar) ⇒ Object
- #unzip(file_to_unzip) ⇒ Object
- #valid_import? ⇒ TrueClass, FalseClass
  Override to add specific validations.
- #visibility ⇒ String
  The visibility of the record.
- #work_entry_class ⇒ Object
- #work_identifier ⇒ Symbol
  The name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine).
- #work_identifier_search_field ⇒ Symbol
  The Solr property of the source_identifier.
- #write ⇒ Object
- #write_files ⇒ Object abstract
- #write_import_file(file) ⇒ Object
- #zip ⇒ Object
Constructor Details
#initialize(importerexporter) ⇒ ApplicationParser
Returns a new instance of ApplicationParser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 37
    def initialize(importerexporter)
      @importerexporter = importerexporter
      @headers = []
    end
Instance Attribute Details
#headers ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 8
    def headers
      @headers
    end
#importerexporter ⇒ Object Also known as: importer, exporter
    # File 'app/parsers/bulkrax/application_parser.rb', line 8
    def importerexporter
      @importerexporter
    end
Class Method Details
.export_supported? ⇒ TrueClass, FalseClass
TODO: Convert to `class_attribute :export_supported, default: false, instance_predicate: true` and `class << self; alias export_supported? export_supported; end`.
Returns whether this parser supports exports.
    # File 'app/parsers/bulkrax/application_parser.rb', line 26
    def self.export_supported?
      false
    end
.import_supported? ⇒ TrueClass, FalseClass
TODO: Convert to `class_attribute :import_supported, default: false, instance_predicate: true` and `class << self; alias import_supported? import_supported; end`.
Returns whether this parser supports imports.
    # File 'app/parsers/bulkrax/application_parser.rb', line 33
    def self.import_supported?
      true
    end
.parser_fields ⇒ Object
TODO: Convert to `class_attribute :parser_fields, default: {}`.
    # File 'app/parsers/bulkrax/application_parser.rb', line 19
    def self.parser_fields
      {}
    end
Instance Method Details
#base_path(type = 'import') ⇒ String
Base path for imported and exported files
    # File 'app/parsers/bulkrax/application_parser.rb', line 291
    def base_path(type = 'import')
      # account for multiple versions of hyku
      is_multitenant = ENV['HYKU_MULTITENANT'] == 'true' || ENV['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
      is_multitenant ? File.join(Bulkrax.send("#{type}_path"), ::Site.instance.account.name) : Bulkrax.send("#{type}_path")
    end
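The multitenancy branch in #base_path can be tried in isolation with a small sketch. The helper name `sketch_base_path` and its arguments are hypothetical: the configured path and tenant name are passed in directly because `Bulkrax` and `Site` are not loaded here, and the environment is injectable so the example does not depend on the real `ENV`.

```ruby
# Hypothetical helper mirroring the branch in #base_path.
def sketch_base_path(configured_path, tenant_name, env = ENV)
  multitenant = env['HYKU_MULTITENANT'] == 'true' ||
                env['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
  # In a multitenant install each tenant gets its own subdirectory.
  multitenant ? File.join(configured_path, tenant_name) : configured_path
end

single = sketch_base_path('tmp/imports', 'tenant_a', {})
multi  = sketch_base_path('tmp/imports', 'tenant_a', { 'HYKU_MULTITENANT' => 'true' })
puts single
puts multi
```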
#calculate_type_delay(type) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 237
    def calculate_type_delay(type)
      return 2.minutes if type == 'file_set'
      return 1.minute if type == 'work'
      return 0
    end
#collection_entry_class ⇒ Object
Subclass and override #collection_entry_class to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 54
    def collection_entry_class
      raise NotImplementedError, 'must be defined'
    end
#collections_total ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 417
    def collections_total
      0
    end
#create_collections ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 162
    def create_collections
      create_objects(['collection'])
    end
#create_entry_and_job(current_record, type, identifier = nil) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 257
    def create_entry_and_job(current_record, type, identifier = nil)
      identifier ||= current_record[source_identifier]
      new_entry = find_or_create_entry(send("#{type}_entry_class"),
                                       identifier,
                                       'Bulkrax::Importer',
                                       record_raw_metadata(current_record))
      new_entry.status_info('Pending', importer.current_run)
      if record_deleted?(current_record)
        "Bulkrax::Delete#{type.camelize}Job".constantize.send(perform_method, new_entry, current_run)
      elsif record_remove_and_rerun?(current_record) || remove_and_rerun
        delay = calculate_type_delay(type)
        "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, new_entry, current_run)
      else
        "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, new_entry.id, current_run.id)
      end
    end
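How #create_entry_and_job picks a job can be illustrated with a sketch that only builds the job class name. Everything here is an assumption for illustration: `sketch_camelize` is a simplified stand-in for ActiveSupport's `String#camelize`, `sketch_job_name` is a hypothetical helper, and nothing is constantized or enqueued.

```ruby
# Simplified stand-in for ActiveSupport's String#camelize,
# sufficient for snake_case type names such as 'file_set'.
def sketch_camelize(type)
  type.split('_').map(&:capitalize).join
end

# Hypothetical helper: returns the job class name the method would look up.
def sketch_job_name(type, deleted: false, remove_and_rerun: false)
  prefix = if deleted
             'Delete'
           elsif remove_and_rerun
             'DeleteAndImport'
           else
             'Import'
           end
  "Bulkrax::#{prefix}#{sketch_camelize(type)}Job"
end

puts sketch_job_name('file_set')
puts sketch_job_name('work', deleted: true)
puts sketch_job_name('collection', remove_and_rerun: true)
```

The delete flag is checked first, so a record marked for deletion is never routed to the delete-and-import job even when remove-and-rerun is also set.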
#create_file_sets ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 170
    def create_file_sets
      create_objects(['file_set'])
    end
#create_objects(types_array = nil) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 187
    def create_objects(types_array = nil)
      index = 0
      (types_array || %w[collection work file_set relationship]).each do |type|
        if type.eql?('relationship')
          ScheduleRelationshipsJob.set(wait: 5.minutes).perform_later(importer_id: importerexporter.id)
          next
        end
        send(type.pluralize).each do |current_record|
          next unless record_has_source_identifier(current_record, index)
          break if limit_reached?(limit, index)
          seen[current_record[source_identifier]] = true
          create_entry_and_job(current_record, type)
          increment_counters(index, "#{type}": true)
          index += 1
        end
        importer.record_status
      end
      true
    rescue StandardError => e
      set_status_info(e)
    end
#create_relationships ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 174
    def create_relationships
      create_objects(['relationship'])
    end
#create_works ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 166
    def create_works
      create_objects(['work'])
    end
#entry_class ⇒ Object
Subclass and override #entry_class to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 44
    def entry_class
      raise NotImplementedError, 'must be defined'
    end
#exporter? ⇒ TrueClass, FalseClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 322
    def exporter?
      importerexporter.is_a?(Bulkrax::Exporter)
    end
#file_set_entry_class ⇒ Object
Subclass and override #file_set_entry_class to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 60
    def file_set_entry_class
      raise NotImplementedError, 'must be defined'
    end
#file_sets_total ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 421
    def file_sets_total
      0
    end
#find_or_create_entry(entryclass, identifier, type, raw_metadata = nil) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 386
    def find_or_create_entry(entryclass, identifier, type, raw_metadata = nil)
      # limit entry search to just this importer or exporter. Don't go moving them
      entry = importerexporter.entries.where(
        identifier: identifier
      ).first
      entry ||= entryclass.new(
        importerexporter_id: importerexporter.id,
        importerexporter_type: type,
        identifier: identifier
      )
      entry.raw_metadata = raw_metadata
      # Setting parsed_metadata specifically for the id so we can find the object via the
      # id in a delete. This is likely to get clobbered in a regular import, which is fine.
      entry.parsed_metadata = { id: raw_metadata['id'] } if raw_metadata&.key?('id')
      entry.save!
      entry
    end
#generated_metadata_mapping ⇒ String
    # File 'app/parsers/bulkrax/application_parser.rb', line 94
    def generated_metadata_mapping
      @generated_metadata_mapping ||= 'generated'
    end
#get_field_mapping_hash_for(key) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
    # File 'app/parsers/bulkrax/application_parser.rb', line 123
    def get_field_mapping_hash_for(key)
      return instance_variable_get("@#{key}_hash") if instance_variable_get("@#{key}_hash").present?

      mapping = importerexporter.field_mapping.is_a?(Hash) ? importerexporter.field_mapping : {}
      instance_variable_set(
        "@#{key}_hash",
        mapping&.with_indifferent_access&.select { |_, h| h.key?(key) }
      )
      raise StandardError, "more than one #{key} declared: #{instance_variable_get("@#{key}_hash").keys.join(', ')}" if instance_variable_get("@#{key}_hash").length > 1

      instance_variable_get("@#{key}_hash")
    end
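A reduced sketch of the selection step in #get_field_mapping_hash_for may help when reading field mappings. It is not the method itself: `sketch_mapping_hash_for` is a hypothetical name, memoization in an instance variable is omitted, and the sample mapping uses string keys throughout because `with_indifferent_access` needs ActiveSupport.

```ruby
# Hypothetical helper: keep only the mapping entries whose options carry
# the given flag, and refuse ambiguous mappings.
def sketch_mapping_hash_for(field_mapping, key)
  mapping = field_mapping.is_a?(Hash) ? field_mapping : {}
  selected = mapping.select { |_, options| options.key?(key) }
  raise StandardError, "more than one #{key} declared: #{selected.keys.join(', ')}" if selected.length > 1

  selected
end

# Invented sample mapping for the demo.
mapping = {
  'title'  => { 'from' => ['title'] },
  'source' => { 'from' => ['source_identifier'], 'source_identifier' => true }
}
found = sketch_mapping_hash_for(mapping, 'source_identifier')
puts found.keys.inspect
```

The result keeps both halves of the entry: its key is the property name in the target application, and its `'from'` list holds the column names in the source data.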
#import_file_path ⇒ String
Path for the import
    # File 'app/parsers/bulkrax/application_parser.rb', line 466
    def import_file_path
      @import_file_path ||= real_import_file_path
    end
#importer? ⇒ TrueClass, FalseClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 317
    def importer?
      importerexporter.is_a?(Bulkrax::Importer)
    end
#invalid_record(message) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 355
    def invalid_record(message)
      current_run.invalid_records ||= ""
      current_run.invalid_records += message
      current_run.save
      ImporterRun.increment_counter(:failed_records, current_run.id)
      ImporterRun.decrement_counter(:enqueued_records, current_run.id) unless ImporterRun.find(current_run.id).enqueued_records <= 0 # rubocop:disable Style/IdenticalConditionalBranches
    end
#limit_reached?(limit, index) ⇒ TrueClass, FalseClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 329
    def limit_reached?(limit, index)
      return false if limit.nil? || limit.zero? # no limit
      index >= limit
    end
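The boundary behaviour of #limit_reached? is easiest to see with a few inputs. The sketch below copies the logic into a standalone, hypothetically named `sketch_limit_reached?` so it can run outside the parser.

```ruby
# Standalone copy of the limit check; the index is zero-based.
def sketch_limit_reached?(limit, index)
  return false if limit.nil? || limit.zero? # no limit configured

  index >= limit
end

# Pairs of [limit, index] covering the unlimited and boundary cases.
cases = [[nil, 10], [0, 10], [3, 2], [3, 3]]
checks = cases.map { |limit, index| sketch_limit_reached?(limit, index) }
puts checks.inspect
```

Because the index starts at zero, a limit of 3 lets indices 0, 1, and 2 through and stops at index 3, while a `nil` or zero limit never stops the loop.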
#model_field_mappings ⇒ Array<String>
    # File 'app/parsers/bulkrax/application_parser.rb', line 137
    def model_field_mappings
      model_mappings = Bulkrax.field_mappings[self.class.to_s]&.dig('model', :from) || []
      model_mappings |= ['model']

      model_mappings
    end
#new_entry(entryclass, type) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 379
    def new_entry(entryclass, type)
      entryclass.new(
        importerexporter_id: importerexporter.id,
        importerexporter_type: type
      )
    end
#path_for_import ⇒ String
Path where the import metadata and files are stored. This is used for uploaded and cloud files.
    # File 'app/parsers/bulkrax/application_parser.rb', line 300
    def path_for_import
      @path_for_import = File.join(base_path, importerexporter.path_string)
      FileUtils.mkdir_p(@path_for_import) unless File.exist?(@path_for_import)
      @path_for_import
    end
#perform_method ⇒ String
    # File 'app/parsers/bulkrax/application_parser.rb', line 145
    def perform_method
      if self.validate_only
        'perform_now'
      else
        'perform_later'
      end
    end
#rebuild_entries(types_array = nil) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 209
    def rebuild_entries(types_array = nil)
      index = 0
      (types_array || %w[collection work file_set relationship]).each do |type|
        # works are not guaranteed to have Work in the type
        importer.entries.where(rebuild_entry_query(type, parser_fields['entry_statuses'])).find_each do |e|
          seen[e.identifier] = true
          e.status_info('Pending', importer.current_run)
          if remove_and_rerun
            delay = calculate_type_delay(type)
            "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, e, current_run)
          else
            "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, e.id, current_run.id)
          end
          increment_counters(index)
          index += 1
        end
      end
    end
#rebuild_entry_query(type, statuses) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 229
    def rebuild_entry_query(type, statuses)
      type_col = Bulkrax::Entry.arel_table['type']
      status_col = Bulkrax::Entry.arel_table['status_message']

      query = (type == 'work' ? type_col.does_not_match_all(%w[collection file_set]) : type_col.matches(type.camelize))
      query.and(status_col.in(statuses))
    end
#record(identifier, _opts = {}) ⇒ Object
TODO: Review this method. Is it ever used?
    # File 'app/parsers/bulkrax/application_parser.rb', line 405
    def record(identifier, _opts = {})
      return @record if @record

      @record = entry_class.new(self, identifier)
      @record.build
      return @record
    end
#record_deleted?(record) ⇒ Boolean
    # File 'app/parsers/bulkrax/application_parser.rb', line 247
    def record_deleted?(record)
      return false unless record.key?(:delete)
      ActiveModel::Type::Boolean.new.cast(record[:delete])
    end
#record_has_source_identifier(record, index) ⇒ TrueClass, FalseClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 341
    def record_has_source_identifier(record, index)
      if record[source_identifier].blank?
        if Bulkrax.fill_in_blank_source_identifiers.present?
          record[source_identifier] = Bulkrax.fill_in_blank_source_identifiers.call(self, index)
        else
          invalid_record("Missing #{source_identifier} for #{record.to_h}\n")
          false
        end
      else
        true
      end
    end
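The fill-in branch of #record_has_source_identifier can be sketched without Bulkrax. The names below are hypothetical: `FILL_IN` stands in for the `Bulkrax.fill_in_blank_source_identifiers` setting, `sketch_has_source_identifier` replaces the instance method, an `errors` array replaces #invalid_record, and `to_s.strip.empty?` approximates ActiveSupport's `blank?`.

```ruby
# Hypothetical stand-in for the fill_in_blank_source_identifiers setting:
# a callable that receives the parser and the record index.
FILL_IN = ->(_parser, index) { "generated-#{index}" }

# Returns a truthy value when the record has, or was given, an identifier.
def sketch_has_source_identifier(record, index, source_identifier, fill_in, errors)
  return true unless record[source_identifier].to_s.strip.empty?

  if fill_in
    # Assignment returns the generated identifier, which is truthy.
    record[source_identifier] = fill_in.call(nil, index)
  else
    errors << "Missing #{source_identifier} for #{record.to_h}\n"
    false
  end
end

errors = []
filled = { title: 'Untitled' }
filled_result = sketch_has_source_identifier(filled, 4, :source_identifier, FILL_IN, errors)
rejected_result = sketch_has_source_identifier({ title: 'Untitled' }, 5, :source_identifier, nil, errors)
puts filled[:source_identifier]
puts errors.size
```

Note that in the fill-in case the return value is the generated identifier rather than `true`; callers that only test truthiness, as #create_objects does, are unaffected.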
#record_raw_metadata(record) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 243
    def record_raw_metadata(record)
      record.to_h
    end
#record_remove_and_rerun?(record) ⇒ Boolean
    # File 'app/parsers/bulkrax/application_parser.rb', line 252
    def record_remove_and_rerun?(record)
      return false unless record.key?(:remove_and_rerun)
      ActiveModel::Type::Boolean.new.cast(record[:remove_and_rerun])
    end
#records(_opts = {}) ⇒ Object
Subclass and override #records to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 66
    def records(_opts = {})
      raise NotImplementedError, 'must be defined'
    end
#related_children_parsed_mapping ⇒ String
    # File 'app/parsers/bulkrax/application_parser.rb', line 118
    def related_children_parsed_mapping
      @related_children_parsed_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.keys&.first || 'children'
    end
#related_children_raw_mapping ⇒ String, NilClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 112
    def related_children_raw_mapping
      @related_children_raw_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.values&.first&.[]('from')&.first
    end
#related_parents_parsed_mapping ⇒ String
    # File 'app/parsers/bulkrax/application_parser.rb', line 106
    def related_parents_parsed_mapping
      @related_parents_parsed_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.keys&.first || 'parents'
    end
#related_parents_raw_mapping ⇒ String, NilClass
    # File 'app/parsers/bulkrax/application_parser.rb', line 100
    def related_parents_raw_mapping
      @related_parents_raw_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.values&.first&.[]('from')&.first
    end
#required_elements ⇒ Array<String>
    # File 'app/parsers/bulkrax/application_parser.rb', line 365
    def required_elements
      matched_elements = ((importerexporter.mapping.keys || []) & (Bulkrax.required_elements || []))
      unless matched_elements.count == Bulkrax.required_elements.count
        missing_elements = Bulkrax.required_elements - matched_elements
        error_alert = "Missing mapping for at least one required element, missing mappings are: #{missing_elements.join(', ')}"
        raise StandardError, error_alert
      end
      if Bulkrax.fill_in_blank_source_identifiers
        Bulkrax.required_elements
      else
        Bulkrax.required_elements + [source_identifier]
      end
    end
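The check in #required_elements is an array intersection followed by a difference, which a standalone sketch makes concrete. `sketch_required_elements` and its arguments are hypothetical; the mapping keys, required list, and fill-in flag are passed in rather than read from the importer and the `Bulkrax` configuration.

```ruby
# Hypothetical standalone version of the required-elements check.
def sketch_required_elements(mapping_keys, required, source_identifier, fill_in_blank)
  matched = mapping_keys & required
  unless matched.count == required.count
    missing = required - matched
    raise StandardError, "Missing mapping for at least one required element, missing mappings are: #{missing.join(', ')}"
  end

  # The source identifier is only required when blanks are not filled in.
  fill_in_blank ? required : required + [source_identifier]
end

strict  = sketch_required_elements(%w[title creator], %w[title], :source_identifier, false)
relaxed = sketch_required_elements(%w[title creator], %w[title], :source_identifier, true)
puts strict.inspect
puts relaxed.inspect
```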
#retrieve_cloud_files(_files, _importer) ⇒ Object
Optional; define this if using Browse Everything for file upload.
    # File 'app/parsers/bulkrax/application_parser.rb', line 275
    def retrieve_cloud_files(_files, _importer); end
#setup_export_file ⇒ Object
Subclass and override #setup_export_file to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 307
    def setup_export_file
      raise NotImplementedError, 'must be defined' if exporter?
    end
#source_identifier ⇒ Symbol
Returns the name of the identifying property in the source system from which we’re importing (i.e. not the application that mounts this Bulkrax engine).
    # File 'app/parsers/bulkrax/application_parser.rb', line 75
    def source_identifier
      @source_identifier ||= get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('from')&.first&.to_sym || :source_identifier
    end
#total ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 413
    def total
      0
    end
#untar(file_to_untar) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 442
    def untar(file_to_untar)
      Dir.mkdir(importer_unzip_path) unless File.directory?(importer_unzip_path)
      command = "tar -xzf #{Shellwords.escape(file_to_untar)} -C #{Shellwords.escape(importer_unzip_path)}"
      result = system(command)
      raise "Failed to extract #{file_to_untar}" unless result
    end
#unzip(file_to_unzip) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 430
    def unzip(file_to_unzip)
      return untar(file_to_unzip) if file_to_unzip.end_with?('.tar.gz')

      Zip::File.open(file_to_unzip) do |zip_file|
        zip_file.each do |entry|
          entry_path = File.join(importer_unzip_path, entry.name)
          FileUtils.mkdir_p(File.dirname(entry_path))
          zip_file.extract(entry, entry_path) unless File.exist?(entry_path)
        end
      end
    end
#valid_import? ⇒ TrueClass, FalseClass
Override to add specific validations
    # File 'app/parsers/bulkrax/application_parser.rb', line 336
    def valid_import?
      true
    end
#visibility ⇒ String
The visibility of the record. Acceptable values are: “open”, “embargo”, “lease”, “authenticated”, “restricted”. The default is “open”
    # File 'app/parsers/bulkrax/application_parser.rb', line 158
    def visibility
      @visibility ||= self.parser_fields['visibility'] || 'open'
    end
#work_entry_class ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 48
    def work_entry_class
      entry_class
    end
#work_identifier ⇒ Symbol
Returns the name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine).
    # File 'app/parsers/bulkrax/application_parser.rb', line 82
    def work_identifier
      @work_identifier ||= get_field_mapping_hash_for('source_identifier')&.keys&.first&.to_sym || :source
    end
#work_identifier_search_field ⇒ Symbol
Returns the Solr property of the source_identifier. Used for searching. Defaults to the work_identifier value + “_sim”.
    # File 'app/parsers/bulkrax/application_parser.rb', line 89
    def work_identifier_search_field
      @work_identifier_search_field ||= Array.wrap(get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('search_field'))&.first&.to_s || "#{work_identifier}_sim"
    end
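The three identifier accessors (#source_identifier, #work_identifier, and #work_identifier_search_field) all read the same mapping entry, which the following sketch illustrates. It rests on assumptions: `sketch_identifiers` is a hypothetical helper, the input is the already-selected mapping entry with string keys, `Kernel#Array` replaces ActiveSupport's `Array.wrap`, and the sample property names are invented.

```ruby
# Hypothetical helper deriving all three identifiers from the mapping entry
# flagged as the source identifier.
def sketch_identifiers(selected)
  entry = selected.values.first
  work_identifier = selected.keys.first&.to_sym || :source
  {
    # First 'from' column in the source data, or the default.
    source_identifier: entry&.[]('from')&.first&.to_sym || :source_identifier,
    # Property name in the application being imported into.
    work_identifier: work_identifier,
    # Explicit search field, or the work identifier plus "_sim".
    search_field: Array(entry&.[]('search_field')).first&.to_s || "#{work_identifier}_sim"
  }
end

custom = sketch_identifiers(
  'identifier' => {
    'from' => ['id_local'],
    'source_identifier' => true,
    'search_field' => 'identifier_tesim'
  }
)
defaults = sketch_identifiers({})
puts custom.inspect
puts defaults.inspect
```

With an empty mapping the safe-navigation chains all collapse to `nil`, so the defaults `:source_identifier`, `:source`, and `"source_sim"` apply.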
#write ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 425
    def write
      write_files
      zip
    end
#write_files ⇒ Object
Subclass and override #write_files to implement behavior for the parser.
    # File 'app/parsers/bulkrax/application_parser.rb', line 312
    def write_files
      raise NotImplementedError, 'must be defined' if exporter?
    end
#write_import_file(file) ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 279
    def write_import_file(file)
      path = File.join(path_for_import, file.original_filename)
      FileUtils.mv(
        file.path,
        path
      )
      path
    end
#zip ⇒ Object
    # File 'app/parsers/bulkrax/application_parser.rb', line 449
    def zip
      FileUtils.mkdir_p(exporter_export_zip_path)

      Dir["#{exporter_export_path}/**"].each do |folder|
        zip_path = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
        FileUtils.rm_rf("#{exporter_export_zip_path}/#{zip_path}")

        Zip::File.open(File.join("#{exporter_export_zip_path}/#{zip_path}"), create: true) do |zip_file|
          Dir["#{folder}/**/**"].each do |file|
            zip_file.add(file.sub("#{folder}/", ''), file)
          end
        end
      end
    end