Class: Bulkrax::ApplicationParser Abstract

Inherits:
Object
Defined in:
app/parsers/bulkrax/application_parser.rb

Overview

This class is abstract.

Subclass the Bulkrax::ApplicationParser to create a parser that handles a specific format (e.g. CSV, BagIt, XML, etc.).

An abstract class that establishes the API for Bulkrax’s import and export parsing.

Direct Known Subclasses

CsvParser, OaiDcParser, XmlParser
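Below is a minimal plain-Ruby sketch of the contract a subclass fulfils. The abstract hooks raise NotImplementedError until they are overridden, and the class-level predicates advertise import and export support. AbstractParserSketch and TsvParserSketch are hypothetical stand-ins that let the example run without Rails. A real parser would inherit from Bulkrax::ApplicationParser and return its Bulkrax entry classes.

```ruby
# Hypothetical stand-in for Bulkrax::ApplicationParser: abstract hooks raise
# NotImplementedError until a subclass overrides them.
class AbstractParserSketch
  def self.import_supported?
    true
  end

  def self.export_supported?
    false
  end

  def entry_class
    raise NotImplementedError, 'must be defined'
  end

  def records(_opts = {})
    raise NotImplementedError, 'must be defined'
  end
end

# Hypothetical tab-separated parser that overrides the abstract hooks.
class TsvParserSketch < AbstractParserSketch
  def self.export_supported?
    true
  end

  def initialize(text)
    @text = text
  end

  def entry_class
    Hash # placeholder; a real parser returns its Bulkrax entry class
  end

  # Splits the text into rows and maps each data row onto the header row.
  def records(_opts = {})
    header, *rows = @text.lines.map { |line| line.chomp.split("\t") }
    rows.map { |row| header.zip(row).to_h }
  end
end

parser = TsvParserSketch.new("source_identifier\ttitle\nabc\tFirst\n")
parser.records # => [{ "source_identifier" => "abc", "title" => "First" }]
```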

Constructor Details

#initialize(importerexporter) ⇒ ApplicationParser

Returns a new instance of ApplicationParser.

# File 'app/parsers/bulkrax/application_parser.rb', line 37

def initialize(importerexporter)
  @importerexporter = importerexporter
  @headers = []
end

Instance Attribute Details

#headers ⇒ Object

Returns the value of attribute headers (initialized to an empty array by the constructor).

# File 'app/parsers/bulkrax/application_parser.rb', line 8

def headers
  @headers
end

#importerexporter ⇒ Object

Also known as: importer, exporter

Returns the importer or exporter record this parser was initialized with.

# File 'app/parsers/bulkrax/application_parser.rb', line 8

def importerexporter
  @importerexporter
end

Class Method Details

.export_supported? ⇒ TrueClass, FalseClass

TODO:

Convert to `class_attribute :export_supported, default: false, instance_predicate: true` and `class << self; alias export_supported? export_supported; end`

Returns whether this parser supports exports.

Returns:

  • (TrueClass, FalseClass)

    whether this parser supports exports.

# File 'app/parsers/bulkrax/application_parser.rb', line 26

def self.export_supported?
  false
end

.import_supported? ⇒ TrueClass, FalseClass

TODO:

Convert to `class_attribute :import_supported, default: true, instance_predicate: true` and `class << self; alias import_supported? import_supported; end`

Returns whether this parser supports imports.

Returns:

  • (TrueClass, FalseClass)

    whether this parser supports imports.

# File 'app/parsers/bulkrax/application_parser.rb', line 33

def self.import_supported?
  true
end

.parser_fields ⇒ Object

TODO:

Convert to `class_attribute :parser_fields, default: {}`

# File 'app/parsers/bulkrax/application_parser.rb', line 19

def self.parser_fields
  {}
end

Instance Method Details

#base_path(type = 'import') ⇒ String

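Base path for imported and exported files.

Parameters:

  • type (String) (defaults to: 'import')

    'import' or 'export'; selects Bulkrax.import_path or Bulkrax.export_path

Returns:

  • (String)

    the base directory for files this parser reads or writes; when multitenancy is enabled, a per-tenant subdirectory of that path
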
# File 'app/parsers/bulkrax/application_parser.rb', line 291

def base_path(type = 'import')
  # account for multiple versions of hyku
  is_multitenant = ENV['HYKU_MULTITENANT'] == 'true' || ENV['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
  is_multitenant ? File.join(Bulkrax.send("#{type}_path"), ::Site.instance..name) : Bulkrax.send("#{type}_path")
end
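Here is a minimal sketch of the same branching as a pure function, so it can be exercised without Hyku. base_path_sketch, the paths hash and tenant_name are hypothetical. The real method reads ENV directly, takes its root from Bulkrax.import_path or Bulkrax.export_path, and derives the tenant directory from ::Site.instance.

```ruby
# Hypothetical pure-function version of the base_path branching. The real
# method reads ENV itself, takes the root from Bulkrax's configured
# import/export path, and derives the tenant name from ::Site.instance.
def base_path_sketch(type, env:, paths:, tenant_name: nil)
  multitenant = env['HYKU_MULTITENANT'] == 'true' ||
                env['SETTINGS__MULTITENANCY__ENABLED'] == 'true'
  root = paths.fetch(type)
  multitenant ? File.join(root, tenant_name) : root
end

paths = { 'import' => '/srv/bulkrax/imports', 'export' => '/srv/bulkrax/exports' }
base_path_sketch('import', env: {}, paths: paths)
# => "/srv/bulkrax/imports"
base_path_sketch('export', env: { 'SETTINGS__MULTITENANCY__ENABLED' => 'true' },
                 paths: paths, tenant_name: 'tenant-a')
# => "/srv/bulkrax/exports/tenant-a"
```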

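#calculate_type_delay(type) ⇒ Object

Returns the delay applied before a DeleteAndImport job runs: 2 minutes for file sets, 1 minute for works, and 0 for anything else.
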
# File 'app/parsers/bulkrax/application_parser.rb', line 237

def calculate_type_delay(type)
  return 2.minutes if type == 'file_set'
  return 1.minute if type == 'work'
  return 0
end
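The same lookup can be expressed as data. This version returns seconds so it runs without ActiveSupport. TYPE_DELAYS_SKETCH and calculate_type_delay_sketch are hypothetical names. The staggering appears intended to let collections and works be rebuilt before file sets, but the source does not state the reason.

```ruby
# Seconds to wait before a DeleteAndImport job runs, keyed by entry type.
# Mirrors calculate_type_delay (2.minutes / 1.minute / 0) without ActiveSupport.
TYPE_DELAYS_SKETCH = { 'file_set' => 120, 'work' => 60 }.freeze

def calculate_type_delay_sketch(type)
  TYPE_DELAYS_SKETCH.fetch(type, 0)
end

calculate_type_delay_sketch('file_set')   # => 120
calculate_type_delay_sketch('collection') # => 0
```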

#collection_entry_class ⇒ Object

This method is abstract.

Subclass and override #collection_entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)

# File 'app/parsers/bulkrax/application_parser.rb', line 54

def collection_entry_class
  raise NotImplementedError, 'must be defined'
end

#collections_total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 417

def collections_total
  0
end

#create_collections ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 162

def create_collections
  create_objects(['collection'])
end

#create_entry_and_job(current_record, type, identifier = nil) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 257

def create_entry_and_job(current_record, type, identifier = nil)
  identifier ||= current_record[source_identifier]
  new_entry = find_or_create_entry(send("#{type}_entry_class"),
                                   identifier,
                                   'Bulkrax::Importer',
                                   record_raw_metadata(current_record))
  new_entry.status_info('Pending', importer.current_run)
  if record_deleted?(current_record)
    "Bulkrax::Delete#{type.camelize}Job".constantize.send(perform_method, new_entry, current_run)
  elsif record_remove_and_rerun?(current_record) || remove_and_rerun
    delay = calculate_type_delay(type)
    "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, new_entry, current_run)
  else
    "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, new_entry.id, current_run.id)
  end
end
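The job chosen for each entry depends only on the type and two flags, so the branch can be sketched as a function that returns the job class name. job_class_name_sketch and camelize_sketch (a stand-in for ActiveSupport's String#camelize) are hypothetical. Here remove_and_rerun stands for either the per-record flag or the parser-wide remove_and_rerun setting. The original also passes the entry and run objects to the Delete and DeleteAndImport jobs, but only their ids to the Import job.

```ruby
# Stand-in for ActiveSupport's String#camelize on snake_case type names.
def camelize_sketch(type)
  type.split('_').map(&:capitalize).join
end

# Hypothetical helper mirroring the branch order in create_entry_and_job:
# a delete flag wins over remove-and-rerun, which wins over a plain import.
def job_class_name_sketch(type, deleted:, remove_and_rerun:)
  name = camelize_sketch(type)
  if deleted
    "Bulkrax::Delete#{name}Job"
  elsif remove_and_rerun
    "Bulkrax::DeleteAndImport#{name}Job"
  else
    "Bulkrax::Import#{name}Job"
  end
end

job_class_name_sketch('file_set', deleted: false, remove_and_rerun: true)
# => "Bulkrax::DeleteAndImportFileSetJob"
```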

#create_file_sets ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 170

def create_file_sets
  create_objects(['file_set'])
end

#create_objects(types_array = nil) ⇒ Object

Parameters:

  • types_array (Array<String>, nil) (defaults to: nil)

    the types of objects to create; when nil, collection, work, file_set and relationship are processed in that order.

# File 'app/parsers/bulkrax/application_parser.rb', line 187

def create_objects(types_array = nil)
  index = 0
  (types_array || %w[collection work file_set relationship]).each do |type|
    if type.eql?('relationship')
      ScheduleRelationshipsJob.set(wait: 5.minutes).perform_later(importer_id: importerexporter.id)
      next
    end
    send(type.pluralize).each do |current_record|
      next unless record_has_source_identifier(current_record, index)
      break if limit_reached?(limit, index)
      seen[current_record[source_identifier]] = true
      create_entry_and_job(current_record, type)
      increment_counters(index, "#{type}": true)
      index += 1
    end
    importer.record_status
  end
  true
rescue StandardError => e
  set_status_info(e)
end
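The loop's skip-and-limit behavior can be sketched in plain Ruby. enqueue_sketch and limit_reached_sketch? are hypothetical. The sketch handles a single type, collects identifiers instead of creating entries and jobs, and omits the Bulkrax.fill_in_blank_source_identifiers fallback. As in the original, a record without an identifier is skipped without advancing the index, so it does not count toward the limit. The original also logs it through #invalid_record.

```ruby
# Same rule as limit_reached?: a nil or zero limit means "no limit".
def limit_reached_sketch?(limit, index)
  return false if limit.nil? || limit.zero?

  index >= limit
end

# Hypothetical, simplified version of the create_objects loop for one type.
def enqueue_sketch(records, limit:, source_identifier: 'source_identifier')
  enqueued = []
  index = 0
  records.each do |record|
    next if record[source_identifier].to_s.empty? # real code records an invalid row
    break if limit_reached_sketch?(limit, index)

    enqueued << record[source_identifier]         # real code creates an entry and a job
    index += 1
  end
  enqueued
end

rows = [{ 'source_identifier' => 'a' }, { 'source_identifier' => '' },
        { 'source_identifier' => 'b' }, { 'source_identifier' => 'c' }]
enqueue_sketch(rows, limit: 2)   # => ["a", "b"]
enqueue_sketch(rows, limit: nil) # => ["a", "b", "c"]
```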

#create_relationships ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 174

def create_relationships
  create_objects(['relationship'])
end

#create_works ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 166

def create_works
  create_objects(['work'])
end

#entry_class ⇒ Object

This method is abstract.

Subclass and override #entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)

# File 'app/parsers/bulkrax/application_parser.rb', line 44

def entry_class
  raise NotImplementedError, 'must be defined'
end

#exporter? ⇒ TrueClass, FalseClass

Returns:

  • (TrueClass, FalseClass)

    whether importerexporter is a Bulkrax::Exporter

# File 'app/parsers/bulkrax/application_parser.rb', line 322

def exporter?
  importerexporter.is_a?(Bulkrax::Exporter)
end

#file_set_entry_class ⇒ Object

This method is abstract.

Subclass and override #file_set_entry_class to implement behavior for the parser.

Raises:

  • (NotImplementedError)

# File 'app/parsers/bulkrax/application_parser.rb', line 60

def file_set_entry_class
  raise NotImplementedError, 'must be defined'
end

#file_sets_total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 421

def file_sets_total
  0
end

#find_or_create_entry(entryclass, identifier, type, raw_metadata = nil) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 386

def find_or_create_entry(entryclass, identifier, type, raw_metadata = nil)
  # limit entry search to just this importer or exporter. Don't go moving them
  entry = importerexporter.entries.where(
    identifier: identifier
  ).first
  entry ||= entryclass.new(
    importerexporter_id: importerexporter.id,
    importerexporter_type: type,
    identifier: identifier
  )
  entry.raw_metadata = raw_metadata
  # Setting parsed_metadata specifically for the id so we can find the object via the
  # id in a delete.  This is likely to get clobbered in a regular import, which is fine.
  entry.parsed_metadata = { id: raw_metadata['id'] } if raw_metadata&.key?('id')
  entry.save!
  entry
end
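The lookup is scoped to the importer's existing entries. Calling the method again with the same identifier therefore updates that entry instead of creating a duplicate. The minimal in-memory sketch below assumes a Struct in place of the ActiveRecord entry class and an array in place of importerexporter.entries. EntrySketch and find_or_create_entry_sketch are hypothetical.

```ruby
# Hypothetical in-memory stand-in for an importer's entries; the real method
# queries importerexporter.entries and persists with save!.
EntrySketch = Struct.new(:identifier, :importerexporter_type, :raw_metadata, :parsed_metadata)

def find_or_create_entry_sketch(entries, identifier, type, raw_metadata = nil)
  entry = entries.find { |e| e.identifier == identifier }
  unless entry
    entry = EntrySketch.new(identifier, type)
    entries << entry
  end
  entry.raw_metadata = raw_metadata
  # Keep the id reachable for deletes, as the original comment explains.
  entry.parsed_metadata = { id: raw_metadata['id'] } if raw_metadata&.key?('id')
  entry
end

entries = []
first = find_or_create_entry_sketch(entries, 'abc', 'Bulkrax::Importer', { 'id' => 'w1' })
again = find_or_create_entry_sketch(entries, 'abc', 'Bulkrax::Importer', { 'title' => 'x' })
first.equal?(again) # => true
entries.size        # => 1
```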

#generated_metadata_mapping ⇒ String

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 94

def generated_metadata_mapping
  @generated_metadata_mapping ||= 'generated'
end

#get_field_mapping_hash_for(key) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns the entries of the importer’s field mapping whose options include key.

Raises:

  • (StandardError)

    if more than one field mapping declares key

# File 'app/parsers/bulkrax/application_parser.rb', line 123

def get_field_mapping_hash_for(key)
  return instance_variable_get("@#{key}_hash") if instance_variable_get("@#{key}_hash").present?

  mapping = importerexporter.field_mapping.is_a?(Hash) ? importerexporter.field_mapping : {}
  instance_variable_set(
    "@#{key}_hash",
    mapping&.with_indifferent_access&.select { |_, h| h.key?(key) }
  )
  raise StandardError, "more than one #{key} declared: #{instance_variable_get("@#{key}_hash").keys.join(', ')}" if instance_variable_get("@#{key}_hash").length > 1

  instance_variable_get("@#{key}_hash")
end
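Here is a plain-Ruby sketch of the selection step. It uses an illustrative mapping in which one field carries the source_identifier flag. field_mapping_hash_for_sketch and the sample mapping are hypothetical. The real method also accepts symbol keys through with_indifferent_access, and it caches a non-empty result in an instance variable.

```ruby
# Hypothetical version of the selection: keep mapping entries whose options
# contain key, and reject configurations that declare key more than once.
def field_mapping_hash_for_sketch(mapping, key)
  matches = (mapping.is_a?(Hash) ? mapping : {}).select { |_, opts| opts.key?(key) }
  raise StandardError, "more than one #{key} declared: #{matches.keys.join(', ')}" if matches.length > 1

  matches
end

mapping = {
  'identifier' => { 'from' => ['id'], 'source_identifier' => true },
  'title' => { 'from' => ['title'] }
}
field_mapping_hash_for_sketch(mapping, 'source_identifier')
# => { "identifier" => { "from" => ["id"], "source_identifier" => true } }
```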

#import_file_path ⇒ String

Path for the import.

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 466

def import_file_path
  @import_file_path ||= real_import_file_path
end

#importer? ⇒ TrueClass, FalseClass

Returns:

  • (TrueClass, FalseClass)

    whether importerexporter is a Bulkrax::Importer

# File 'app/parsers/bulkrax/application_parser.rb', line 317

def importer?
  importerexporter.is_a?(Bulkrax::Importer)
end

#invalid_record(message) ⇒ Object

Appends message to the current run’s invalid_records, increments its failed_records counter, and decrements enqueued_records unless that counter is already zero or below.

# File 'app/parsers/bulkrax/application_parser.rb', line 355

def invalid_record(message)
  current_run.invalid_records ||= ""
  current_run.invalid_records += message
  current_run.save
  ImporterRun.increment_counter(:failed_records, current_run.id)
  ImporterRun.decrement_counter(:enqueued_records, current_run.id) unless ImporterRun.find(current_run.id).enqueued_records <= 0 # rubocop:disable Style/IdenticalConditionalBranches
end

#limit_reached?(limit, index) ⇒ TrueClass, FalseClass

Parameters:

  • limit (Integer)

    limit set on the importerexporter

  • index (Integer)

    index of current iteration

Returns:

  • (TrueClass, FalseClass)

    true when a non-zero limit is set and index has reached it

# File 'app/parsers/bulkrax/application_parser.rb', line 329

def limit_reached?(limit, index)
  return false if limit.nil? || limit.zero? # no limit
  index >= limit
end

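#model_field_mappings ⇒ Array<String>

Returns the source field names mapped to model for this parser class in Bulkrax.field_mappings, always including 'model'.

Returns:

  • (Array<String>)
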
# File 'app/parsers/bulkrax/application_parser.rb', line 137

def model_field_mappings
  model_mappings = Bulkrax.field_mappings[self.class.to_s]&.dig('model', :from) || []
  model_mappings |= ['model']

  model_mappings
end

#new_entry(entryclass, type) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 379

def new_entry(entryclass, type)
  entryclass.new(
    importerexporter_id: importerexporter.id,
    importerexporter_type: type
  )
end

#path_for_import ⇒ String

Path where we’ll store the import metadata and files. This is used for uploaded and cloud files; the directory is created if it does not already exist.

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 300

def path_for_import
  @path_for_import = File.join(base_path, importerexporter.path_string)
  FileUtils.mkdir_p(@path_for_import) unless File.exist?(@path_for_import)
  @path_for_import
end

#perform_method ⇒ String

Returns 'perform_now' when the parser is in validate-only mode, so jobs run synchronously, and 'perform_later' otherwise.

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 145

def perform_method
  if self.validate_only
    'perform_now'
  else
    'perform_later'
  end
end

#rebuild_entries(types_array = nil) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 209

def rebuild_entries(types_array = nil)
  index = 0
  (types_array || %w[collection work file_set relationship]).each do |type|
    # works are not guaranteed to have Work in the type

    importer.entries.where(rebuild_entry_query(type, parser_fields['entry_statuses'])).find_each do |e|
      seen[e.identifier] = true
      e.status_info('Pending', importer.current_run)
      if remove_and_rerun
        delay = calculate_type_delay(type)
        "Bulkrax::DeleteAndImport#{type.camelize}Job".constantize.set(wait: delay).send(perform_method, e, current_run)
      else
        "Bulkrax::Import#{type.camelize}Job".constantize.send(perform_method, e.id, current_run.id)
      end
      increment_counters(index)
      index += 1
    end
  end
end

#rebuild_entry_query(type, statuses) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 229

def rebuild_entry_query(type, statuses)
  type_col = Bulkrax::Entry.arel_table['type']
  status_col = Bulkrax::Entry.arel_table['status_message']

  query = (type == 'work' ? type_col.does_not_match_all(%w[collection file_set]) : type_col.matches(type.camelize))
  query.and(status_col.in(statuses))
end

#record(identifier, _opts = {}) ⇒ Object

TODO:
  • review this method - is it ever used?

# File 'app/parsers/bulkrax/application_parser.rb', line 405

def record(identifier, _opts = {})
  return @record if @record

  @record = entry_class.new(self, identifier)
  @record.build
  return @record
end

#record_deleted?(record) ⇒ Boolean

Returns:

  • (Boolean)

    true if the record’s :delete value casts to true

# File 'app/parsers/bulkrax/application_parser.rb', line 247

def record_deleted?(record)
  return false unless record.key?(:delete)
  ActiveModel::Type::Boolean.new.cast(record[:delete])
end

#record_has_source_identifier(record, index) ⇒ TrueClass, FalseClass

Returns:

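  • (TrueClass, FalseClass)

    whether the record has a source identifier; when Bulkrax.fill_in_blank_source_identifiers is configured, a blank identifier is filled in and the (truthy) generated value is returned
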
# File 'app/parsers/bulkrax/application_parser.rb', line 341

def record_has_source_identifier(record, index)
  if record[source_identifier].blank?
    if Bulkrax.fill_in_blank_source_identifiers.present?
      record[source_identifier] = Bulkrax.fill_in_blank_source_identifiers.call(self, index)
    else
      invalid_record("Missing #{source_identifier} for #{record.to_h}\n")
      false
    end
  else
    true
  end
end
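#record_has_source_identifier passes the parser and the current index to Bulkrax.fill_in_blank_source_identifiers. Any object that responds to call(parser, index) and returns a string will therefore do. The minimal sketch below assumes Struct stand-ins for the parser and importer. ImporterStub, ParserStub and the identifier format are hypothetical.

```ruby
# Hypothetical stand-ins for a parser and its importer.
ImporterStub = Struct.new(:id)
ParserStub = Struct.new(:importerexporter)

# A callable with the (parser, index) arity the parser uses; the identifier
# format here is an illustrative choice, not a Bulkrax default.
fill_in = ->(parser, index) { "import-#{parser.importerexporter.id}-#{index}" }

fill_in.call(ParserStub.new(ImporterStub.new(7)), 3) # => "import-7-3"
```

In an application, such a lambda would typically be assigned to Bulkrax.fill_in_blank_source_identifiers in the Bulkrax configuration.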

#record_raw_metadata(record) ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 243

def record_raw_metadata(record)
  record.to_h
end

#record_remove_and_rerun?(record) ⇒ Boolean

Returns:

  • (Boolean)

    true if the record’s :remove_and_rerun value casts to true

# File 'app/parsers/bulkrax/application_parser.rb', line 252

def record_remove_and_rerun?(record)
  return false unless record.key?(:remove_and_rerun)
  ActiveModel::Type::Boolean.new.cast(record[:remove_and_rerun])
end
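Both #record_deleted? and #record_remove_and_rerun? delegate to ActiveModel::Type::Boolean#cast. The sketch below approximates that cast for string and integer inputs so the flag semantics are visible without Rails. Blank values give nil, and common "false" spellings give false. Anything else, including "yes", gives true. cast_boolean_sketch and record_flag_sketch? are hypothetical names, and the symbol spellings that Rails also accepts are left out.

```ruby
# Approximation of ActiveModel::Type::Boolean#cast for the :delete and
# :remove_and_rerun flags: blank -> nil, listed "false" spellings -> false,
# anything else -> true.
FALSE_VALUES_SKETCH = [false, 0, '0', 'f', 'F', 'false', 'FALSE', 'off', 'OFF'].freeze

def cast_boolean_sketch(value)
  return nil if value.nil? || value == ''

  !FALSE_VALUES_SKETCH.include?(value)
end

# Mirrors record_deleted? / record_remove_and_rerun?: a missing key is false.
def record_flag_sketch?(record, key)
  return false unless record.key?(key)

  cast_boolean_sketch(record[key])
end

record_flag_sketch?({ delete: 'TRUE' }, :delete) # => true
record_flag_sketch?({ delete: 'off' }, :delete)  # => false
record_flag_sketch?({}, :delete)                 # => false
```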

#records(_opts = {}) ⇒ Object

This method is abstract.

Subclass and override #records to implement behavior for the parser.

Raises:

  • (NotImplementedError)

# File 'app/parsers/bulkrax/application_parser.rb', line 66

def records(_opts = {})
  raise NotImplementedError, 'must be defined'
end

#related_children_parsed_mapping ⇒ String

Returns:

  • (String)

# File 'app/parsers/bulkrax/application_parser.rb', line 118

def related_children_parsed_mapping
  @related_children_parsed_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.keys&.first || 'children'
end

#related_children_raw_mapping ⇒ String, NilClass

Returns:

  • (String, NilClass)

# File 'app/parsers/bulkrax/application_parser.rb', line 112

def related_children_raw_mapping
  @related_children_raw_mapping ||= get_field_mapping_hash_for('related_children_field_mapping')&.values&.first&.[]('from')&.first
end

#related_parents_parsed_mapping ⇒ String

Returns:

  • (String)

See Also:

  • #related_parents_field_mapping

# File 'app/parsers/bulkrax/application_parser.rb', line 106

def related_parents_parsed_mapping
  @related_parents_parsed_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.keys&.first || 'parents'
end

#related_parents_raw_mapping ⇒ String, NilClass

Returns:

  • (String, NilClass)

# File 'app/parsers/bulkrax/application_parser.rb', line 100

def related_parents_raw_mapping
  @related_parents_raw_mapping ||= get_field_mapping_hash_for('related_parents_field_mapping')&.values&.first&.[]('from')&.first
end

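#required_elements ⇒ Array<String>

Returns the elements every record must supply: Bulkrax.required_elements, plus source_identifier unless Bulkrax.fill_in_blank_source_identifiers is set.

Returns:

  • (Array<String>)

Raises:

  • (StandardError)

    if the importer’s mapping lacks any of Bulkrax.required_elements
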
# File 'app/parsers/bulkrax/application_parser.rb', line 365

def required_elements
  matched_elements = ((importerexporter.mapping.keys || []) & (Bulkrax.required_elements || []))
  unless matched_elements.count == Bulkrax.required_elements.count
    missing_elements = Bulkrax.required_elements - matched_elements
    error_alert = "Missing mapping for at least one required element, missing mappings are: #{missing_elements.join(', ')}"
    raise StandardError, error_alert
  end
  if Bulkrax.fill_in_blank_source_identifiers
    Bulkrax.required_elements
  else
    Bulkrax.required_elements + [source_identifier]
  end
end
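Here is the same check as a pure function. It assumes plain arrays in place of importerexporter.mapping.keys and Bulkrax.required_elements; required_elements_sketch is hypothetical. The sketch tests for missing names directly instead of comparing counts. That gives the same answer whenever the required list has no duplicates.

```ruby
# Hypothetical pure-function version of required_elements.
def required_elements_sketch(mapping_keys, required, source_identifier:, fill_in_blanks: false)
  missing = required - (mapping_keys & required)
  unless missing.empty?
    raise StandardError,
          "Missing mapping for at least one required element, missing mappings are: #{missing.join(', ')}"
  end

  # With automatic fill-in, rows may omit the source identifier.
  fill_in_blanks ? required : required + [source_identifier]
end

required_elements_sketch(%w[title creator model], %w[title creator], source_identifier: :source_identifier)
# => ["title", "creator", :source_identifier]
```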

#retrieve_cloud_files(_files, _importer) ⇒ Object

Optional; define this if using Browse Everything for file uploads.

# File 'app/parsers/bulkrax/application_parser.rb', line 275

def retrieve_cloud_files(_files, _importer); end

#setup_export_file ⇒ Object

This method is abstract.

Subclass and override #setup_export_file to implement behavior for the parser.

Raises:

  • (NotImplementedError)

    when called on an exporter’s parser that has not overridden this method

# File 'app/parsers/bulkrax/application_parser.rb', line 307

def setup_export_file
  raise NotImplementedError, 'must be defined' if exporter?
end

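#source_identifier ⇒ Symbol

Returns the name of the identifying property in the source system from which we’re importing (i.e. not the application that mounts this Bulkrax engine). Defaults to :source_identifier when no field mapping is flagged as the source identifier.

Returns:

  • (Symbol)

    the name of the identifying property in the source system from which we’re importing
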
# File 'app/parsers/bulkrax/application_parser.rb', line 75

def source_identifier
  @source_identifier ||= get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('from')&.first&.to_sym || :source_identifier
end

#total ⇒ Object

# File 'app/parsers/bulkrax/application_parser.rb', line 413

def total
  0
end

#untar(file_to_untar) ⇒ Object

Extracts a gzipped tar archive into importer_unzip_path with the system tar command, raising if tar reports a failure.

# File 'app/parsers/bulkrax/application_parser.rb', line 442

def untar(file_to_untar)
  Dir.mkdir(importer_unzip_path) unless File.directory?(importer_unzip_path)
  command = "tar -xzf #{Shellwords.escape(file_to_untar)} -C #{Shellwords.escape(importer_unzip_path)}"
  result = system(command)
  raise "Failed to extract #{file_to_untar}" unless result
end
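The method above shells out to tar. The sketch below is an alternative that stays inside Ruby's standard library. It reads the archive with Zlib::GzipReader and Gem::Package::TarReader, and it refuses entries that would resolve outside the destination. extract_tar_gz_sketch is a hypothetical name. It ignores symlinks, hard links and file modes, so it is not a drop-in replacement.

```ruby
require 'fileutils'
require 'rubygems/package'
require 'zlib'

# Hypothetical stdlib-only alternative to shelling out to `tar -xzf`.
# Only regular files and directories are extracted; symlinks, hard links
# and file modes are ignored.
def extract_tar_gz_sketch(archive, dest)
  root = File.expand_path(dest)
  FileUtils.mkdir_p(root)
  File.open(archive, 'rb') do |io|
    Zlib::GzipReader.wrap(io) do |gz|
      Gem::Package::TarReader.new(gz) do |tar|
        tar.each do |entry|
          target = File.expand_path(entry.full_name, root)
          # Refuse names such as "../x" that would resolve outside root.
          inside = target == root || target.start_with?(root + File::SEPARATOR)
          raise "refusing to extract #{entry.full_name} outside #{root}" unless inside

          if entry.directory?
            FileUtils.mkdir_p(target)
          elsif entry.file?
            FileUtils.mkdir_p(File.dirname(target))
            File.binwrite(target, entry.read.to_s)
          end
        end
      end
    end
  end
end
```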

#unzip(file_to_unzip) ⇒ Object

Extracts a .zip archive into importer_unzip_path, skipping entries whose target file already exists; .tar.gz files are delegated to #untar.

# File 'app/parsers/bulkrax/application_parser.rb', line 430

def unzip(file_to_unzip)
  return untar(file_to_unzip) if file_to_unzip.end_with?('.tar.gz')

  Zip::File.open(file_to_unzip) do |zip_file|
    zip_file.each do |entry|
      entry_path = File.join(importer_unzip_path, entry.name)
      FileUtils.mkdir_p(File.dirname(entry_path))
      zip_file.extract(entry, entry_path) unless File.exist?(entry_path)
    end
  end
end

#valid_import? ⇒ TrueClass, FalseClass

Override to add format-specific validations. The base implementation always returns true.

Returns:

  • (TrueClass, FalseClass)

# File 'app/parsers/bulkrax/application_parser.rb', line 336

def valid_import?
  true
end

#visibility ⇒ String

The visibility of the record, read from parser_fields['visibility']. Acceptable values are: “open”, “embargo”, “lease”, “authenticated”, “restricted”. The default is “open”.

# File 'app/parsers/bulkrax/application_parser.rb', line 158

def visibility
  @visibility ||= self.parser_fields['visibility'] || 'open'
end

#work_entry_class ⇒ Object

Defaults to #entry_class; subclasses may override it to use a separate entry class for works.

# File 'app/parsers/bulkrax/application_parser.rb', line 48

def work_entry_class
  entry_class
end

#work_identifier ⇒ Symbol

Returns the name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine).

Returns:

  • (Symbol)

    the name of the identifying property for the system which we’re importing into (e.g. the application that mounts this Bulkrax engine)

# File 'app/parsers/bulkrax/application_parser.rb', line 82

def work_identifier
  @work_identifier ||= get_field_mapping_hash_for('source_identifier')&.keys&.first&.to_sym || :source
end

#work_identifier_search_field ⇒ String

Returns the Solr field for the source identifier, used for searching. Defaults to the work_identifier value + “_sim” unless the source identifier mapping sets a search_field.

Returns:

  • (String)

    the Solr field for the source identifier, used for searching

# File 'app/parsers/bulkrax/application_parser.rb', line 89

def work_identifier_search_field
  @work_identifier_search_field ||= Array.wrap(get_field_mapping_hash_for('source_identifier')&.values&.first&.[]('search_field'))&.first&.to_s || "#{work_identifier}_sim"
end
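#source_identifier, #work_identifier and #work_identifier_search_field all read the single mapping entry flagged with source_identifier. The sketch below recomputes all three from a plain hash. identifier_settings_sketch and the sample mappings are hypothetical. Unlike the real code, it takes the first flagged entry instead of raising when several are present, and it expects string keys only.

```ruby
# Hypothetical re-implementation of how the three identifier methods derive
# their values from the mapping entry flagged with source_identifier.
def identifier_settings_sketch(mapping)
  name, opts = mapping.find { |_, o| o.key?('source_identifier') }
  work_identifier = name ? name.to_sym : :source
  source_identifier = opts&.dig('from')&.first&.to_sym || :source_identifier
  search_field = Array(opts&.dig('search_field')).first&.to_s || "#{work_identifier}_sim"
  { source_identifier: source_identifier, work_identifier: work_identifier, search_field: search_field }
end

identifier_settings_sketch('identifier' => { 'from' => ['id'], 'source_identifier' => true })
# => { source_identifier: :id, work_identifier: :identifier, search_field: "identifier_sim" }
identifier_settings_sketch({})
# => { source_identifier: :source_identifier, work_identifier: :source, search_field: "source_sim" }
```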

#write ⇒ Object

Writes the export files with #write_files and then packages them with #zip.

# File 'app/parsers/bulkrax/application_parser.rb', line 425

def write
  write_files
  zip
end

#write_files ⇒ Object

This method is abstract.

Subclass and override #write_files to implement behavior for the parser.

Raises:

  • (NotImplementedError)

    when called on an exporter’s parser that has not overridden this method

# File 'app/parsers/bulkrax/application_parser.rb', line 312

def write_files
  raise NotImplementedError, 'must be defined' if exporter?
end

#write_import_file(file) ⇒ Object

Parameters:

  • file (#path, #original_filename)

    the file object with the relevant data for the import.

Returns:

  • (String)

    the path the file was moved to

# File 'app/parsers/bulkrax/application_parser.rb', line 279

def write_import_file(file)
  path = File.join(path_for_import, file.original_filename)
  FileUtils.mv(
    file.path,
    path
  )
  path
end
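Here is a minimal sketch of the move. It assumes any object that responds to #path and #original_filename; UploadSketch and write_import_file_sketch are hypothetical. It differs from the original in one defensive detail: File.basename strips any directory components from the client-supplied filename before the destination path is built.

```ruby
require 'fileutils'

# Hypothetical stand-in for an uploaded file: anything with #path and
# #original_filename, the two methods write_import_file relies on.
UploadSketch = Struct.new(:path, :original_filename)

# Moves the upload into import_dir and returns the new path.
def write_import_file_sketch(file, import_dir)
  FileUtils.mkdir_p(import_dir)
  destination = File.join(import_dir, File.basename(file.original_filename))
  FileUtils.mv(file.path, destination)
  destination
end
```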

#zip ⇒ Object

Creates one .zip archive in exporter_export_zip_path for each top-level entry in exporter_export_path, replacing any existing archive of the same name.

# File 'app/parsers/bulkrax/application_parser.rb', line 449

def zip
  FileUtils.mkdir_p(exporter_export_zip_path)

  Dir["#{exporter_export_path}/**"].each do |folder|
    zip_path = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
    FileUtils.rm_rf("#{exporter_export_zip_path}/#{zip_path}")

    Zip::File.open(File.join("#{exporter_export_zip_path}/#{zip_path}"), create: true) do |zip_file|
      Dir["#{folder}/**/**"].each do |file|
        zip_file.add(file.sub("#{folder}/", ''), file)
      end
    end
  end
end