Class: DerivativeRodeo::Generators::BaseGenerator

Inherits:
Object
Defined in:
lib/derivative_rodeo/generators/base_generator.rb

Overview

The Base Generator defines the interface and common methods.

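Fundamentally, generators are about ensuring that the files end up at the specified location, based on the given:

  • #input_uris
  • #output_location_template
  • #preprocessed_location_template

In extending a BaseGenerator you:

  • must assign an #output_extension
  • must implement a #build_step method
  • may override #with_each_requisite_location_and_tmp_file_path

#generated_files is “where the magic happens”.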
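A subclass of BaseGenerator can be sketched roughly as follows. The sketch does not load the gem: `SketchBaseGenerator` is a hypothetical stand-in that mimics only two rules visible on this page (subclasses need an output extension, and #build_step is abstract). `ThumbnailSketchGenerator`, `IncompleteSketchGenerator`, the `ArgumentError`, and the `"thumb.jpg"` extension are illustrative assumptions, not part of the library.

```ruby
# Hypothetical stand-in for BaseGenerator; NOT the gem's class. It mirrors only
# the instantiation check and the abstract #build_step shown on this page.
class SketchBaseGenerator
  class << self
    # Plain-Ruby approximation of `class_attribute :output_extension`.
    attr_accessor :output_extension
  end

  def initialize
    return if instance_of?(SketchBaseGenerator) || self.class.output_extension

    # The gem raises Errors::ExtensionMissingError; ArgumentError is a substitute here.
    raise ArgumentError, "#{self.class} must assign an output_extension"
  end

  def build_step(input_location:, output_location:, input_tmp_file_path:)
    raise NotImplementedError, "#{self.class}#build_step"
  end
end

# A subclass satisfies the contract by assigning an extension and implementing #build_step.
class ThumbnailSketchGenerator < SketchBaseGenerator
  self.output_extension = "thumb.jpg" # illustrative value

  def build_step(input_location:, output_location:, input_tmp_file_path:)
    # A real generator would create the derivative here; the sketch only returns the target.
    output_location
  end
end

# A subclass that forgets the extension fails at instantiation.
class IncompleteSketchGenerator < SketchBaseGenerator; end
```

Failing at instantiation, rather than later inside #generated_files, surfaces a misconfigured subclass before any file is touched.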


Constructor Details

#initialize(input_uris:, output_location_template:, preprocessed_location_template: nil) ⇒ BaseGenerator

Returns a new instance of BaseGenerator.

Parameters:

  • input_uris (Array<String>)
  • output_location_template (String)

    the template used to transform the given :input_uris via Services::ConvertUriViaTemplateService.

  • preprocessed_location_template (NilClass, String) (defaults to: nil)

    when nil, ignore it; otherwise attempt to find preprocessed URIs by transforming the :input_uris via Services::ConvertUriViaTemplateService with the given :preprocessed_location_template.

Raises:

  • (Errors::ExtensionMissingError)

    when the instantiation is not valid (see #valid_instantiation?).

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 69

def initialize(input_uris:, output_location_template:, preprocessed_location_template: nil)
  @input_uris = Array.wrap(input_uris)
  @output_location_template = output_location_template
  @preprocessed_location_template = preprocessed_location_template

  return if valid_instantiation?

  raise Errors::ExtensionMissingError.new(klass: self.class)
end
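Because the constructor wraps :input_uris in an array, a single URI and a list of URIs are both accepted. The sketch below illustrates that normalization in plain Ruby. It uses Kernel#Array as an approximation of ActiveSupport's Array.wrap (the two differ for some inputs, such as hashes), and `normalize_input_uris` is a hypothetical helper name.

```ruby
# Kernel#Array is used as a plain-Ruby approximation of ActiveSupport's Array.wrap.
# `normalize_input_uris` is a hypothetical helper, not part of the gem.
def normalize_input_uris(input_uris)
  Array(input_uris)
end

single = normalize_input_uris("file:///in/a.tif")                       # one-element array
many   = normalize_input_uris(["file:///in/a.tif", "file:///in/b.tif"]) # unchanged array
none   = normalize_input_uris(nil)                                      # empty array
```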

Instance Attribute Details

#input_uris ⇒ Array<String> (readonly)

The “original” files that we’ll be processing (via #generated_files)

Returns:

  • (Array<String>)


# File 'lib/derivative_rodeo/generators/base_generator.rb', line 44

def input_uris
  @input_uris
end

#output_extension ⇒ String

Returns the extension used for the generated files: a string that may contain periods (though likely not as the first character).

Returns:

  • (String)

    a string that may contain periods (though likely not as the first character).



# File 'lib/derivative_rodeo/generators/base_generator.rb', line 36

class_attribute :output_extension

#output_location_template ⇒ String (readonly)

The template that defines where we’ll be writing the #input_uris (via #generated_files)

Returns:

  • (String)

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 50

def output_location_template
  @output_location_template
end

#preprocessed_location_template ⇒ String, NilClass (readonly)

The template that defines where we might find existing processed files for the given #input_uris (via #generated_files)

Returns:

  • (String, NilClass)

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 58

def preprocessed_location_template
  @preprocessed_location_template
end

Instance Method Details

#build_step(input_location:, output_location:, input_tmp_file_path:) ⇒ StorageLocations::BaseLocation

Parameters:

  • input_location (StorageLocations::BaseLocation)

    the input source of the generation

  • output_location (StorageLocations::BaseLocation)

    the output location of the generation

  • input_tmp_file_path (String)

    the temporary path to the location of the given :input_location to enable further processing on the file.

Returns:

  • (StorageLocations::BaseLocation)

Raises:

  • (NotImplementedError)

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 102

def build_step(input_location:, output_location:, input_tmp_file_path:)
  raise NotImplementedError, "#{self.class}#build_step"
end

#derive_preprocessed_template_from(input_location:, preprocessed_location_template:) ⇒ String

Some generators (e.g. PdfSplitGenerator) need to coerce the location template based on the input location. Most often, however, the given :preprocessed_location_template is adequate and is returned unchanged.

Parameters:

  • input_location (StorageLocations::BaseLocation)
  • preprocessed_location_template (String)

Returns:

  • (String)


# File 'lib/derivative_rodeo/generators/base_generator.rb', line 292

def derive_preprocessed_template_from(input_location:, preprocessed_location_template:)
  preprocessed_location_template
end

#destination(input_location) ⇒ StorageLocations::BaseLocation

Returns the output location for the given :input_location. The file at that destination might or might not exist. When we have a #preprocessed_location_template, we also check the preprocessed location for the file and, if it exists there, copy it to the target output location.

If the file exists at neither location, the #build_step will create it.

Parameters:

  • input_location (StorageLocations::BaseLocation)

Returns:

  • (StorageLocations::BaseLocation)

See Also:

  • StorageLocations::BaseLocation#exist?


# File 'lib/derivative_rodeo/generators/base_generator.rb', line 224

def destination(input_location)
  output_location = input_location.derived_file_from(template: output_location_template, extension: output_extension)

  if output_location.exist?
    log_message = "#{self.class}#destination :: " \
                  "input_location file_uri #{input_location.file_uri} :: " \
                  "Found output_location file_uri #{output_location.file_uri}."
    logger.info(log_message)

    return output_location
  end

  unless preprocessed_location_template
    log_message = "#{self.class}#destination :: " \
                  "input_location file_uri #{input_location.file_uri} :: " \
                  "No preprocessed_location_template provided " \
                  "nor does a file exist at output_location file_uri #{output_location.file_uri}; " \
                  "moving on to generation via #{self.class}#build_step."
    logger.info(log_message)

    return output_location
  end

  template = derive_preprocessed_template_from(input_location: input_location, preprocessed_location_template: preprocessed_location_template)

  preprocessed_location = input_location.derived_file_from(template: template, extension: output_extension)
  # We only want the location if it exists
  if preprocessed_location.exist?
    log_message = "#{self.class}#destination :: " \
                  "input_location file_uri #{input_location.file_uri} :: " \
                  "Found preprocessed_location file_uri #{preprocessed_location.file_uri}."
    logger.info(log_message)

    # Let's make sure we reap the fruits of the pre-processing; and don't worry that generator
    # will also write some logs.
    output_location = CopyGenerator.new(
      input_uris: [preprocessed_location.file_uri],
      output_location_template: output_location.file_uri
    ).generated_files.first

    return output_location
  end

  log_message = "#{self.class}#destination :: " \
                "input_location file_uri #{input_location.file_uri} :: " \
                "No file exists at preprocessed_location file_uri #{preprocessed_location.file_uri} " \
                "nor output_location file_uri #{output_location.file_uri}; " \
                "moving on to generation via #{self.class}#build_step."
  logger.info(log_message)

  # NOTE: The file does not exist at the output_location; but we pass this information along so
  # that the #build_step knows where to write the file.
  output_location
end
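The lookup order in #destination can be condensed into a small in-memory sketch. Everything here is hypothetical: `resolve_destination` is not a gem method, storage is modelled as a Set of URIs at which a file exists, and the copy performed by CopyGenerator is reduced to adding the output URI to that set. The second return value only names which branch was taken.

```ruby
require "set"

# Hypothetical, in-memory sketch of the lookup order in #destination.
# `existing` stands in for storage: the set of URIs at which a file exists.
# Returns [uri, how], where `how` names the branch that was taken.
def resolve_destination(output_uri:, preprocessed_uri:, existing:)
  # 1. A file already sits at the output location.
  return [output_uri, :found_at_output] if existing.include?(output_uri)

  # 2. No preprocessed template was given, so there is nowhere else to look.
  return [output_uri, :needs_build_step] if preprocessed_uri.nil?

  # 3. A preprocessed file exists: copy it to the output location.
  if existing.include?(preprocessed_uri)
    existing << output_uri # stands in for the CopyGenerator call
    return [output_uri, :copied_from_preprocessed]
  end

  # 4. Nothing exists at either place; the caller falls through to #build_step.
  [output_uri, :needs_build_step]
end

existing = Set["file:///out/a.txt", "file:///pre/b.txt"]
resolve_destination(output_uri: "file:///out/b.txt",
                    preprocessed_uri: "file:///pre/b.txt",
                    existing: existing)
```

In every branch the same output URI comes back; what differs is whether a file exists there afterwards, which is what #generated_files checks before calling #build_step.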

#generated_files ⇒ Array<StorageLocations::BaseLocation>

Note:

This is the method where the magic happens!

Based on the #input_uris ensure that we have files at the given output location (as derived from the #output_location_template). We ensure that by:

  • Checking if a file already exists at the output location

  • Copying a preprocessed file to the output location if a preprocessed file exists

  • Generating the file based on the input location




# File 'lib/derivative_rodeo/generators/base_generator.rb', line 123

def generated_files
  # TODO: Examples please
  return @generated_files if defined?(@generated_files)

  logger.info("Starting #{self.class}#generated_files with " \
              "input_uris: #{input_uris.inspect}, " \
              "output_location_template: #{output_location_template.inspect}, and " \
              "preprocessed_location_template: #{preprocessed_location_template.inspect}.")
  # As much as I would like to use map or returned values; given the implementations it's
  # better to explicitly require that; reducing downstream implementation headaches.
  #
  # In other words, this little bit of ugly in a method that has yet to change in a subclass
  # helps ease subclass implementations of the #with_each_requisite_location_and_tmp_file_path or
  # #build_step
  @generated_files = []

  # BaseLocation is like the Ruby `File` (Pathname) "File.exist?(path) :: location.exist?"
  # "file:///Users/jfriesen/.profile"
  with_each_requisite_location_and_tmp_file_path do |input_location, input_tmp_file_path|
    output_location = destination(input_location)
    @generated_files << if output_location.exist?
                          output_location
                        else
                          log_message = "#{self.class}#generated_files :: " \
                                    "input_location file_uri #{input_location.file_uri} :: " \
                                    "Generating output_location file_uri #{output_location.file_uri} via build_step."
                          logger.info(log_message)
                          build_step(input_location: input_location, output_location: output_location, input_tmp_file_path: input_tmp_file_path)
                        end
  end
  @generated_files
end
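As a rough illustration of the flow above (memoize, loop over inputs, build only what is missing), here is a self-contained sketch. It is not the gem's code: `UpcaseSketchGenerator`, the `mem://` URIs, the `.upcased` suffix, and the Hash used as storage are all assumptions made for the example.

```ruby
# Hypothetical sketch of the flow in #generated_files; none of these names come from the gem.
# Files are modelled as entries in a Hash (uri => contents) rather than real storage.
class UpcaseSketchGenerator
  attr_reader :build_count

  def initialize(input_uris:, store:)
    @input_uris = input_uris
    @store = store
    @build_count = 0
  end

  def generated_files
    # Return the cached array on repeat calls instead of redoing the work.
    return @generated_files if defined?(@generated_files)

    @generated_files = []
    @input_uris.each do |input_uri|
      output_uri = "#{input_uri}.upcased"
      # Only generate when nothing exists at the output location yet.
      build_step(input_uri, output_uri) unless @store.key?(output_uri)
      @generated_files << output_uri
    end
    @generated_files
  end

  private

  # Stand-in for a real derivative: writes an upcased copy of the input.
  def build_step(input_uri, output_uri)
    @build_count += 1
    @store[output_uri] = @store.fetch(input_uri).upcase
  end
end

store = { "mem://a" => "alpha", "mem://b" => "beta", "mem://b.upcased" => "BETA" }
generator = UpcaseSketchGenerator.new(input_uris: ["mem://a", "mem://b"], store: store)
generator.generated_files
generator.generated_files # second call returns the memoized array; nothing is rebuilt
```

Only `mem://a` triggers the build step in this usage, because an output for `mem://b` is already present in the store.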

#generated_uris ⇒ Array<String>

Returns:

  • (Array<String>)

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 160

def generated_uris
  # TODO: what do we do about nils?
  generated_files.map { |file| file&.file_uri }
end
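The TODO about nils can be made concrete with a small sketch. `SketchLocation` is a hypothetical stand-in that only responds to #file_uri. Whether the real #generated_files ever contains nil depends on the subclass's #build_step, so treat the nil entry below as an assumption for illustration.

```ruby
# Hypothetical stand-in for a location; only #file_uri matters here.
SketchLocation = Struct.new(:file_uri)

files = [SketchLocation.new("file:///out/a.txt"), nil, SketchLocation.new("file:///out/b.txt")]

# The safe-navigation operator keeps nil entries as nil instead of raising NoMethodError.
uris = files.map { |file| file&.file_uri }

# One option for callers that only want real URIs is to drop the nils.
present_uris = uris.compact
```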

#input_files ⇒ Array<StorageLocations::BaseLocation>

Returns:

  • (Array<StorageLocations::BaseLocation>)



# File 'lib/derivative_rodeo/generators/base_generator.rb', line 201

def input_files
  @input_files ||= input_uris.map do |file_uri|
    DerivativeRodeo::StorageLocations::BaseLocation.from_uri(file_uri)
  end
end

#run(command) ⇒ String

A bit of indirection to create a common interface for running a shell command.

Parameters:

  • command (String)

Returns:

  • (String)


# File 'lib/derivative_rodeo/generators/base_generator.rb', line 302

def run(command)
  logger.debug "* Start command: #{command}"
  # TODO: What kind of error handling do we want?
  result = `#{command}`
  logger.debug "* Result: \n*  #{result.gsub("\n", "\n*  ")}"
  logger.debug "* End  command: #{command}"
  result
end
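The method above returns whatever the command printed and ignores its exit status. As one possible answer to the error-handling TODO, the sketch below uses Ruby's standard Open3 library to raise when a command exits non-zero. `run_or_raise` is a hypothetical name, and this is only an option under that assumption, not the gem's behavior.

```ruby
require "open3"

# Hypothetical variant of #run that raises when the command exits non-zero.
# Open3.capture2e returns the combined stdout/stderr and a Process::Status.
def run_or_raise(command)
  output, status = Open3.capture2e(command)
  raise "Command failed (exit #{status.exitstatus}): #{command}\n#{output}" unless status.success?

  output
end

greeting = run_or_raise("echo hello")
```

Unlike backticks, capture2e also collects stderr, so the error message carries whatever the failing command reported.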

#valid_instantiation? ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns:

  • (Boolean)


# File 'lib/derivative_rodeo/generators/base_generator.rb', line 85

def valid_instantiation?
  # TODO: Does this even make sense.
  # When we have a BaseGenerator and not one of its children or when we've assigned the
  # output_extension.  instance_of? is more specific than is_a?
  instance_of?(DerivativeRodeo::Generators::BaseGenerator) || output_extension
end

#with_each_requisite_location_and_tmp_file_path {|input_location, tmp_file_path| ... } ⇒ Object

The files that are required as part of the #generated_files (though, more precisely, of the #build_step).

This method is responsible for one thing:

  • yielding each requisite input location along with the path to its file in the tmp space

This method allows child classes to modify the file_uris, for example to filter out files that are not of the correct type, or as a means of having “this” generator depend on another generator. The HocrGenerator requires that the input_location be a monochrome image, so it converts each given input_location. The PdfSplitGenerator uses this method to take each given PDF and generate one image per page. Those images are then treated as the requisite locations.

Yield Parameters:

  • input_location (StorageLocations::BaseLocation)

    the source location, as represented by a URI.

  • tmp_file_path (String)

    where to find the input_location’s file in the processing tmp space.

# File 'lib/derivative_rodeo/generators/base_generator.rb', line 191

def with_each_requisite_location_and_tmp_file_path
  input_files.each do |input_location|
    input_location.with_existing_tmp_path do |tmp_file_path|
      yield(input_location, tmp_file_path)
    end
  end
end
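An override that filters inputs might look like the following sketch. It is self-contained and does not use the gem: `SketchInput`, `SketchRequisiteBase`, and `PdfOnlySketch` are hypothetical names, and the tmp path is faked from the URI's basename rather than pointing at a real temporary file.

```ruby
# Hypothetical stand-in for a location; the gem's BaseLocation does real file handling.
SketchInput = Struct.new(:file_uri) do
  # Yields a pretend tmp path derived from the URI's basename.
  def with_existing_tmp_path
    yield File.join("/tmp/sketch", File.basename(file_uri))
  end
end

class SketchRequisiteBase
  def initialize(input_files)
    @input_files = input_files
  end

  # Same shape as the method above: yield each location with its tmp path.
  def with_each_requisite_location_and_tmp_file_path
    @input_files.each do |input_location|
      input_location.with_existing_tmp_path do |tmp_file_path|
        yield(input_location, tmp_file_path)
      end
    end
  end
end

# An override that filters: only PDF inputs are passed along to the caller's block.
class PdfOnlySketch < SketchRequisiteBase
  def with_each_requisite_location_and_tmp_file_path
    super do |input_location, tmp_file_path|
      next unless File.extname(input_location.file_uri) == ".pdf"

      yield(input_location, tmp_file_path)
    end
  end
end

inputs = [SketchInput.new("file:///in/a.pdf"), SketchInput.new("file:///in/b.png")]
PdfOnlySketch.new(inputs).with_each_requisite_location_and_tmp_file_path do |location, path|
  puts "#{location.file_uri} is available at #{path}"
end
```

Wrapping `super` keeps the tmp-file handling in one place, so the subclass only decides which pairs reach the caller.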