Module: Recon

Included in:
DS::CLI
Defined in:
lib/ds/recon.rb,
lib/ds/recon/constants.rb,
lib/ds/recon/recon_data.rb,
lib/ds/recon/type/names.rb,
lib/ds/recon/url_lookup.rb,
lib/ds/recon/type/genres.rb,
lib/ds/recon/type/places.rb,
lib/ds/recon/type/splits.rb,
lib/ds/recon/type/titles.rb,
lib/ds/recon/recon_builder.rb,
lib/ds/recon/recon_manager.rb,
lib/ds/recon/type/subjects.rb,
lib/ds/recon/type/languages.rb,
lib/ds/recon/type/materials.rb,
lib/ds/recon/type/recon_type.rb,
lib/ds/recon/ds_csv_enumerator.rb,
lib/ds/recon/source_enumerator.rb,
lib/ds/recon/type/all_subjects.rb,
lib/ds/recon/tei_xml_enumerator.rb,
lib/ds/recon/marc_xml_enumerator.rb,
lib/ds/recon/type/named_subjects.rb,
lib/ds/recon/ds_mets_xml_enumerator.rb

Overview

The DS Recon module contains classes and methods for working with DS recon data dictionaries.

The classes in this module manage and support all the following:

  • The loading of recon data dictionary CSV files for recon lookups

  • The generation of recon CSV files from import sources

  • The addition of recon data to import CSVs

The key modules and classes in the Recon module are:

  • Recon – validation and loading of recon data dictionary CSVs; data dictionary lookups; retrieval and updates of the DS data git repository, which includes the data dictionary CSVs

  • Type – recon type configurations used for lookups, extractions, and column mappings

  • ReconManager – the main interface for the Recon module; used to build and write recon CSVs

  • ReconBuilder – used by Recon::ReconManager to build recon values hashes by extracting DS::Extractor::BaseTerm instances from source records and performing lookups

  • SourceEnumerator instances – used by Recon::ReconBuilder to iterate over source records

Examples:

require 'ds'
# write the places.csv file for a set of MARC XML files
files = Dir['source/files/*.xml']
recon_manager = Recon::ReconManager.new(
  source_type: 'marc-xml',
  out_dir: 'path/to/dir',
  files: files
)
recon_type = Recon.find_recon_type :places
recon_manager.write_csv recon_type # => 'path/to/dir/places.csv'

Defined Under Namespace

Modules: Constants, ReconData, Type
Classes: DsCsvEnumerator, DsMetsXmlEnumerator, MarcXmlEnumerator, ReconBuilder, ReconManager, SourceEnumerator, TeiXmlEnumerator, URLLookup

Constant Summary

ERROR_UNBALANCED_SUBFIELDS =
'Row has unmatched subfields'
ERROR_BLANK_SUBFIELDS =
'Row has blank subfields'
ERROR_MISSING_REQUIRED_COLUMNS =
"CSV is missing required column(s)"
ERROR_CSV_FILE_NOT_FOUND =
'Recon CSV file cannot be found'
RECON_SETS =
%i{
  genres
  languages
  materials
  named-subjects
  names
  places
  subjects
  titles
}
RECON_TYPES_MAP =
{
  :genres => Recon::Type::Genres,
  :languages => Recon::Type::Languages,
  :materials => Recon::Type::Materials,
  :'all-subjects' => Recon::Type::Subjects,
  :'named-subjects' => Recon::Type::NamedSubjects,
  :names => Recon::Type::Names,
  :places => Recon::Type::Places,
  :subjects => Recon::Type::Subjects,
  :titles => Recon::Type::Titles,
  :splits => Recon::Type::Splits
}.freeze
RECON_TYPES =
RECON_TYPES_MAP.values.freeze
RECON_VALIDATION_SETS =
RECON_TYPES.map(&:set_name).freeze

Class Method Summary

Class Method Details

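.add_alt_keys(data) ⇒ Array<String>

Adds an alternate ("cleaned") key to data for each existing key, using build_alt_key to build the alt key. An existing entry is never overwritten, and each alt key refers to the same struct as its original key. data is modified in place.

Parameters:

  • data (Hash<String, Struct>)

    the reconciliation data to which alt keys are added

Returns:

  • (Array<String>)

    the original keys of data (the return value of data.keys.each)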



# File 'lib/ds/recon.rb', line 263

def self.add_alt_keys data
  data.keys.each do |key|
    alt_key = build_alt_key key
    next if data.include? alt_key
    data[alt_key] = data[key]
  end
end
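
The aliasing pattern can be sketched in plain Ruby as follows. This is a minimal sketch, not the DS implementation: the alt-key builder is passed in as a block (an assumption for illustration), whereas Recon.add_alt_keys calls Recon.build_alt_key, and plain Hashes stand in for the term structs.

```ruby
# Sketch of the add_alt_keys pattern: alias each entry under an alternate key.
# The alt-key builder is supplied as a block here (illustrative assumption).
def add_alt_keys(data)
  data.keys.each do |key|          # iterate over a snapshot of the keys, so adding entries is safe
    alt_key = yield(key)
    next if data.include?(alt_key) # never overwrite an existing entry
    data[alt_key] = data[key]      # both keys now point at the same value object
  end
end

data = {
  'paris.' => { authorized_label: 'Paris' },
  'london' => { authorized_label: 'London' }
}
add_alt_keys(data) { |key| key.chomp('.') }
```

After the call, data also contains 'paris', which shares its value with 'paris.', while 'london' is left untouched because its alt key is identical to the original.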

.build_alt_key(key) ⇒ String

Builds an alt key from key by splitting it on ‘$$’ into an array of values, invoking DS::Util.clean_string (with terminator: '') on each value, and rejoining the cleaned values separated by ‘$$’.

Parameters:

  • key (String)

    the key to be included in the alt key

Returns:

  • (String)

    the built alt key



# File 'lib/ds/recon.rb', line 277

def self.build_alt_key key
  key.split('$$').map { |v|
    DS::Util.clean_string v, terminator: ''
  }.join '$$'
end
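
Because each ‘$$’-separated segment is cleaned on its own, trailing punctuation is removed from every segment, not only from the end of the whole key. The sketch below illustrates this with a hypothetical clean_string stand-in that strips surrounding whitespace and trailing . , ; : characters; the real DS::Util.clean_string may behave differently.

```ruby
# Hypothetical stand-in for DS::Util.clean_string (assumption, not the DS helper):
# strip surrounding whitespace and trailing . , ; : characters, then append terminator.
def clean_string(value, terminator: '')
  value.strip.sub(/[.,;:\s]+\z/, '') + terminator
end

# Same shape as Recon.build_alt_key: clean each '$$' segment, then rejoin.
def build_alt_key(key)
  key.split('$$').map { |v| clean_string(v, terminator: '') }.join('$$')
end

build_alt_key('smith, john,$$author.') # => "smith, john$$author"
```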

.build_key(values) ⇒ String

Builds a key by joining the non-blank values with ‘$$’, converting the result to lowercase, and normalizing its Unicode representation with DS::Util.unicode_normalize.

Parameters:

  • values (Array<String>)

    the values to be included in the key

Returns:

  • (String)

    the built key



# File 'lib/ds/recon.rb', line 289

def self.build_key values
  DS::Util.unicode_normalize values.select(&:present?).join('$$').downcase
end
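
The key-building steps can be sketched without ActiveSupport. This sketch assumes DS::Util.unicode_normalize performs NFC normalization and uses Ruby's built-in String#unicode_normalize (NFC by default) in its place; the blank check approximates select(&:present?).

```ruby
# Sketch of build_key: drop blank values, join with '$$', downcase, then
# Unicode-normalize. String#unicode_normalize (NFC) stands in for
# DS::Util.unicode_normalize (assumption).
def build_key(values)
  present = values.reject { |v| v.nil? || v.strip.empty? } # plain-Ruby approximation of select(&:present?)
  present.join('$$').downcase.unicode_normalize
end

# "e" followed by a combining acute accent is composed into a single "é"
build_key(["Montre\u0301al", '', nil, 'Canada']) # => "montréal$$canada"
```

Normalizing keys this way means that decomposed and precomposed spellings of the same term produce the same lookup key.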

.csv_files(set_name) ⇒ Array<String>

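Returns the paths to the CSV files configured for the given set name, each joined to the git_repo path. The set's repo_path may be a single path or an array of paths.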

Parameters:

  • set_name (String)

    the name of the set

Returns:

  • (Array<String>)

    an array of paths to the CSV files



# File 'lib/ds/recon.rb', line 152

def self.csv_files set_name
  set_config = find_set_config set_name
  repo_paths = [set_config['repo_path']].flatten # ensure repo_path is an array
  repo_paths.map { |path| File.join Recon.git_repo, path }
end
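
The array-normalization step can be shown with a plain Hash in place of the Settings entry. The config shape, repository path, and CSV paths below are hypothetical; only the [value].flatten idiom and File.join mirror the method above.

```ruby
# Sketch of csv_files with the set config and repo path passed in explicitly
# (hypothetical values; the real method reads Settings via find_set_config).
def csv_files(set_config, git_repo)
  repo_paths = [set_config['repo_path']].flatten # a single path becomes a one-element array
  repo_paths.map { |path| File.join(git_repo, path) }
end

csv_files({ 'repo_path' => 'terms/places.csv' }, '/data/ds-data')
# => ["/data/ds-data/terms/places.csv"]
```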

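.find_recon_type(set_name) ⇒ Recon::Type::ReconType

Finds the reconciliation type configuration for the given set name.

Parameters:

  • set_name (String, Symbol)

    the name of the set

Returns:

  • (Recon::Type::ReconType)

    the recon type mapped to set_name in RECON_TYPES_MAP

Raises:

  • (RuntimeError)

    if set_name is not a key of RECON_TYPES_MAP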

# File 'lib/ds/recon.rb', line 142

def self.find_recon_type set_name
  return RECON_TYPES_MAP[set_name.to_sym] if RECON_TYPES_MAP.key? set_name.to_sym

  raise "Unknown recon set_name: #{set_name.inspect}"
end
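
The lookup is a symbol-keyed map access that accepts either a String or a Symbol. The sketch below uses Hash#fetch with a block, which behaves the same as the key?-then-[] pattern above. Genres and Places are stand-in classes, not the Recon::Type classes.

```ruby
# Stand-ins for Recon::Type classes (illustrative only).
Genres = Class.new
Places = Class.new
RECON_TYPES_MAP = { genres: Genres, places: Places }.freeze

# set_name may be a String or a Symbol; the map is keyed by Symbol.
def find_recon_type(set_name)
  RECON_TYPES_MAP.fetch(set_name.to_sym) do
    raise "Unknown recon set_name: #{set_name.inspect}" # RuntimeError, as in the original
  end
end

find_recon_type('places') # => Places
```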

.find_set(set_name) ⇒ Hash<String, Struct>?

Returns the set of terms for the given set name. The set is loaded with load_set on first access and cached for later calls.

Parameters:

  • set_name (String)

    the name of the set

Returns:

  • (Hash<String, Struct>, nil)

    the set of terms, keyed by lookup key (see build_key), with term structs as values



# File 'lib/ds/recon.rb', line 115

def self.find_set set_name
  @@reconciliations ||= {}
  @@reconciliations[set_name] ||= load_set set_name
end
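
The caching in find_set is a per-set ||= memoization. In the sketch below, a module-level Hash replaces the @@reconciliations class variable, and load_set is a stub that counts how often it runs; both are assumptions made for illustration.

```ruby
# Sketch of find_set's memoization; load_set is a counting stub.
module ReconCache
  @loads = 0
  @reconciliations = {}

  class << self
    attr_reader :loads

    def find_set(set_name)
      # load only on the first request for this set_name, then reuse the result
      @reconciliations[set_name] ||= load_set(set_name)
    end

    private

    def load_set(set_name)
      @loads += 1
      { "#{set_name}-key" => { set_name: set_name } } # stub data
    end
  end
end

ReconCache.find_set('places')
ReconCache.find_set('places') # served from the cache
```

Because the cache is keyed by set_name exactly as passed, 'places' and :places would occupy separate cache entries in this sketch.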

.find_set_config(name) ⇒ Hash<String, Object>

Finds the config/settings.yml configuration for the given set name.

Parameters:

  • name (String)

    the name of the set

Returns:

  • (Hash<String, Object>)

    the configuration for the set name

Raises:

  • (DSError)

    if the set name is not found



# File 'lib/ds/recon.rb', line 132

def self.find_set_config name
  config = Settings.recon.sets.find { |s| s.name == name }
  raise DSError, "Unknown set name: #{name.inspect}" unless config
  config
end

.git_repo ⇒ String

The path to the local copy of the DS data git repository, built by joining Settings.recon.local_dir and Settings.recon.git_local_name.

Returns:

  • (String)

    the path to the DS data git repository



# File 'lib/ds/recon.rb', line 123

def self.git_repo
  File.join Settings.recon.local_dir, Settings.recon.git_local_name
end

.load_set(set_name) ⇒ Hash<String, Struct>

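Loads and returns the reconciliation data dictionary for the given set name.

The hash keys are the concatenated key values for the reconciliation type (e.g., Recon::Type::Names.get_key_values), built with build_key. After all CSV files for the set are read, alternate keys are added with add_alt_keys.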

Parameters:

  • set_name (String)

    the name of the set

Returns:

  • (Hash<String, Struct>)

    the reconciliation data



# File 'lib/ds/recon.rb', line 164

def self.load_set set_name
  set_config = find_set_config set_name
  recon_type = find_recon_type set_name
  raise "No configured set found for: '#{set_name}'" unless set_config

  data = {}
  params = {
    recon_type:    recon_type,
    data:          data
  }

  # csv_files always returns an array of paths; validate each CSV, then read it into data
  csv_files(set_name).each do |csv_file|
    params[:csv_file] = csv_file
    validate! set_name, params[:csv_file]
    read_csv **params
  end

  add_alt_keys data
  data
end
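
Because read_csv assigns data[key] for each row, a CSV read later in the loop replaces entries with the same key from an earlier CSV, and the alt-key pass runs once over the combined result. The sketch below shows that ordering under simplifying assumptions: CSV text is parsed in memory instead of read from repository paths, the key comes from a single hypothetical term column, validation is omitted, and the alt key simply drops a trailing period.

```ruby
require 'csv'

# Sketch of the load_set loop (simplified; see assumptions above).
def load_set(csv_texts)
  data = {}
  csv_texts.each do |text|
    CSV.parse(text, headers: true, header_converters: :symbol) do |row|
      data[row[:term].downcase] = row.to_h # a later CSV wins for a duplicate key
    end
  end
  # alt-key pass, as in add_alt_keys: alias "foo." under "foo" unless already present
  data.keys.each { |k| data[k.chomp('.')] ||= data[k] }
  data
end

first  = "term,authorized_label\nParis.,Paris\n"
second = "term,authorized_label\nparis.,Paris (France)\n"
load_set([first, second])['paris'][:authorized_label] # => "Paris (France)"
```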

.lookup_single(set_name, key_values:, column:) ⇒ Object?

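For the recon data dictionary with set_name, finds the value in column for the entry whose key is built from key_values with build_key. If no entry matches, the lookup is retried with the cleaned key produced by build_alt_key.

Parameters:

  • set_name (String)

    the name of the set to look up

  • key_values (Array<String>)

    the lookup key values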

  • column (Symbol)

    the column value to retrieve

Returns:

  • (Object, nil)

    the value found in the specified column, or nil if not found



# File 'lib/ds/recon.rb', line 101

def self.lookup_single set_name, key_values:, column:
  recon_set = find_set set_name
  key = build_key key_values
  return recon_set.dig key, column if recon_set.include? key

  # try a key with a "cleaned" string
  alt_key = build_alt_key key
  recon_set.dig(alt_key, column)
end
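
The two-step lookup can be sketched over a plain Hash. In this sketch the recon set is passed in directly instead of being fetched with find_set, and build_key and build_alt_key are simplified stand-ins (no Unicode normalization; the cleaner only strips trailing . , ; : and whitespace). The structured_value shown is a placeholder.

```ruby
# Simplified stand-ins for Recon.build_key and Recon.build_alt_key (assumptions).
def build_key(values)
  values.compact.reject(&:empty?).join('$$').downcase
end

def build_alt_key(key)
  key.split('$$').map { |v| v.sub(/[.,;:\s]+\z/, '') }.join('$$')
end

# Sketch of lookup_single: exact key first, then the cleaned key.
def lookup_single(recon_set, key_values:, column:)
  key = build_key(key_values)
  return recon_set.dig(key, column) if recon_set.include?(key)

  # fall back to the "cleaned" key
  recon_set.dig(build_alt_key(key), column)
end

places = { 'paris' => { authorized_label: 'Paris', structured_value: 'placeholder-id' } }
lookup_single(places, key_values: ['Paris.'], column: :authorized_label) # => "Paris"
lookup_single(places, key_values: ['Lyon'], column: :authorized_label)   # => nil
```

Together with add_alt_keys, which adds cleaned versions of the dictionary's keys at load time, this fallback cleans the query side, so a punctuation difference on either side can still produce a match.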

.read_csv(csv_file:, recon_type:, data:) ⇒ Hash<String, Struct>

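Reads one CSV file, adds its reconciliation data to the data hash, and returns the updated hash. Rows whose lookup values are blank are skipped, and a row whose key is already present in data replaces the earlier entry.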

Parameters:

  • csv_file (String)

    the path to the CSV file

  • recon_type (Recon::Type::ReconType)

    the reconciliation type

  • data (Hash<String, Struct>)

    the reconciliation data

Returns:

  • (Hash<String, Struct>)

    the updated reconciliation data



# File 'lib/ds/recon.rb', line 193

def self.read_csv csv_file:, recon_type:, data:
  CSV.foreach csv_file, headers: true do |row|
    row = row.to_h.symbolize_keys
    next if recon_type.lookup_values(row).blank?
    struct    = OpenStruct.new row.to_h
    key       = build_key recon_type.get_key_values row
    data[key] = struct
  end
  data
end
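
The per-row logic of read_csv can be sketched with in-memory CSV text. The lookup_columns and key_columns arguments are hypothetical stand-ins for recon_type.lookup_values and recon_type.get_key_values; the sketch assumes a row counts as blank when every lookup column is empty. The column names are illustrative, and plain Hashes replace OpenStruct.

```ruby
require 'csv'

# Sketch of read_csv (see assumptions above).
def read_csv(csv_text, lookup_columns:, key_columns:, data: {})
  CSV.parse(csv_text, headers: true) do |row|
    row = row.to_h.transform_keys(&:to_sym)
    # skip rows that have no reconciled (lookup) values yet
    next if lookup_columns.all? { |c| row[c].to_s.strip.empty? }

    key = key_columns.map { |c| row[c] }.compact.reject(&:empty?).join('$$').downcase
    data[key] = row
  end
  data
end

csv = "place_as_recorded,authorized_label,structured_value\nParis,Paris,id1\nLyon,,\n"
read_csv(csv, lookup_columns: %i[authorized_label structured_value],
              key_columns: %i[place_as_recorded]).keys # => ["paris"]
```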

.validate(set_name, csv_file) ⇒ Array<String>

Validate an input or output recon CSV file for the given set name.

Validates each row of the CSV file for required column headers and values and for balanced columns. Only sets listed in RECON_VALIDATION_SETS are validated. Required headers and balanced columns are defined by the Recon::Type::ReconType class, in Recon::Type::ReconType.recon_csv_headers and Recon::Type::ReconType.balanced_columns.

Parameters:

  • set_name (String)

    the name of the set

  • csv_file (String)

    the path to the CSV file

Returns:

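  • (Array<String>, String, nil)

    an array of row errors; a single error message if csv_file does not exist; nil if set_name is not in RECON_VALIDATION_SETS

Raises:

  • (DSError)

    if a row is missing required columns (raised by validate_row)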



# File 'lib/ds/recon.rb', line 215

def self.validate set_name, csv_file
  return unless RECON_VALIDATION_SETS.include? set_name
  return "#{ERROR_CSV_FILE_NOT_FOUND}: '#{csv_file}'" unless File.exist? csv_file

  recon_type = Recon.find_recon_type set_name
  row_num    = 0
  CSV.readlines(csv_file, headers: true).map(&:to_h).filter_map { |row|
    row.symbolize_keys!
    error = validate_row recon_type, row, row_num+=1
    error if error.present?
  }
end
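
The three possible outcomes of validate (nil, a single message, or an array of row errors) can be seen in the sketch below. VALIDATION_SETS is a hypothetical subset standing in for RECON_VALIDATION_SETS, and the row check (a required authorized_label value) is a simplified stand-in for validate_row.

```ruby
require 'csv'
require 'tempfile'

VALIDATION_SETS = %w[places genres].freeze # hypothetical subset

# Sketch of validate's control flow (simplified row check; see above).
def validate(set_name, csv_file)
  return unless VALIDATION_SETS.include?(set_name) # not a validation set: nil
  return "Recon CSV file cannot be found: '#{csv_file}'" unless File.exist?(csv_file)

  row_num = 0
  CSV.readlines(csv_file, headers: true).map(&:to_h).filter_map do |row|
    row_num += 1
    "Row #{row_num}: blank authorized_label" if row['authorized_label'].to_s.strip.empty?
  end
end

file = Tempfile.new(['places', '.csv'])
file.write("place_as_recorded,authorized_label\nParis,Paris\nLyon,\n")
file.flush
validate('places', file.path) # => ["Row 2: blank authorized_label"]
```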

.validate!(set_name, csv_file) ⇒ void

This method returns an undefined value.

Invokes validate for the input or output recon CSV file and raises a DSError if any error is found.

Parameters:

  • set_name (String)

    the name of the set

  • csv_file (String)

    the path to the CSV file

Raises:

  • (DSError)

    if there is an error



# File 'lib/ds/recon.rb', line 235

def self.validate! set_name, csv_file
  error = validate set_name, csv_file
  return unless error.present?

  raise DSError, "Error validating #{set_name} recon CSV #{csv_file}:\n#{error}"
end

.validate_row(recon_type, row, row_num) ⇒ Array<String>

Validates one row of a CSV file for required column headers and values and for balanced columns. Missing required columns raise a DSError; balanced-column errors are returned.

Parameters:

  • recon_type (Recon::Type::ReconType)

    the reconciliation type

  • row (Hash)

    the row of data

  • row_num (Integer)

    the row number used in error messages

Returns:

  • (Array<String>)

    an array of errors



# File 'lib/ds/recon.rb', line 249

def self.validate_row recon_type, row, row_num
  errors = DS::Util::CsvValidator.validate_required_columns(row, required_columns: recon_type.recon_csv_headers, row_num: row_num)
  raise DSError.new errors.join("\n") unless errors.blank?
  DS::Util::CsvValidator.validate_balanced_columns(
    row, balanced_columns: recon_type.balanced_columns, row_num: row_num
  )
end
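
The two checks that validate_row delegates to DS::Util::CsvValidator are sketched below. This is an illustration, not the CsvValidator implementation. It assumes that multi-valued cells separate subfields with '|', checks only that required columns are present (not their values), and uses its own message formats. DSError is a local stand-in class.

```ruby
class DSError < StandardError; end # local stand-in for the DS error class

# Sketch of validate_row: raise on missing required columns, return
# errors for columns whose subfield counts differ.
def validate_row(row, required_columns:, balanced_columns:, row_num:)
  missing = required_columns.reject { |c| row.key?(c) }
  unless missing.empty?
    raise DSError, "Row #{row_num}: CSV is missing required column(s): #{missing.join(', ')}"
  end

  # count '|'-separated subfields per column; -1 keeps trailing empty subfields
  counts = balanced_columns.map { |c| row[c].to_s.split('|', -1).size }
  counts.uniq.size > 1 ? ["Row #{row_num}: Row has unmatched subfields"] : []
end

cols = %i[authorized_label structured_value]
validate_row({ authorized_label: 'Paris|Lyon', structured_value: 'id1' },
             required_columns: cols, balanced_columns: cols, row_num: 2)
# => ["Row 2: Row has unmatched subfields"]
```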