Module: Recon
- Included in: DS::CLI
- Defined in: lib/ds/recon.rb,
lib/ds/recon/constants.rb,
lib/ds/recon/recon_data.rb,
lib/ds/recon/type/names.rb,
lib/ds/recon/url_lookup.rb,
lib/ds/recon/type/genres.rb,
lib/ds/recon/type/places.rb,
lib/ds/recon/type/splits.rb,
lib/ds/recon/type/titles.rb,
lib/ds/recon/recon_builder.rb,
lib/ds/recon/recon_manager.rb,
lib/ds/recon/type/subjects.rb,
lib/ds/recon/type/languages.rb,
lib/ds/recon/type/materials.rb,
lib/ds/recon/type/recon_type.rb,
lib/ds/recon/ds_csv_enumerator.rb,
lib/ds/recon/source_enumerator.rb,
lib/ds/recon/type/all_subjects.rb,
lib/ds/recon/tei_xml_enumerator.rb,
lib/ds/recon/marc_xml_enumerator.rb,
lib/ds/recon/type/named_subjects.rb,
lib/ds/recon/ds_mets_xml_enumerator.rb
Overview
The DS Recon module contains classes and methods for working with DS recon data dictionaries.
The classes in this module manage and support the following:
- Loading recon data dictionary CSV files for recon lookups
- Generating recon CSV files from import sources
- Adding recon data to import CSVs
The key modules and classes in the Recon module are:
- Recon – validation and loading of recon data dictionary CSVs; data dictionary lookups; retrieval and updates of the DS data git repository, which includes the data dictionary CSVs
- Type – recon type configurations used for lookups, extractions, and column mappings
- ReconManager – the main interface for the Recon module; used to build and write recon CSVs
- ReconBuilder – used by Recon::ReconManager to build recon value hashes by extracting DS::Extractor::BaseTerm instances from source records and performing lookups
- SourceEnumerator instances – used by Recon::ReconBuilder to iterate over source records
Defined Under Namespace
Modules: Constants, ReconData, Type
Classes: DsCsvEnumerator, DsMetsXmlEnumerator, MarcXmlEnumerator, ReconBuilder, ReconManager, SourceEnumerator, TeiXmlEnumerator, URLLookup
Constant Summary
- ERROR_UNBALANCED_SUBFIELDS =
'Row has unmatched subfields'
- ERROR_BLANK_SUBFIELDS =
'Row has blank subfields'
- ERROR_MISSING_REQUIRED_COLUMNS =
"CSV is missing required column(s)"
- ERROR_CSV_FILE_NOT_FOUND =
'Recon CSV file cannot be found'
- RECON_SETS =
%i{ genres languages materials named-subjects names places subjects titles }
- RECON_TYPES_MAP =
{
  :genres => Recon::Type::Genres,
  :languages => Recon::Type::Languages,
  :materials => Recon::Type::Materials,
  :'all-subjects' => Recon::Type::Subjects,
  :'named-subjects' => Recon::Type::NamedSubjects,
  :names => Recon::Type::Names,
  :places => Recon::Type::Places,
  :subjects => Recon::Type::Subjects,
  :titles => Recon::Type::Titles,
  :splits => Recon::Type::Splits
}.freeze
- RECON_TYPES =
RECON_TYPES_MAP.values.freeze
- RECON_VALIDATION_SETS =
RECON_TYPES.map(&:set_name).freeze
Class Method Summary
- .add_alt_keys(data) ⇒ void
Adds an alt key, built with .build_alt_key, for each key in data unless that alt key is already present; each alt key points to the same value as its source key. Modifies data in place.
- .build_alt_key(key) ⇒ String
Builds an alt key from key by splitting it on ‘$$’, invoking DS::Util.clean_string on each value, and rejoining the cleaned values with ‘$$’.
- .build_key(values) ⇒ String
Builds a key by joining the non-blank values with ‘$$’, converting the result to lowercase, and normalizing its Unicode representation.
- .csv_files(set_name) ⇒ Array<String>
Returns an array of paths to the CSV files for the given set name.
- .find_recon_type(set_name) ⇒ Recon::Type::ReconType?
Finds the reconciliation type configuration for the given set name.
- .find_set(set_name) ⇒ Hash<String, Struct>?
Returns the cached reconciliation data dictionary for the given set name, loading it on first access.
- .find_set_config(name) ⇒ Hash<String, Object>
Finds the config/settings.yml configuration for the given set name.
- .git_repo ⇒ String
Returns the path to the local clone of the DS data git repository.
- .load_set(set_name) ⇒ Hash<String, Struct>
Loads and returns the reconciliation data dictionary for the given set name.
- .lookup_single(set_name, key_values:, column:) ⇒ Object?
For the recon data dictionary named set_name, finds the value in column for the key built from key_values (see .build_key), falling back to the alt key.
- .read_csv(csv_file:, recon_type:, data:) ⇒ Hash<String, Struct>
Reads one CSV file, adds its reconciliation data to the data hash, and returns the updated hash.
- .validate(set_name, csv_file) ⇒ Array<String>
Validates an input or output recon CSV file for the given set name.
- .validate!(set_name, csv_file) ⇒ void
Invokes Recon.validate for the input or output recon CSV file and raises an exception if there is an error.
- .validate_row(recon_type, row, row_num) ⇒ Array<String>
Validates one row of a CSV file for required column headers and values and for balanced columns.
Class Method Details
.add_alt_keys(data) ⇒ void
This method returns an undefined value.
Adds an alt key, built with .build_alt_key, for each key in data unless that alt key is already present; each alt key points to the same value as its source key. Modifies data in place.
# File 'lib/ds/recon.rb', line 263
def self.add_alt_keys data
  data.keys.each do |key|
    alt_key = build_alt_key key
    next if data.include? alt_key
    data[alt_key] = data[key]
  end
end
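The following is a minimal, self-contained sketch of the alt-key pass, showing how an entry stored under a punctuated key also becomes reachable under its cleaned form. `clean_string` is a hypothetical stand-in for DS::Util.clean_string (it only trims whitespace and trailing punctuation), so real cleaned keys may differ.

```ruby
# Sketch of the alt-key pass in Recon.add_alt_keys.
# ASSUMPTION: clean_string is a hypothetical stand-in for
# DS::Util.clean_string; it only trims whitespace and trailing punctuation.
def clean_string(value)
  value.strip.sub(/[[:punct:]]+\z/, '')
end

def build_alt_key(key)
  key.split('$$').map { |v| clean_string(v) }.join('$$')
end

def add_alt_keys(data)
  # data.keys is a snapshot, so adding entries inside the loop is safe.
  data.keys.each do |key|
    alt_key = build_alt_key(key)
    next if data.include?(alt_key) # never overwrite an existing entry
    data[alt_key] = data[key]      # the alt key points at the same row object
  end
  data
end

data = { 'smith, john.$$author' => { structured_value: 'Q123' } }
add_alt_keys(data)
data.keys # => ["smith, john.$$author", "smith, john$$author"]
```

Because the loop skips alt keys that already exist, a row that a CSV file stores explicitly under the cleaned key always takes precedence over a derived alt key.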
.build_alt_key(key) ⇒ String
Builds an alt key from key, splitting it into an array of values, invoking DS::Util::clean_string on each value and rejoining the cleaned values separated by ‘$$’.
# File 'lib/ds/recon.rb', line 277
def self.build_alt_key key
  key.split('$$').map { |v|
    DS::Util.clean_string v, terminator: ''
  }.join '$$'
end
.build_key(values) ⇒ String
Builds a key by joining the non-blank values with ‘$$’, converting the result to lowercase, and normalizing its Unicode representation with DS::Util.unicode_normalize.
# File 'lib/ds/recon.rb', line 289
def self.build_key values
  DS::Util.unicode_normalize values.select(&:present?).join('$$').downcase
end
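A minimal sketch of the key-building rule in plain Ruby, assuming that ActiveSupport's `present?` can be approximated as "not nil and not blank" and that DS::Util.unicode_normalize behaves like `String#unicode_normalize` (NFC). It shows that keys are case-insensitive and insensitive to composed versus decomposed accents.

```ruby
# Sketch of Recon.build_key without ActiveSupport.
# ASSUMPTIONS: `present?` is approximated as "not nil and not blank", and
# DS::Util.unicode_normalize is approximated by String#unicode_normalize (NFC).
def build_key(values)
  kept = values.reject { |v| v.nil? || v.to_s.strip.empty? } # drop blank values
  kept.join('$$').downcase.unicode_normalize
end

build_key(['Paris', nil, 'France'])                   # => "paris$$france"
build_key(["Caf\u00e9"]) == build_key(["Cafe\u0301"]) # => true (precomposed vs. combining accent)
```

Because blank values are dropped before joining, `['Paris', '', 'France']` and `['Paris', 'France']` produce the same key; a value's position in key_values is not preserved when an earlier value is blank.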
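.csv_files(set_name) ⇒ Array<String>
Returns an array of paths to the CSV files for the given set name. The set's repo_path setting may be a single path or an array of paths; each path is joined to Recon.git_repo.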
# File 'lib/ds/recon.rb', line 152
def self.csv_files set_name
  set_config = find_set_config set_name
  repo_paths = [set_config['repo_path']].flatten # ensure repo_path is an array
  repo_paths.map { |path| File.join Recon.git_repo, path }
end
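The `[x].flatten` idiom is what lets repo_path be either one path or a list. In this sketch a plain Hash replaces the Settings-backed set config, the function takes the config directly instead of a set name, and `GIT_REPO` and the repo paths are hypothetical values.

```ruby
# Sketch of how Recon.csv_files expands repo_path into full paths.
# ASSUMPTIONS: a plain Hash replaces the Settings-backed set config, this
# version takes the config directly instead of a set name, and GIT_REPO and
# the repo paths below are hypothetical.
GIT_REPO = '/tmp/ds-data'

def csv_files(set_config)
  # [x].flatten wraps a single path in an array and leaves an array unchanged.
  repo_paths = [set_config['repo_path']].flatten
  repo_paths.map { |path| File.join(GIT_REPO, path) }
end

csv_files({ 'repo_path' => 'terms/names.csv' })
# => ["/tmp/ds-data/terms/names.csv"]
csv_files({ 'repo_path' => ['terms/a.csv', 'terms/b.csv'] })
# => ["/tmp/ds-data/terms/a.csv", "/tmp/ds-data/terms/b.csv"]
```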
.find_recon_type(set_name) ⇒ Recon::Type::ReconType?
Finds the reconciliation type configuration for the given set name.
# File 'lib/ds/recon.rb', line 142
def self.find_recon_type set_name
  return RECON_TYPES_MAP[set_name.to_sym] if RECON_TYPES_MAP.key? set_name.to_sym

  raise "Unknown recon set_name: #{set_name.inspect}"
end
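A small sketch of why `to_sym` matters here: a String or a Symbol set name resolves to the same map entry, including hyphenated names. Plain symbols stand in for the Recon::Type classes; `TYPES_MAP` is a hypothetical, reduced version of RECON_TYPES_MAP.

```ruby
# Sketch of the String-or-Symbol lookup in Recon.find_recon_type.
# ASSUMPTION: TYPES_MAP is a reduced, hypothetical map in which plain
# symbols stand in for the Recon::Type classes.
TYPES_MAP = {
  :names => :NamesType,
  :'named-subjects' => :NamedSubjectsType
}.freeze

def find_recon_type(set_name)
  key = set_name.to_sym # 'names' and :names resolve to the same entry
  return TYPES_MAP[key] if TYPES_MAP.key?(key)

  raise "Unknown recon set_name: #{set_name.inspect}"
end

find_recon_type('named-subjects')                       # => :NamedSubjectsType
find_recon_type(:names)                                 # => :NamesType
%i{ named-subjects }.first == 'named-subjects'.to_sym   # => true
```

The last line shows why hyphenated entries written with `%i{ }`, as in RECON_SETS, match hyphenated String set names.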
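.find_set(set_name) ⇒ Hash<String, Struct>?
Returns the set of terms (the reconciliation data dictionary) for the given set name. The dictionary is loaded with .load_set on first access and cached for subsequent calls.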
# File 'lib/ds/recon.rb', line 115
def self.find_set set_name
  @@reconciliations ||= {}
  @@reconciliations[set_name] ||= load_set set_name
end
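The `||=` memoization above means each set is loaded at most once per process. The sketch below makes that visible by counting loads; `ReconCacheSketch` and its stub `load_set` are hypothetical and do not read any CSV files.

```ruby
# Sketch of the lazy, per-process cache behind Recon.find_set.
# ASSUMPTION: ReconCacheSketch and its load_set are hypothetical; load_set
# counts calls and returns a stub dictionary instead of reading CSV files.
module ReconCacheSketch
  @@reconciliations = nil
  @@loads = 0

  def self.load_set(set_name)
    @@loads += 1
    { "stub key for #{set_name}" => { structured_value: 'stub' } }
  end

  def self.find_set(set_name)
    @@reconciliations ||= {}
    # ||= calls load_set only when the set is not cached yet.
    @@reconciliations[set_name] ||= load_set(set_name)
  end

  def self.loads
    @@loads
  end
end

ReconCacheSketch.find_set('names')
ReconCacheSketch.find_set('names')
ReconCacheSketch.loads # => 1
```

Because the cache is keyed by the set_name argument exactly as given and lives for the rest of the process, `find_set('names')` and `find_set(:names)` are cached separately, and a CSV file updated on disk is not reread by a later `find_set` call.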
.find_set_config(name) ⇒ Hash<String, Object>
Finds the config/settings.yml configuration for the given set name.
# File 'lib/ds/recon.rb', line 132
def self.find_set_config name
  config = Settings.recon.sets.find { |s| s.name == name }
  raise DSError, "Unknown set name: #{name.inspect}" unless config
  config
end
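A minimal sketch of the config lookup, assuming each `recon.sets` entry in config/settings.yml exposes `name` and `repo_path`. `SetConfig`, `SETS`, and the paths are hypothetical stand-ins for the Settings objects, and DSError is defined locally so the example runs on its own.

```ruby
# Sketch of the lookup in Recon.find_set_config.
# ASSUMPTIONS: SetConfig and SETS are hypothetical stand-ins for the
# recon.sets entries in config/settings.yml, and DSError is defined locally.
class DSError < StandardError; end

SetConfig = Struct.new(:name, :repo_path)
SETS = [
  SetConfig.new('names', 'terms/names.csv'),
  SetConfig.new('places', 'terms/places.csv')
].freeze

def find_set_config(name)
  config = SETS.find { |s| s.name == name }
  raise DSError, "Unknown set name: #{name.inspect}" unless config

  config
end

find_set_config('places').repo_path # => "terms/places.csv"
```

In this sketch the match is a String comparison, so `find_set_config(:places)` raises DSError, whereas find_recon_type converts its argument with `to_sym`.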
.git_repo ⇒ String
Returns the path to the local clone of the DS data git repository.
# File 'lib/ds/recon.rb', line 123
def self.git_repo
  File.join Settings.recon.local_dir, Settings.recon.git_local_name
end
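.load_set(set_name) ⇒ Hash<String, Struct>
Loads and returns the reconciliation data dictionary for the given set name. Each CSV file for the set is validated with .validate! and read with .read_csv; alt keys are then added with .add_alt_keys.
The hash keys are the concatenated key values for the reconciliation type (e.g., Recon::Type::Names.get_key_values).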
# File 'lib/ds/recon.rb', line 164
def self.load_set set_name
  set_config = find_set_config set_name
  recon_type = find_recon_type set_name
  raise "No configured set found for: '#{set_name}'" unless set_config

  data = {}
  params = {
    recon_type: recon_type,
    data: data
  }
  # Path may be a single value or an array. Make sure it's an array.
  csv_files(set_name).each do |csv_file|
    params[:csv_file] = csv_file
    validate! set_name, params[:csv_file]

    read_csv **params
  end

  add_alt_keys data
  data
end
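Because every CSV file is read into the same hash, the order of the set's files matters. This sketch shows the merge order under stated assumptions: each "file" is already parsed into key-to-row pairs (the real method validates and reads each CSV), and `alt_key` is a hypothetical simplification of build_alt_key that only strips trailing periods.

```ruby
# Sketch of the merge order in Recon.load_set.
# ASSUMPTIONS: each "file" is already parsed into key => row pairs, and
# alt_key is a hypothetical simplification of build_alt_key.
def alt_key(key)
  key.split('$$').map { |v| v.sub(/\.+\z/, '') }.join('$$')
end

def load_set(parsed_files)
  data = {}
  # Files are read in order into one Hash, so a later file's row replaces
  # an earlier row with the same key.
  parsed_files.each { |rows| data.merge!(rows) }
  # Alt keys are added last and never replace an existing key.
  data.keys.each do |key|
    alt = alt_key(key)
    data[alt] = data[key] unless data.include?(alt)
  end
  data
end

first  = { 'paris.' => 'row from file 1' }
second = { 'paris.' => 'row from file 2' }
set = load_set([first, second])
set['paris.'] # => "row from file 2"
set['paris']  # => "row from file 2"
```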
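.lookup_single(set_name, key_values:, column:) ⇒ Object?
For the recon data dictionary named set_name, finds the value in column for the key built from key_values with .build_key. If that key is absent, retries with the alt key from .build_alt_key, and returns nil if neither key matches.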
# File 'lib/ds/recon.rb', line 101
def self.lookup_single set_name, key_values:, column:
  recon_set = find_set set_name
  key = build_key key_values
  return recon_set.dig key, column if recon_set.include? key

  # try a key with a "cleaned" string
  alt_key = build_alt_key key
  recon_set.dig(alt_key, column)
end
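The two-step lookup lets an incoming value with trailing punctuation still match a cleaned dictionary key. In this sketch the dictionary is passed in as a Hash of Hashes (the real set holds OpenStructs and is fetched by set name), and `build_key` and `build_alt_key` are simplified, hypothetical stand-ins.

```ruby
# Sketch of the two-step lookup in Recon.lookup_single.
# ASSUMPTIONS: recon_set is an in-memory Hash of Hashes, and build_key and
# build_alt_key are simplified stand-ins for the real helpers.
def build_key(values)
  values.compact.reject(&:empty?).join('$$').downcase
end

def build_alt_key(key)
  key.split('$$').map { |v| v.strip.sub(/[.,;:]+\z/, '') }.join('$$')
end

def lookup_single(recon_set, key_values:, column:)
  key = build_key(key_values)
  # Exact key first...
  return recon_set.dig(key, column) if recon_set.include?(key)

  # ...then the "cleaned" alt key; nil when neither matches.
  recon_set.dig(build_alt_key(key), column)
end

recon_set = { 'smith, john' => { structured_value: 'Q123' } }
lookup_single(recon_set, key_values: ['Smith, John.'], column: :structured_value) # => "Q123"
lookup_single(recon_set, key_values: ['Doe, Jane'], column: :structured_value)    # => nil
```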
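.read_csv(csv_file:, recon_type:, data:) ⇒ Hash<String, Struct>
Reads one CSV file, adds its reconciliation data to the data hash, and returns the updated hash. Rows with blank lookup values are skipped; every other row is stored as an OpenStruct under the key built from the recon type's key values.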
# File 'lib/ds/recon.rb', line 193
def self.read_csv csv_file:, recon_type:, data:
  CSV.foreach csv_file, headers: true do |row|
    row = row.to_h.symbolize_keys
    next if recon_type.lookup_values(row).blank?
    struct = OpenStruct.new row.to_h
    key = build_key recon_type.get_key_values row
    data[key] = struct
  end
  data
end
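A self-contained sketch of the row-to-dictionary step, assuming a recon type exposes `get_key_values(row)` and `lookup_values(row)` as the method above uses them. `NamesTypeSketch` and its column names are hypothetical, a frozen Hash replaces OpenStruct, and `CSV.parse` on a String replaces `CSV.foreach` on a file.

```ruby
require 'csv'

# Sketch of the row-to-dictionary step in Recon.read_csv.
# ASSUMPTIONS: NamesTypeSketch and its column names are hypothetical; a
# frozen Hash replaces OpenStruct; CSV.parse on a String replaces
# CSV.foreach on a file; build_key is a simplified stand-in.
module NamesTypeSketch
  # Columns that form the dictionary key.
  def self.get_key_values(row)
    [row[:name_as_recorded]]
  end

  # Non-blank values used for lookups; a row with none is skipped.
  def self.lookup_values(row)
    [row[:authorized_label], row[:structured_value]].reject { |v| v.nil? || v.strip.empty? }
  end
end

def build_key(values)
  values.reject { |v| v.nil? || v.strip.empty? }.join('$$').downcase.unicode_normalize
end

def read_csv(csv_text:, recon_type:, data:)
  CSV.parse(csv_text, headers: true) do |row|
    row = row.to_h.transform_keys(&:to_sym) # plain-Ruby equivalent of symbolize_keys
    next if recon_type.lookup_values(row).empty?

    data[build_key(recon_type.get_key_values(row))] = row.freeze
  end
  data
end

text = <<~DATA
  name_as_recorded,authorized_label,structured_value
  "Smith, John",John Smith,Q123
  Unknown scribe,,
DATA
dict = read_csv(csv_text: text, recon_type: NamesTypeSketch, data: {})
dict.keys                              # => ["smith, john"]
dict['smith, john'][:structured_value] # => "Q123"
```

The second data row has no authorized label or structured value, so it is skipped rather than stored with empty values.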
.validate(set_name, csv_file) ⇒ Array<String>
Validates an input or output recon CSV file for the given set name.
Validates each row of the CSV file for required column headers and values and for balanced columns. The required headers and balanced columns are defined by the Recon::Type::ReconType class, in Recon::Type::ReconType.recon_csv_headers and Recon::Type::ReconType.balanced_columns. Returns nil if set_name is not in RECON_VALIDATION_SETS, and a single error String (not an Array) if csv_file does not exist.
# File 'lib/ds/recon.rb', line 215
def self.validate set_name, csv_file
  return unless RECON_VALIDATION_SETS.include? set_name
  return "#{ERROR_CSV_FILE_NOT_FOUND}: '#{csv_file}'" unless File.exist? csv_file

  recon_type = Recon.find_recon_type set_name
  row_num = 0
  CSV.readlines(csv_file, headers: true).map(&:to_h).filter_map { |row|
    row.symbolize_keys!
    error = validate_row recon_type, row, row_num+=1
    error if error.present?
  }
end
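The method above can return three different shapes. This sketch reproduces that control flow under stated assumptions: `VALIDATED_SETS` and `check_row` are hypothetical, and `check_row` only flags rows with a blank structured_value, standing in for validate_row.

```ruby
require 'csv'
require 'tempfile'

# Sketch of the three possible results of Recon.validate.
# ASSUMPTIONS: VALIDATED_SETS and check_row are hypothetical; check_row
# only flags rows whose structured_value is blank.
VALIDATED_SETS = %w[names places].freeze
ERROR_CSV_FILE_NOT_FOUND = 'Recon CSV file cannot be found'

def check_row(row, row_num)
  value = row[:structured_value]
  "Row #{row_num}: blank structured_value" if value.nil? || value.strip.empty?
end

def validate(set_name, csv_file)
  return unless VALIDATED_SETS.include?(set_name) # nil: set is not validated
  return "#{ERROR_CSV_FILE_NOT_FOUND}: '#{csv_file}'" unless File.exist?(csv_file) # a String

  # An Array with one error per failing row (row numbers start at 1).
  CSV.read(csv_file, headers: true).each_with_index.filter_map do |row, i|
    check_row(row.to_h.transform_keys(&:to_sym), i + 1)
  end
end

Tempfile.create(['names', '.csv']) do |f|
  f.write("name_as_recorded,structured_value\nSmith,Q1\nDoe,\n")
  f.flush
  validate('names', f.path) # => ["Row 2: blank structured_value"]
end
validate('titles', 'titles.csv')       # => nil
validate('names', '/no/such/file.csv') # => "Recon CSV file cannot be found: '/no/such/file.csv'"
```

Because the result may be nil, a String, or an Array, validate! tests it with `present?`. As written, interpolating an Array into the DSError message renders it in `Array#inspect` form rather than one error per line.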
.validate!(set_name, csv_file) ⇒ void
This method returns an undefined value.
Invokes .validate for the input or output recon CSV file and raises a DSError if there is an error.
# File 'lib/ds/recon.rb', line 235
def self.validate! set_name, csv_file
  error = validate set_name, csv_file
  return unless error.present?

  raise DSError, "Error validating #{set_name} recon CSV #{csv_file}:\n#{error}"
end
.validate_row(recon_type, row, row_num) ⇒ Array<String>
Validates one row of a CSV file for required column headers and values and for balanced columns. Missing required columns raise a DSError; balanced-column errors are returned.
# File 'lib/ds/recon.rb', line 249
def self.validate_row recon_type, row, row_num
  errors = DS::Util::CsvValidator.validate_required_columns(row, required_columns: recon_type.recon_csv_headers, row_num: row_num)
  raise DSError.new errors.join("\n") unless errors.blank?
  DS::Util::CsvValidator.validate_balanced_columns(
    row, balanced_columns: recon_type.balanced_columns, row_num: row_num
  )
end
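A sketch of the two kinds of row checks, assuming that "balanced" means every balanced column holds the same number of subfields. The column lists are hypothetical, subfields are taken to be separated by `|`, and the real checks in DS::Util::CsvValidator may differ in detail (for example, in how blank values are treated).

```ruby
# Sketch of per-row checks like those in Recon.validate_row.
# ASSUMPTIONS: the column lists are hypothetical, subfields are taken to be
# separated by '|', and DS::Util::CsvValidator may differ in detail.
class DSError < StandardError; end
ERROR_MISSING_REQUIRED_COLUMNS = 'CSV is missing required column(s)'
ERROR_UNBALANCED_SUBFIELDS = 'Row has unmatched subfields'

def validate_row(row, row_num:, required_columns:, balanced_columns:)
  missing = required_columns - row.keys
  # As in the original, a missing required column raises instead of returning an error.
  raise DSError, "#{ERROR_MISSING_REQUIRED_COLUMNS}: #{missing.join(', ')} (row #{row_num})" if missing.any?

  # Count '|'-separated subfields per column; -1 keeps trailing empty subfields.
  counts = balanced_columns.map { |col| row[col].to_s.split('|', -1).size }
  counts.uniq.size <= 1 ? [] : ["#{ERROR_UNBALANCED_SUBFIELDS} (row #{row_num})"]
end

cols = %i[authorized_label structured_value]
row = { authorized_label: 'Smith, John|Doe, Jane', structured_value: 'Q123' }
validate_row(row, row_num: 2, required_columns: cols, balanced_columns: cols)
# => ["Row has unmatched subfields (row 2)"]
```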