Module: DS::Util

Extended by:
Strings
Included in:
Recon::SourceEnumerator, Recon::Type::AllSubjects, Recon::Type::Genres, Recon::Type::Languages, Recon::Type::Materials, Recon::Type::NamedSubjects, Recon::Type::Names, Recon::Type::Places, Recon::Type::Splits, Recon::Type::Subjects, Recon::Type::Titles
Defined in:
lib/ds/util.rb,
lib/ds/util/cache.rb,
lib/ds/util/strings.rb,
lib/ds/util/csv_writer.rb,
lib/ds/util/csv_validator.rb

Defined Under Namespace

Modules: Strings

Classes: CSVWriter, Cache, CsvValidator

Constant Summary

Constants included from Strings

Strings::ABBREV_REGEX, Strings::ELLIPSIS_REGEX, Strings::FINAL_QUESTION_REGEX, Strings::TERMINAL_PUNCT_REGEX

Instance Method Summary

Methods included from Strings

clean_string, clean_white_space, convert_mets_superscript, escape_pipes, fix_double_periods, is_url?, normalize_string, remove_brackets, terminate, unicode_normalize

Instance Method Details

#process_xml(files, remove_namespaces: false) {|xml| ... } ⇒ Object

Opens and parses each XML file in files, optionally stripping namespaces from the parsed XML, and yields each parsed document to the block:

data = []
process_xml files, remove_namespaces: true do |xml|
  data << xml.xpath('//some/path/text()').text
end

Parameters:

  • files (Enumerable<String>)

    XML files to process

  • remove_namespaces (Boolean) (defaults to: false)

    whether to strip namespaces from the parsed XML

Yields:

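  • (xml)

    yields each parsed Nokogiri XML document; the method as defined passes only the document to the block, so the array of data used to populate the CSV is captured from the enclosing scope, and you must know the format of each item in the data array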

Yield Parameters:

  • xml (Nokogiri::XML::Document)

    the parsed document



# File 'lib/ds/util.rb', line 28

def process_xml files, remove_namespaces: false, &block
  files.each do |in_xml|
    # may be reading file list from STDIN; remove any trailing \r or \n
    xml = File.open(in_xml.chomp) { |f| Nokogiri::XML f }
    xml.remove_namespaces! if remove_namespaces
    yield xml
  end
end
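
The chomp call in the method body matters when the file list is piped in, because each line read from STDIN still carries its line terminator. The following is a minimal sketch of that pattern, not the library's own code: each_xml, collect_titles, and demo_titles are hypothetical helpers, a StringIO plays the role of STDIN, and REXML (bundled with Ruby) stands in for Nokogiri so the snippet runs without extra gems. REXML has no counterpart to remove_namespaces!, so that option is omitted here.

```ruby
require 'rexml/document'
require 'stringio'
require 'tmpdir'

# Hypothetical helper mirroring the shape of process_xml; REXML stands in
# for Nokogiri here, so there is no remove_namespaces option.
def each_xml(files)
  files.each do |in_xml|
    # chomp strips a trailing "\n", "\r\n", or "\r" left by line-based input
    xml = File.open(in_xml.chomp) { |f| REXML::Document.new(f) }
    yield xml
  end
end

# Collects the text of every <title> element from a listing of file paths.
# The titles array is captured by the block, as in the documented example.
def collect_titles(listing)
  titles = []
  each_xml(listing.each_line) do |xml|
    REXML::XPath.each(xml, '//title') { |node| titles << node.text }
  end
  titles
end

# Writes two sample files and returns their titles, reading the paths from a
# StringIO that plays the role of STDIN (note the mixed line endings).
def demo_titles
  Dir.mktmpdir do |dir|
    paths = %w[a b].map do |name|
      path = File.join(dir, "#{name}.xml")
      File.write(path, "<record><title>Title #{name.upcase}</title></record>")
      path
    end
    collect_titles(StringIO.new("#{paths[0]}\r\n#{paths[1]}\n"))
  end
end
```

Calling demo_titles returns the two titles in listing order, even though one path ends in "\r\n" and the other in "\n". Without chomp, File.open would be handed a path that includes the terminator and would fail to find the file.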