Module: DS::Util
- Extended by:
- Strings
- Included in:
- Recon::SourceEnumerator, Recon::Type::AllSubjects, Recon::Type::Genres, Recon::Type::Languages, Recon::Type::Materials, Recon::Type::NamedSubjects, Recon::Type::Names, Recon::Type::Places, Recon::Type::Splits, Recon::Type::Subjects, Recon::Type::Titles
- Defined in:
- lib/ds/util.rb,
lib/ds/util/cache.rb,
lib/ds/util/strings.rb,
lib/ds/util/csv_writer.rb,
lib/ds/util/csv_validator.rb
Defined Under Namespace
Modules: Strings
Classes: CSVWriter, Cache, CsvValidator
Constant Summary
Constants included from Strings
Strings::ABBREV_REGEX, Strings::ELLIPSIS_REGEX, Strings::FINAL_QUESTION_REGEX, Strings::TERMINAL_PUNCT_REGEX
Instance Method Summary
#process_xml(files, remove_namespaces: false) {|xml, data| ... } ⇒ Object
Open and parse each XML file in files, optionally stripping namespaces from the parsed XML, and run the block on each parsed XML document.
Methods included from Strings
clean_string, clean_white_space, convert_mets_superscript, escape_pipes, fix_double_periods, is_url?, normalize_string, remove_brackets, terminate, unicode_normalize
Instance Method Details
#process_xml(files, remove_namespaces: false) {|xml, data| ... } ⇒ Object
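Open and parse each XML file in files, optionally stripping namespaces
from the parsed XML, and yield each parsed document (a Nokogiri::XML::Document) to the block. Only the document is yielded; the block does not receive a data argument. Each entry in files is chomped before it is opened, so a list of paths read line by line (for example from STDIN) can be passed directly. Example: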
data = []
process_xml files, remove_namespaces: true do |xml|
  data << xml.xpath('//some/path/text()').text
end
# File 'lib/ds/util.rb', line 28

def process_xml files, remove_namespaces: false, &block
  files.each do |in_xml|
    # may be reading file list from STDIN; remove any trailing \r or \n
    xml = File.open(in_xml.chomp) { |f| Nokogiri::XML f }
    xml.remove_namespaces! if remove_namespaces
    yield xml
  end
end