Class: Modsulator
- Inherits:
- Object
- Defined in:
- lib/modsulator.rb
Overview
The main class for the MODSulator API, which lets you work with metadata spreadsheets and MODS XML.
Constant Summary
- NAMESPACE = "http://library.stanford.edu/xmlDocs"
  We define our own namespace for <xmlDocs>.
Instance Attribute Summary
- #file ⇒ Object (readonly)
  Returns the value of attribute file.
- #rows ⇒ Object (readonly)
  Returns the value of attribute rows.
- #template_xml ⇒ Object (readonly)
  Returns the value of attribute template_xml.
Instance Method Summary
- #convert_rows ⇒ String
  Generates an XML document with one <mods> entry per input row.
- #generate_normalized_mods(output_directory) ⇒ Void
  Generates normalized (Stanford) MODS XML, writing output to files.
- #generate_xml(metadata_row) ⇒ String
  Generates an XML string for a given row in a spreadsheet.
- #initialize(file, filename, options = {}) ⇒ Modsulator (constructor)
  Both a file and a filename are required because, in the API that uses this class, the two exist separately.
- #row_to_xml(row) ⇒ Object
  Converts a single data row into a normalized MODS XML document.
- #validate_headers(spreadsheet_headers) ⇒ Array<String>
  Checks that all the headers in the spreadsheet have a corresponding entry in the XML template.
Constructor Details
#initialize(file, filename, options = {}) ⇒ Modsulator
Both a file and a filename are required because, in the API that uses this class, the two exist separately. If neither :template_file nor :template_string is specified, the gem’s built-in XML template is used. If both are given, :template_string takes precedence.
# File 'lib/modsulator.rb', line 31
def initialize file, filename, options = {}
  @file = file
  @filename = filename
  @rows = ModsulatorSheet.new(@file, @filename).rows

  if options[:template_string]
    @template_xml = options[:template_string]
  elsif options[:template_file]
    @template_xml = File.read(options[:template_file])
  else
    @template_xml = File.read(File.expand_path("../modsulator/modsulator_template.xml", __FILE__))
  end
end
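The template lookup order in the constructor can be tried out in isolation. The sketch below is not part of the gem: `resolve_template` and `DEFAULT_TEMPLATE` are hypothetical names, and the constant stands in for the bundled modsulator_template.xml file.

```ruby
require 'tempfile'

# Hypothetical stand-in for the gem's bundled template file.
DEFAULT_TEMPLATE = '<mods>built-in</mods>'

# Mirrors the constructor's precedence: :template_string first,
# then :template_file, then the built-in default.
def resolve_template(options = {})
  if options[:template_string]
    options[:template_string]
  elsif options[:template_file]
    File.read(options[:template_file])
  else
    DEFAULT_TEMPLATE
  end
end

Tempfile.create(['template', '.xml']) do |fh|
  fh.write('<mods>from file</mods>')
  fh.flush
  # Both options given: the string wins and the file is never read.
  puts resolve_template(template_string: '<mods>inline</mods>', template_file: fh.path)
  # Only the file given: its contents are used.
  puts resolve_template(template_file: fh.path)
end
# No options: fall back to the default.
puts resolve_template
```

Because :template_string is checked first, a caller that already holds the template in memory does not need a file on disk.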
Instance Attribute Details
#file ⇒ Object (readonly)
Returns the value of attribute file.
# File 'lib/modsulator.rb', line 20
def file
  @file
end
#rows ⇒ Object (readonly)
Returns the value of attribute rows.
# File 'lib/modsulator.rb', line 20
def rows
  @rows
end
#template_xml ⇒ Object (readonly)
Returns the value of attribute template_xml.
# File 'lib/modsulator.rb', line 20
def template_xml
  @template_xml
end
Instance Method Details
#convert_rows ⇒ String
Generates an XML document with one <mods> entry per input row. Example output:
<xmlDocs datetime="2015-03-23 09:22:11AM" sourceFile="FitchMLK-v1.xlsx">
<xmlDoc id="descMetadata" objectId="druid:aa111aa1111">
<mods ... >
:
</mods>
</xmlDoc>
<xmlDoc id="descMetadata" objectId="druid:aa222aa2222">
<mods ... >
:
</mods>
</xmlDoc>
</xmlDocs>
# File 'lib/modsulator.rb', line 62
def convert_rows()
  time_stamp = Time.now.strftime("%Y-%m-%d %I:%M:%S%p")
  header = "<xmlDocs xmlns=\"#{NAMESPACE}\" datetime=\"#{time_stamp}\" sourceFile=\"#{@filename}\">"
  full_doc = Nokogiri::XML(header)
  root = full_doc.root

  @rows.each do |row|
    mods_xml_doc = row_to_xml(row)
    sub_doc = full_doc.create_element('xmlDoc', :id => 'descMetadata', :objectId => "#{row['druid']}")
    sub_doc.add_child(mods_xml_doc.root)
    root.add_child(sub_doc)
  end

  full_doc.to_s
end
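The datetime attribute in the example output uses a 12-hour clock with an AM/PM suffix. As a minimal sketch of how that opening tag is assembled, the following uses only the standard library. `xml_docs_header` is a hypothetical helper, and the time is passed in as an argument so the result is reproducible; the method itself uses `Time.now`.

```ruby
# Same value as the class constant; redefined here so the sketch stands alone.
NAMESPACE = 'http://library.stanford.edu/xmlDocs'

# Builds the opening <xmlDocs> tag for a given source file name and time.
def xml_docs_header(filename, time)
  time_stamp = time.strftime('%Y-%m-%d %I:%M:%S%p')
  "<xmlDocs xmlns=\"#{NAMESPACE}\" datetime=\"#{time_stamp}\" sourceFile=\"#{filename}\">"
end

puts xml_docs_header('FitchMLK-v1.xlsx', Time.new(2015, 3, 23, 9, 22, 11))
# prints: <xmlDocs xmlns="http://library.stanford.edu/xmlDocs" datetime="2015-03-23 09:22:11AM" sourceFile="FitchMLK-v1.xlsx">
```

In the format string, `%I` is the zero-padded 12-hour hour and `%p` is the AM/PM marker, so 21:22:11 would be written as 09:22:11PM.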
#generate_normalized_mods(output_directory) ⇒ Void
Generates normalized (Stanford) MODS XML, writing output to files.
# File 'lib/modsulator.rb', line 115
def generate_normalized_mods(output_directory)
  # Write one XML file per data row in the input spreadsheet
  rows.each do |row|
    sourceid = row['sourceId']
    output_filename = output_directory + "/" + sourceid + ".xml"

    mods_doc = row_to_xml(row)
    File.open(output_filename, 'w') { |fh| fh.puts(mods_doc.root.to_s) }
  end
end
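The one-file-per-row behaviour can be illustrated without the gem. In this sketch, `write_rows` and `stub_row_to_xml` are hypothetical names. The stub replaces the real `row_to_xml`, which needs Nokogiri and the Normalizer, and `File.join` is used in place of string concatenation for the output path.

```ruby
require 'tmpdir'

# Hypothetical stand-in for row_to_xml; returns a trivial XML string.
def stub_row_to_xml(row)
  "<mods><title>#{row['title']}</title></mods>"
end

# Writes one <sourceId>.xml file per row and returns the paths written.
def write_rows(rows, output_directory)
  rows.map do |row|
    output_filename = File.join(output_directory, "#{row['sourceId']}.xml")
    File.open(output_filename, 'w') { |fh| fh.puts(stub_row_to_xml(row)) }
    output_filename
  end
end

Dir.mktmpdir do |dir|
  rows = [
    { 'sourceId' => 'fitch-001', 'title' => 'First' },
    { 'sourceId' => 'fitch-002', 'title' => 'Second' }
  ]
  written = write_rows(rows, dir)
  puts written.map { |path| File.basename(path) }
end
```

Each file name comes from the row's sourceId column, so a row without that value has no usable output name.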
#generate_xml(metadata_row) ⇒ String
Generates an XML string for a given row in a spreadsheet.
# File 'lib/modsulator.rb', line 84
def generate_xml(metadata_row)
  manifest_row = metadata_row

  # XML escape all of the entries in the manifest row so they won't break the XML
  manifest_row.each {|k,v| manifest_row[k]=Nokogiri::XML::Text.new(v.to_s,Nokogiri::XML('')).to_s if v }

  # Enable access with symbol or string keys
  manifest_row = manifest_row.with_indifferent_access

  # Run the XML template through ERB. This creates a new ERB object from the template XML,
  # NOT creating a separate thread, and omitting newlines for lines ending with '%>'
  template = ERB.new(template_xml, nil, '>')

  # ERB.result() actually computes the template. This just passes the top level binding.
  # (The result variable's name is missing from this listing; generated_xml is a stand-in.)
  generated_xml = template.result(binding)

  # The manifest_row is a hash, with column names as the key.
  # In the template, as a convenience we allow users to put specific column placeholders inside
  # double brackets: "blah [[column_name]] blah".
  # Here we replace those placeholders with the corresponding value
  # from the manifest row.
  manifest_row.each { |k,v| generated_xml.gsub! "[[#{k}]]", v.to_s.strip }
  generated_xml
end
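The two-stage substitution (ERB first, then [[column_name]] placeholders) can be sketched with the standard library alone. This is an approximation under stated assumptions, not the gem's code: `fill_template` is a hypothetical name, `CGI.escapeHTML` stands in for the Nokogiri text-node escaping, only string keys are supported because `with_indifferent_access` comes from ActiveSupport, and the template and column names are invented for illustration.

```ruby
require 'erb'
require 'cgi'

# Escapes the row's values, runs the template through ERB, then replaces
# [[column_name]] placeholders with the stripped, escaped values.
def fill_template(template_xml, row)
  # Stand-in for the Nokogiri-based escaping; nil values are left untouched.
  manifest_row = row.transform_values { |v| v.nil? ? v : CGI.escapeHTML(v.to_s) }

  # The '>' trim mode omits the newline after lines ending in '%>'.
  # manifest_row is visible to the template through the binding.
  xml = ERB.new(template_xml, trim_mode: '>').result(binding)

  manifest_row.each { |k, v| xml = xml.gsub("[[#{k}]]", v.to_s.strip) }
  xml
end

# Hypothetical template: the note element is emitted only when the column has a value.
TEMPLATE = "<mods><title>[[ti1:title]]</title>" \
           "<% if manifest_row['no1:note'] %><note>[[no1:note]]</note><% end %></mods>"

puts fill_template(TEMPLATE, { 'ti1:title' => ' Fitch & King ', 'no1:note' => nil })
# prints: <mods><title>Fitch &amp; King</title></mods>
```

Escaping before substitution matters: a raw ampersand or angle bracket in a spreadsheet cell would otherwise produce XML that fails to parse in the next step.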
#row_to_xml(row) ⇒ Object
Converts a single data row into a normalized MODS XML document.
# File 'lib/modsulator.rb', line 143
def row_to_xml(row)
  # Generate an XML string, then remove any text carried over from the template
  mods_xml = generate_xml(row)
  mods_xml.gsub!(/\[\[[^\]]+\]\]/, "")

  # Remove empty tags from when e.g. <[[sn1:p2:type]]> does not get filled in
  # when [[sn1:p2:type]] has no value in the source spreadsheet
  mods_xml.gsub!(/<\s[^>]+><\/>/, "")

  mods_xml_doc = Nokogiri::XML(mods_xml)
  normalizer = Normalizer.new
  normalizer.normalize_document(mods_xml_doc.root)

  return mods_xml_doc
end
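The two cleanup substitutions are plain regular expressions and can be tried on their own. In the sketch below, `strip_unfilled` is a hypothetical name and the sample fragment is invented; it assumes a template element whose tag name is a placeholder and which also carries an attribute.

```ruby
# Applies the same two patterns as row_to_xml, without the Nokogiri parsing
# and normalization that follow in the real method.
def strip_unfilled(mods_xml)
  # 1. Drop any [[placeholder]] that was never filled in.
  cleaned = mods_xml.gsub(/\[\[[^\]]+\]\]/, '')
  # 2. Drop the nameless element left behind, e.g. `< authority="lcsh"></>`.
  cleaned.gsub(/<\s[^>]+><\/>/, '')
end

SAMPLE = '<mods><title>Letters</title>' \
         '<[[sn1:p2:type]] authority="lcsh"></[[sn1:p2:type]]></mods>'

puts strip_unfilled(SAMPLE)
# prints: <mods><title>Letters</title></mods>
```

The second pattern requires whitespace right after the opening angle bracket, so it only removes elements whose name was a placeholder and leaves ordinary empty elements in place.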
#validate_headers(spreadsheet_headers) ⇒ Array<String>
Checks that all the headers in the spreadsheet have a corresponding entry in the XML template. Returns the headers that do not; nil headers and sourceId are ignored.
# File 'lib/modsulator.rb', line 132
def validate_headers(spreadsheet_headers)
  spreadsheet_headers.reject do |header|
    header.nil? || header == "sourceId" || template_xml.include?(header)
  end
end
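As a small illustration of what the returned array contains, the sketch below applies the same filter to a hand-written template string. `unknown_headers` is a hypothetical name, and the template and header names are invented; the real method reads the template from the template_xml attribute.

```ruby
# Returns the headers that appear nowhere in the template text.
# nil headers and "sourceId" are never reported.
def unknown_headers(spreadsheet_headers, template_xml)
  spreadsheet_headers.reject do |header|
    header.nil? || header == 'sourceId' || template_xml.include?(header)
  end
end

TEMPLATE_XML = '<mods><identifier>[[druid]]</identifier><title>[[ti1:title]]</title></mods>'
HEADERS = ['druid', 'sourceId', nil, 'ti1:title', 'ti1:subtitle']

p unknown_headers(HEADERS, TEMPLATE_XML)
# prints: ["ti1:subtitle"]
```

The check is a substring match against the whole template, so a header such as "title" would pass if that text occurs anywhere in the template, including inside a tag name.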