Class: Stanford::ContentInventory
- Inherits: Object
- Defined in: lib/stanford/content_inventory.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
Stanford-specific utility methods for transforming contentMetadata to versionInventory and doing comparisons.
Data Model
- DorMetadata = utility methods for interfacing with Stanford metadata files (esp. contentMetadata)
- ContentInventory [1..1] = utilities for transforming contentMetadata to versionInventory and doing comparisons
- ActiveFedoraObject [1..*] = utility for extracting content or other information from a Fedora instance
Instance Method Summary
- #generate_content_metadata(file_group, object_id, version_id) ⇒ String
  The contentMetadata instance generated from the FileGroup.
- #generate_instance(node) ⇒ FileInstance
  The Moab::FileInstance object generated from the XML data.
- #generate_signature(node) ⇒ FileSignature
  The Moab::FileSignature object generated from the XML data.
- #group_from_cm(content_metadata, subset) ⇒ FileGroup
  The Moab::FileGroup object generated from a contentMetadata instance.
- #inventory_from_cm(content_metadata, object_id, subset, version_id = nil) ⇒ FileInventory
  The versionInventory equivalent of the contentMetadata. If the supplied content_metadata is blank or empty, a skeletal FileInventory is returned.
- #remediate_checksum_nodes(file_node, signature) ⇒ void
  Updates the file’s checksum elements if data is missing; raises an exception if the existing values are inconsistent.
- #remediate_content_metadata(content_metadata, content_group) ⇒ String
  Returns a remediated copy of the contentMetadata with fixity data filled in.
- #remediate_file_size(file_node, signature) ⇒ void
  Updates the file size attribute if it is missing; raises an exception if the existing value is inconsistent.
- #validate_content_metadata(content_metadata) ⇒ Boolean
  True if the contentMetadata has the essential file attributes; otherwise raises an exception.
- #validate_content_metadata_details(content_metadata) ⇒ Array<String>
  List of problems found.
Instance Method Details
#generate_content_metadata(file_group, object_id, version_id) ⇒ String
Returns the contentMetadata instance generated from the FileGroup.

    # File 'lib/stanford/content_inventory.rb', line 96
    def generate_content_metadata(file_group, object_id, version_id)
      cm = Nokogiri::XML::Builder.new do |xml|
        xml.contentMetadata(:type=>"sample", :objectId=>object_id) {
          xml.resource(:type=>"version", :sequence=>"1", :id=>"version-#{version_id.to_s}") {
            file_group.files.each do |file_manifestation|
              signature = file_manifestation.signature
              file_manifestation.instances.each do |instance|
                xml.file(
                    :id=>instance.path,
                    :size=>signature.size,
                    :datetime=>instance.datetime,
                    :shelve=>'yes',
                    :publish=>'yes',
                    :preserve=>'yes') {
                  fixity = signature.fixity
                  xml.checksum(:type=>"MD5") {xml.text signature.md5 } if fixity[:md5]
                  xml.checksum(:type=>"SHA-1") {xml.text signature.sha1} if fixity[:sha1]
                  xml.checksum(:type=>"SHA-256") {xml.text signature.sha256} if fixity[:sha256]
                }
              end
            end
          }
        }
      end
      cm.to_xml
    end
#generate_instance(node) ⇒ FileInstance
Returns the Moab::FileInstance object generated from the XML data.

    # File 'lib/stanford/content_inventory.rb', line 85
    def generate_instance(node)
      instance = FileInstance.new()
      instance.path = node.attributes['id'].content
      instance.datetime = node.attributes['datetime'].content rescue nil
      instance
    end
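The listing treats the two attributes differently: `datetime` is read with a `rescue nil` modifier and is therefore optional, while `id` is read without one. The following is a minimal sketch of that difference, not the library's code. `Attr`, `read_datetime`, and `read_datetime_safe` are hypothetical names, and a plain hash stands in for the node's attributes.

```ruby
# Hypothetical stand-in for an XML attribute object that responds to #content.
Attr = Struct.new(:content)

# Same shape as the listing: a missing attribute yields nil, calling
# #content on nil raises NoMethodError, and `rescue nil` swallows it.
def read_datetime(attributes)
  attributes['datetime'].content rescue nil
end

# Alternative with the safe-navigation operator: returns nil for a missing
# attribute without hiding unrelated errors.
def read_datetime_safe(attributes)
  attributes['datetime']&.content
end

with_time    = { 'id' => Attr.new('a.txt'), 'datetime' => Attr.new('2012-01-01T00:00:00Z') }
without_time = { 'id' => Attr.new('a.txt') }

read_datetime(with_time)         # the datetime string
read_datetime(without_time)      # nil
read_datetime_safe(without_time) # nil
```

Because `id` has no such guard in the listing, a node without an `id` attribute would raise rather than produce an instance with a nil path.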
#generate_signature(node) ⇒ FileSignature
Returns the Moab::FileSignature object generated from the XML data.

    # File 'lib/stanford/content_inventory.rb', line 65
    def generate_signature(node)
      signature = FileSignature.new()
      signature.size = node.attributes['size'].content
      checksum_nodes = node.xpath('checksum')
      checksum_nodes.each do |checksum_node|
        case checksum_node.attributes['type'].content.upcase
          when 'MD5'
            signature.md5 = checksum_node.text
          when 'SHA1', 'SHA-1'
            signature.sha1 = checksum_node.text
          when 'SHA256', 'SHA-256'
            signature.sha256 = checksum_node.text
        end
      end
      signature
    end
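The `case` statement accepts checksum type names in any letter case, with or without a hyphen, and silently ignores types it does not recognize. A minimal sketch of that normalization in isolation follows. `normalize_checksum_type` is a hypothetical helper, not part of Moab, and an array of hashes stands in for the `<checksum>` child elements.

```ruby
# Hypothetical helper (not part of Moab): maps a contentMetadata checksum
# type name to the symbol of the matching signature field.
def normalize_checksum_type(name)
  case name.to_s.upcase
  when 'MD5'               then :md5
  when 'SHA1', 'SHA-1'     then :sha1
  when 'SHA256', 'SHA-256' then :sha256
  end # unrecognized names match no branch, so the result is nil
end

# Stand-in for the <checksum> children of one <file> element.
checksums = [
  { type: 'md5',   value: 'aaa' },
  { type: 'SHA-1', value: 'bbb' },
  { type: 'crc32', value: 'ccc' }
]

# Collect only the recognized checksum types, as generate_signature does.
signature = {}
checksums.each do |checksum|
  key = normalize_checksum_type(checksum[:type])
  signature[key] = checksum[:value] if key
end
```

Here `signature` ends up with entries for `:md5` and `:sha1` only; the `crc32` entry is dropped without an error.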
#group_from_cm(content_metadata, subset) ⇒ FileGroup
Returns the Moab::FileGroup object generated from a contentMetadata instance.

    # File 'lib/stanford/content_inventory.rb', line 38
    def group_from_cm(content_metadata, subset)
      ng_doc = Nokogiri::XML(content_metadata)
      validate_content_metadata(ng_doc)
      nodeset = case subset.to_s.downcase
        when 'preserve'
          ng_doc.xpath("//file[@preserve='yes']")
        when 'publish'
          ng_doc.xpath("//file[@publish='yes']")
        when 'shelve'
          ng_doc.xpath("//file[@shelve='yes']")
        when 'all'
          ng_doc.xpath("//file")
        else
          raise "Unknown disposition subset (#{subset})"
      end
      content_group = FileGroup.new(:group_id=>'content', :data_source => "contentMetadata-#{subset}")
      nodeset.each do |file_node|
        signature = generate_signature(file_node)
        instance = generate_instance(file_node)
        content_group.add_file_instance(signature, instance)
      end
      content_group
    end
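The `subset` argument selects files by disposition attribute and is matched case-insensitively; anything other than `preserve`, `publish`, `shelve`, or `all` raises. The sketch below reproduces only that selection rule on plain hashes, as an illustration rather than the library's implementation. `FILES` and `select_subset` are hypothetical names.

```ruby
# Stand-in for <file> elements: each hash holds the attributes of one node.
FILES = [
  { 'id' => 'page-1.tif', 'preserve' => 'yes', 'publish' => 'no',  'shelve' => 'no'  },
  { 'id' => 'page-1.jpg', 'preserve' => 'no',  'publish' => 'yes', 'shelve' => 'yes' },
  { 'id' => 'notes.txt',  'preserve' => 'yes', 'publish' => 'yes', 'shelve' => 'no'  }
].freeze

# Hypothetical helper mirroring the subset switch in group_from_cm.
def select_subset(files, subset)
  key = subset.to_s.downcase
  case key
  when 'preserve', 'publish', 'shelve'
    files.select { |file| file[key] == 'yes' }
  when 'all'
    files
  else
    raise "Unknown disposition subset (#{subset})"
  end
end

preserved = select_subset(FILES, :preserve).map { |file| file['id'] }
shelved   = select_subset(FILES, 'Shelve').map { |file| file['id'] }
```

Symbols and mixed-case strings both work because the value is converted with `to_s.downcase` before matching, as in the listing.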
#inventory_from_cm(content_metadata, object_id, subset, version_id = nil) ⇒ FileInventory
Returns the versionInventory equivalent of the contentMetadata. If the supplied content_metadata is blank or empty, a skeletal FileInventory is returned.

    # File 'lib/stanford/content_inventory.rb', line 22
    def inventory_from_cm(content_metadata, object_id, subset, version_id=nil)
      # The contentMetadata datastream is not required for ingest, since some object types, such as collection or APO do not require one.
      # Many of these objects have contentMetadata with no child elements, such as this:
      #    <contentMetadata objectId="bd608mj3166" type="file"/>
      # but there are also objects that have no datastream of this name at all
      cm_inventory = FileInventory.new(:type=>"version",:digital_object_id=>object_id, :version_id=>version_id)
      content_group = group_from_cm(content_metadata, subset)
      cm_inventory.groups << content_group
      cm_inventory
    end
#remediate_checksum_nodes(file_node, signature) ⇒ void
This method returns an undefined value.
Updates the file’s checksum elements if data is missing; raises an exception if the existing values are inconsistent with the signature.

    # File 'lib/stanford/content_inventory.rb', line 206
    def remediate_checksum_nodes(file_node, signature)
      # collect <checksum> elements for checksum types that are already present
      checksum_nodes = OrderedHash.new
      file_node.xpath('checksum').each do |checksum_node|
        type = @type_for_name[checksum_node['type']]
        checksum_nodes[type] = checksum_node
      end
      # add new <checksum> elements for the other checksum types that were missing
      @names_for_type.each do |type, names|
        unless checksum_nodes.has_key?(type)
          checksum_node = Nokogiri::XML::Element.new('checksum',file_node.document)
          checksum_node['type'] = names[0]
          file_node << checksum_node
          checksum_nodes[type] = checksum_node
        end
      end
      # make sure the <checksum> element has a content value
      checksum_nodes.each do |type,checksum_node|
        cm_checksum = checksum_node.content
        sig_checksum = signature.checksums[type]
        if cm_checksum.nil? or cm_checksum.empty?
          checksum_node.content = sig_checksum
        elsif cm_checksum != sig_checksum
          raise "Inconsistent #{type.to_s} for #{file_node['id']}: #{cm_checksum} != #{sig_checksum}"
        end
      end
    end
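The method applies one rule per checksum type: add the value when the element is absent or empty, keep it when it already matches, and raise when it differs. Below is a minimal sketch of that rule on plain hashes, assuming digests computed with Ruby's standard `digest` library. `reconcile_checksums` is a hypothetical helper; `existing` plays the role of the `<checksum>` elements already present and `computed` the role of `signature.checksums`.

```ruby
require 'digest'

# Hypothetical helper (not part of Moab): merges recorded checksums with
# computed ones, filling gaps and rejecting contradictions.
def reconcile_checksums(file_id, existing, computed)
  computed.each_with_object({}) do |(type, value), merged|
    current = existing[type]
    if current.nil? || current.empty?
      merged[type] = value            # missing or blank: fill in
    elsif current != value
      raise "Inconsistent #{type} for #{file_id}: #{current} != #{value}"
    else
      merged[type] = current          # already consistent: keep
    end
  end
end

data = 'hello'
computed = {
  md5:    Digest::MD5.hexdigest(data),
  sha1:   Digest::SHA1.hexdigest(data),
  sha256: Digest::SHA256.hexdigest(data)
}

# md5 is already recorded, sha1 is present but blank, sha256 is absent.
existing = { md5: computed[:md5], sha1: '' }
merged = reconcile_checksums('hello.txt', existing, computed)
```

After the call, `merged` holds all three digests. Passing an `existing` hash whose `:md5` differs from the computed value raises a `RuntimeError` instead.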
#remediate_content_metadata(content_metadata, content_group) ⇒ String
Returns a remediated copy of the contentMetadata with fixity data filled in.

    # File 'lib/stanford/content_inventory.rb', line 174
    def remediate_content_metadata(content_metadata, content_group)
      return nil if content_metadata.nil?
      return content_metadata if content_group.nil? or content_group.files.size < 1
      signature_for_path = content_group.path_hash
      @type_for_name = FileSignature.checksum_type_for_name
      @names_for_type = FileSignature.checksum_names_for_type
      ng_doc = Nokogiri::XML(content_metadata) { |x| x.noblanks }
      nodeset = ng_doc.xpath("//file")
      nodeset.each do |file_node|
        filepath = file_node['id']
        signature = signature_for_path[filepath]
        remediate_file_size(file_node, signature)
        remediate_checksum_nodes(file_node, signature)
      end
      ng_doc.to_xml(:indent => 2)
    end
#remediate_file_size(file_node, signature) ⇒ void
This method returns an undefined value.
Updates the file size attribute if it is missing; raises an exception if the existing value is inconsistent with the signature.

    # File 'lib/stanford/content_inventory.rb', line 194
    def remediate_file_size(file_node, signature)
      file_size = file_node['size']
      if file_size.nil? or file_size.empty?
        file_node['size'] = signature.size.to_s
      elsif file_size != signature.size.to_s
        raise "Inconsistent size for #{file_node['id']}: #{file_size} != #{signature.size.to_s}"
      end
    end
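The size check has three outcomes: fill in a missing value, leave a matching value alone, or raise on a mismatch. A minimal sketch of the same rule on a plain attribute hash follows. `reconcile_size` is a hypothetical helper used only for illustration, with the hash standing in for the `<file>` node.

```ruby
# Hypothetical helper (not part of Moab): the three-way rule of
# remediate_file_size applied to an attribute hash.
def reconcile_size(attrs, expected_size)
  current = attrs['size']
  if current.nil? || current.empty?
    attrs['size'] = expected_size.to_s   # missing or blank: fill in
  elsif current != expected_size.to_s
    raise "Inconsistent size for #{attrs['id']}: #{current} != #{expected_size}"
  end                                    # equal: leave untouched
  attrs
end

filled    = reconcile_size({ 'id' => 'a.txt' }, 42)
unchanged = reconcile_size({ 'id' => 'a.txt', 'size' => '42' }, 42)
```

Note that the comparison is made on strings, as in the listing, so under this sketch a recorded size of `"042"` would be reported as inconsistent with an expected size of `42`.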
#validate_content_metadata(content_metadata) ⇒ Boolean
Returns true if the contentMetadata has the essential file attributes; otherwise raises an exception.

    # File 'lib/stanford/content_inventory.rb', line 125
    def validate_content_metadata(content_metadata)
      result = validate_content_metadata_details(content_metadata)
      raise Moab::InvalidMetadataException, result[0]+" ..." if result.size > 0
      true
    end
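The exception message carries only the first problem, followed by `" ..."`, even when several were found. A minimal sketch of that wrapper pattern is shown below. `InvalidMetadataError` is a hypothetical stand-in for `Moab::InvalidMetadataException`, and `assert_no_problems` is a hypothetical helper that takes the problem list directly.

```ruby
# Hypothetical stand-in for Moab::InvalidMetadataException.
class InvalidMetadataError < RuntimeError; end

# Mirrors the wrapper: true when the problem list is empty, otherwise
# raise with the first problem followed by " ...".
def assert_no_problems(problems)
  raise InvalidMetadataError, "#{problems[0]} ..." unless problems.empty?
  true
end

problems = ['File node 0 is missing id,size', "File node having id='b.txt' is missing md5"]

message =
  begin
    assert_no_problems(problems)
  rescue InvalidMetadataError => e
    e.message
  end
```

Callers that need the complete list rather than the first entry can call `validate_content_metadata_details` directly, since it returns every problem without raising.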
#validate_content_metadata_details(content_metadata) ⇒ Array<String>
Returns the list of problems found.

    # File 'lib/stanford/content_inventory.rb', line 133
    def validate_content_metadata_details(content_metadata)
      result = []
      content_metadata_doc =
        case content_metadata.class.name
          when "String"
            Nokogiri::XML(content_metadata)
          when "Pathname"
            Nokogiri::XML(content_metadata.read)
          when "Nokogiri::XML::Document"
            content_metadata
          else
            raise Moab::InvalidMetadataException, "Content Metadata is in unrecognized format"
        end
      nodeset = content_metadata_doc.xpath("//file")
      nodeset.each do |file_node|
        missing = ['id', 'size','md5','sha1']
        missing.delete('id') if file_node.has_attribute?('id')
        missing.delete('size') if file_node.has_attribute?('size')
        checksum_nodes = file_node.xpath('checksum')
        checksum_nodes.each do |checksum_node|
          case checksum_node.attributes['type'].content.upcase
            when 'MD5'
              missing.delete('md5')
            when 'SHA1', 'SHA-1'
              missing.delete('sha1')
          end
        end
        if missing.include?('id')
          result << "File node #{nodeset.index(file_node)} is missing #{missing.join(',')}"
        elsif missing.size > 0
          id = file_node['id']
          result << "File node having id='#{id}' is missing #{missing.join(',')}"
        end
      end
      result
    end
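Each file node starts with a checklist of four required items (`id`, `size`, `md5`, `sha1`), and items are struck off as they are found. The sketch below shows that checklist logic on plain Ruby values, as an illustration rather than the library's code. `missing_essentials` is a hypothetical helper; `attrs` stands in for the `<file>` attributes and `checksum_types` for the `type` attributes of its `<checksum>` children.

```ruby
# Hypothetical helper (not part of Moab): reports which essential values
# a file entry lacks.
def missing_essentials(attrs, checksum_types)
  missing = %w[id size md5 sha1]
  missing.delete('id')   if attrs.key?('id')
  missing.delete('size') if attrs.key?('size')
  checksum_types.each do |type|
    case type.upcase
    when 'MD5'           then missing.delete('md5')
    when 'SHA1', 'SHA-1' then missing.delete('sha1')
    end
  end
  missing
end

complete = missing_essentials({ 'id' => 'a.txt', 'size' => '5' }, %w[md5 sha-1])
partial  = missing_essentials({ 'size' => '5' }, %w[MD5 SHA-256])
```

In this sketch `complete` is empty, while `partial` lists `id` and `sha1`. A SHA-256 checksum does not satisfy the check, because only MD5 and SHA-1 are on the required list.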