Class: Moab::FileInventory
- Inherits:
- Manifest (hierarchy: Object > Manifest > Moab::FileInventory)
- Includes:
- HappyMapper
- Defined in:
- lib/moab/file_inventory.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
A structured container for recording information about a collection of related files.
The scope of the file collection depends on inventory type:
- version = full set of data files comprising a digital object’s version
- additions = subset of data files that were newly added in the specified version
- manifests = the fixity data for manifest files in the version’s root folder
- directory = set of files that were harvested from a filesystem directory
The inventory contains one or more FileGroup subsets, which are most commonly used to segregate a digital object version’s content files from its metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file’s filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements, because copies of a given file may be present in multiple locations in a collection.
Data Model
- FileInventory = container for recording information about a collection of related files
  - FileGroup [1..*] = subset allowing segregation of content and metadata files
    - FileManifestation [1..*] = snapshot of a file’s filesystem characteristics
      - FileSignature [1] = file fixity information
      - FileInstance [1..*] = filepath and timestamp of any physical file having that signature
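The nesting in the data model can be sketched with plain Ruby structs. This is only an illustration: the `DataModelSketch` module, its field lists, the druid, and the checksum placeholders are assumptions, not the HappyMapper-backed Moab classes.

```ruby
# Hypothetical stand-ins for the Moab classes, used only to show the nesting.
# The real classes are HappyMapper-backed and carry more fields.
module DataModelSketch
  FileSignature     = Struct.new(:size, :md5, :sha1, :sha256)
  FileInstance      = Struct.new(:path, :datetime)
  FileManifestation = Struct.new(:signature, :instances)
  FileGroup         = Struct.new(:group_id, :files)
  FileInventory     = Struct.new(:type, :digital_object_id, :version_id, :groups)
end

M = DataModelSketch

signature = M::FileSignature.new(40, 'md5-placeholder', 'sha1-placeholder', 'sha256-placeholder')

# Two copies of the same bytes share one signature but have two instances.
manifestation = M::FileManifestation.new(
  signature,
  [M::FileInstance.new('page-1.jpg', '2012-03-26T08:15:11Z'),
   M::FileInstance.new('copies/page-1.jpg', '2012-03-26T08:20:35Z')]
)

inventory = M::FileInventory.new(
  'version', 'druid:ab123cd4567', 1, # hypothetical druid and version
  [M::FileGroup.new('content', [manifestation]),
   M::FileGroup.new('metadata', [])]
)

# Count physical files by walking group -> manifestation -> instance.
file_count = inventory.groups.sum { |g| g.files.sum { |m| m.instances.size } }
```

Here one manifestation with two instances gives a file count of 2, which is why a signature is stored once while instances may repeat.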
Instance Attribute Summary
- #block_count ⇒ Integer
  The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).
- #digital_object_id ⇒ String
  The digital object identifier (druid).
- #file_count ⇒ Integer
  The total number of data files in the inventory (dynamically calculated).
- #groups ⇒ Array<FileGroup>
  The set of data groups comprising the version.
- #inventory_datetime ⇒ Time
  The datetime at which the inventory was created.
- #type ⇒ String
  The type of inventory (version|additions|manifests|directory).
- #version_id ⇒ Integer
  The ordinal version number.
Class Method Summary
- .xml_filename(type = nil) ⇒ String
  The standard name for the serialized inventory file of the given type.
Instance Method Summary
- #byte_count ⇒ Integer
  The total size (in bytes) of all files in the inventory (dynamically calculated).
- #composite_key ⇒ String
  The unique identifier concatenating digital object id with version id.
- #copy_ids(other) ⇒ void
  Copy objectId and versionId values from another class instance into this instance.
- #data_source ⇒ String
  Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory.
- #file_signature(group_id, file_id) ⇒ FileSignature
  The signature of the specified file.
- #group(group_id) ⇒ FileGroup
  The file group in this inventory for the specified group_id.
- #group_empty?(group_id) ⇒ Boolean
  True if the group is missing or empty.
- #group_ids(non_empty = nil) ⇒ Array<String>
  Group identifiers contained in this file inventory.
- #human_size ⇒ String
  The total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value.
- #initialize(opts = {}) ⇒ FileInventory (constructor)
  A new instance of FileInventory.
- #inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
  Traverse a BagIt bag’s payload and return an inventory of the files it contains (using fixity from bag manifest files).
- #inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
  Traverse a directory and return an inventory of the files it contains.
- #non_empty_groups ⇒ Array<FileGroup>
  The set of data groups that contain files.
- #package_id ⇒ String
  Concatenation of the objectId and versionId values.
- #signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
  The fixity data present in the bag’s manifest files.
- #summary_fields ⇒ Array<String>
  The data fields to include in summary reports.
- #write_xml_file(parent_dir, type = nil) ⇒ void
  Write the FileInventory instance to a file.
Constructor Details
#initialize(opts = {}) ⇒ FileInventory
Returns a new instance of FileInventory.
# File 'lib/moab/file_inventory.rb', line 39
def initialize(opts={})
  @groups = Array.new
  @inventory_datetime = Time.now
  super(opts)
end
Instance Attribute Details
#block_count ⇒ Integer
Returns the total disk usage (in 1 kB blocks) of all data files, estimating the du -k result (dynamically calculated).
# File 'lib/moab/file_inventory.rb', line 92
attribute :block_count, Integer, :tag => 'blockCount', :on_save => Proc.new {|t| t.to_s}
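A block count that approximates du -k could be computed along the following lines. This is a minimal sketch: rounding each file up to whole 1 kB blocks is an assumption made for illustration, and `estimated_block_count` is a hypothetical helper, not a method taken from the Moab source.

```ruby
BLOCK_SIZE = 1024 # bytes per block, matching the "1 kB blocks" in the description

# Round each file up to whole blocks, then add the results.
# This rounding rule is an assumption for illustration only.
def estimated_block_count(file_sizes)
  file_sizes.sum { |size| (size + BLOCK_SIZE - 1) / BLOCK_SIZE }
end

sizes  = [1, 1024, 1025, 0]      # bytes
blocks = estimated_block_count(sizes) # 1 + 1 + 2 + 0 = 4
```

Under this rule the block count can exceed byte_count / 1024, since small files each occupy a full block.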
#digital_object_id ⇒ String
Returns the digital object identifier (druid).
# File 'lib/moab/file_inventory.rb', line 51
attribute :digital_object_id, String, :tag => 'objectId'
#file_count ⇒ Integer
Returns the total number of data files in the inventory (dynamically calculated).
# File 'lib/moab/file_inventory.rb', line 76
attribute :file_count, Integer, :tag => 'fileCount', :on_save => Proc.new {|t| t.to_s}
#groups ⇒ Array<FileGroup>
Returns the set of data groups comprising the version.
# File 'lib/moab/file_inventory.rb', line 100
has_many :groups, FileGroup, :tag => 'fileGroup'
#inventory_datetime ⇒ Time
Returns the datetime at which the inventory was created.
# File 'lib/moab/file_inventory.rb', line 64
attribute :inventory_datetime, Time, :tag => 'inventoryDatetime', :on_save => Proc.new {|t| t.to_s}
#type ⇒ String
Returns the type of inventory (version|additions|manifests|directory).
# File 'lib/moab/file_inventory.rb', line 47
attribute :type, String
#version_id ⇒ Integer
Returns the ordinal version number.
# File 'lib/moab/file_inventory.rb', line 55
attribute :version_id, Integer, :tag => 'versionId', :key => true, :on_save => Proc.new {|n| n.to_s}
Class Method Details
.xml_filename(type = nil) ⇒ String
Returns the standard name for the serialized inventory file of the given type. Raises an error for an unknown (or nil) type.
# File 'lib/moab/file_inventory.rb', line 252
def self.xml_filename(type=nil)
  case type
    when "version"
      'versionInventory.xml'
    when "additions"
      'versionAdditions.xml'
    when "manifests"
      'manifestInventory.xml'
    when "directory"
      'directoryInventory.xml'
    else
      raise "unknown inventory type: #{type.to_s}"
  end
end
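The same type-to-filename mapping can be restated as a lookup table. The sketch below is a standalone alternative for illustration, not the library's implementation; `INVENTORY_FILENAMES`, `inventory_xml_filename`, and the directory path are hypothetical names.

```ruby
# Standalone sketch: the mapping from xml_filename expressed as a frozen table.
INVENTORY_FILENAMES = {
  'version'   => 'versionInventory.xml',
  'additions' => 'versionAdditions.xml',
  'manifests' => 'manifestInventory.xml',
  'directory' => 'directoryInventory.xml'
}.freeze

# Hypothetical helper; raises for unknown types, as the case statement does.
def inventory_xml_filename(type)
  INVENTORY_FILENAMES.fetch(type) { raise "unknown inventory type: #{type}" }
end

# Build the path where a serialized inventory might be written (path is made up).
path = File.join('/tmp/ab123cd4567/v0001/manifests', inventory_xml_filename('version'))
```

Because the default argument is nil and nil is not a known type, calling the method without a type raises rather than returning a default filename.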
Instance Method Details
#byte_count ⇒ Integer
Returns the total size (in bytes) of all files in the inventory (dynamically calculated).
# File 'lib/moab/file_inventory.rb', line 84
attribute :byte_count, Integer, :tag => 'byteCount', :on_save => Proc.new {|t| t.to_s}
#composite_key ⇒ String
Returns the unique identifier concatenating digital object id with version id.
# File 'lib/moab/file_inventory.rb', line 58
def composite_key
  @digital_object_id + '-' + StorageObject.version_dirname(@version_id)
end
#copy_ids(other) ⇒ void
This method returns an undefined value.
Copies the objectId and versionId values (and, as the code shows, the inventory datetime) from another class instance into this instance.
# File 'lib/moab/file_inventory.rb', line 146
def copy_ids(other)
  @digital_object_id = other.digital_object_id
  @version_id = other.version_id
  @inventory_datetime = other.inventory_datetime
end
#data_source ⇒ String
Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory.
# File 'lib/moab/file_inventory.rb', line 160
def data_source
  data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
  if data_source.start_with?('contentMetadata')
    if version_id
      "v#{version_id.to_s}-#{data_source}"
    else
      "new-#{data_source}"
    end
  else
    if version_id
      "v#{version_id.to_s}"
    else
      data_source
    end
  end
end
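The four outcomes of this branching are easier to see with concrete inputs. The sketch below restates the logic as a standalone function; `data_source_label` and the sample source strings are hypothetical, and `group_sources` stands in for the per-group data_source values.

```ruby
# Standalone restatement of the branching in data_source.
# group_sources stands in for groups.collect { |g| g.data_source.to_s }.
def data_source_label(group_sources, version_id)
  joined = group_sources.join('|')
  if joined.start_with?('contentMetadata')
    version_id ? "v#{version_id}-#{joined}" : "new-#{joined}"
  else
    version_id ? "v#{version_id}" : joined
  end
end

a = data_source_label(['contentMetadata', 'metadata'], 2) # "v2-contentMetadata|metadata"
b = data_source_label(['contentMetadata'], nil)           # "new-contentMetadata"
c = data_source_label(['/workspace/ab123cd4567'], 2)      # "v2"
d = data_source_label(['/workspace/ab123cd4567'], nil)    # "/workspace/ab123cd4567"
```

Only the first group's source decides the branch, because start_with? is applied to the joined string.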
#file_signature(group_id, file_id) ⇒ FileSignature
Returns the signature of the specified file. Raises FileNotFoundException if the group or the file is not found.
# File 'lib/moab/file_inventory.rb', line 135
def file_signature(group_id, file_id)
  file_group = group(group_id)
  raise FileNotFoundException, "group #{group_id} not found for #{@digital_object_id} - #{@version_id}" if file_group.nil?
  file_signature = file_group.path_hash[file_id]
  raise FileNotFoundException, "#{group_id} file #{file_id} not found for #{@digital_object_id} - #{@version_id}" if file_signature.nil?
  file_signature
end
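The two-step lookup (group first, then file within the group) can be sketched with nested hashes. Everything here is a stand-in: `FileNotFoundSketch` replaces Moab's FileNotFoundException, a plain Hash replaces FileGroup#path_hash, and the signature strings are placeholders.

```ruby
# Stand-in for Moab's FileNotFoundException.
class FileNotFoundSketch < RuntimeError; end

# group_id => { file_id => signature }; plain hashes stand in for path_hash.
GROUPS = {
  'content'  => { 'page-1.jpg' => 'signature-of-page-1' },
  'metadata' => {}
}.freeze

def lookup_signature(groups, group_id, file_id)
  path_hash = groups[group_id]
  raise FileNotFoundSketch, "group #{group_id} not found" if path_hash.nil?
  signature = path_hash[file_id]
  raise FileNotFoundSketch, "#{group_id} file #{file_id} not found" if signature.nil?
  signature
end

found = lookup_signature(GROUPS, 'content', 'page-1.jpg')

# A missing file and a missing group fail with different messages.
missing_file_message =
  begin
    lookup_signature(GROUPS, 'metadata', 'descMetadata.xml')
  rescue FileNotFoundSketch => e
    e.message
  end
```

Callers that only want to test for presence would need to rescue the exception, since the method never returns nil.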
#group(group_id) ⇒ FileGroup
Returns the file group in this inventory for the specified group_id.
# File 'lib/moab/file_inventory.rb', line 116
def group(group_id)
  @groups.find{ |group| group.group_id == group_id}
end
#group_empty?(group_id) ⇒ Boolean
Returns true if the group is missing or empty.
# File 'lib/moab/file_inventory.rb', line 122
def group_empty?(group_id)
  group = self.group(group_id)
  group.nil? or group.files.empty?
end
#group_ids(non_empty = nil) ⇒ Array<String>
Returns group identifiers contained in this file inventory.
# File 'lib/moab/file_inventory.rb', line 109
def group_ids(non_empty=nil)
  groups = non_empty ? self.non_empty_groups : @groups
  groups.map{|group| group.group_id}
end
#human_size ⇒ String
Returns the total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value. Sizes below 1024 bytes are reported in B.
# File 'lib/moab/file_inventory.rb', line 235
def human_size
  count = 0
  size = byte_count
  while size >= 1024 and count < 4
    size /= 1024.0
    count += 1
  end
  if count == 0
    sprintf("%d B", size)
  else
    sprintf("%.2f %s", size, %w[B KB MB GB TB][count])
  end
end
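A few sample values show how the unit is chosen. The sketch below copies the loop into a standalone function that takes the byte count as an argument (the real method reads byte_count from the instance); the function name and inputs are illustrative.

```ruby
# Standalone copy of the human_size logic, parameterized on the byte count.
def human_size_of(byte_count)
  count = 0
  size = byte_count
  while size >= 1024 && count < 4
    size /= 1024.0
    count += 1
  end
  if count == 0
    format('%d B', size)
  else
    format('%.2f %s', size, %w[B KB MB GB TB][count])
  end
end

small = human_size_of(512)              # "512 B"
kb    = human_size_of(1536)             # "1.50 KB"
gb    = human_size_of(5 * 1024**3)      # "5.00 GB"
huge  = human_size_of(2048 * 1024**4)   # "2048.00 TB"
```

The loop stops after four divisions, so anything of 1024 TB or more is still reported in TB.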
#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
Traverses a BagIt bag’s payload and returns an inventory of the files it contains (using fixity from bag manifest files). Each subdirectory of the bag’s data directory becomes one file group.
# File 'lib/moab/file_inventory.rb', line 197
def inventory_from_bagit_bag(bag_dir)
  bag_pathname = Pathname(bag_dir)
  signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
  bag_data_subdirs = bag_pathname.join('data').children
  bag_data_subdirs.each do |subdir|
    @groups << FileGroup.new(:group_id=>subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
  end
  self
end
#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
Traverses a directory and returns an inventory of the files it contains. If group_id is given, data_dir is harvested as that single group; otherwise the content and metadata subdirectories of data_dir are each harvested as a group.
# File 'lib/moab/file_inventory.rb', line 184
def inventory_from_directory(data_dir,group_id=nil)
  if group_id
    @groups << FileGroup.new(:group_id=>group_id).group_from_directory(data_dir)
  else
    ['content','metadata'].each do |group_id|
      @groups << FileGroup.new(:group_id=>group_id).group_from_directory(Pathname(data_dir).join(group_id))
    end
  end
  self
end
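The difference between the two call styles can be sketched with a temporary directory. This illustration uses only the Ruby standard library and returns plain file listings instead of FileGroup objects; `relative_files`, `group_listing`, and the sample filenames are hypothetical.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper: relative paths of all regular files beneath dir.
def relative_files(dir)
  Pathname.glob(dir.join('**', '*').to_s)
          .select(&:file?)
          .map { |path| path.relative_path_from(dir).to_s }
          .sort
end

# Mirrors the branching of inventory_from_directory, but returns a plain Hash
# of group_id => file list instead of FileGroup objects.
def group_listing(data_dir, group_id = nil)
  data_dir = Pathname(data_dir)
  return { group_id => relative_files(data_dir) } if group_id

  %w[content metadata].each_with_object({}) do |id, listing|
    listing[id] = relative_files(data_dir.join(id))
  end
end

listing, single = Dir.mktmpdir do |tmp|
  root = Pathname(tmp)
  root.join('content').mkpath
  root.join('metadata').mkpath
  root.join('content', 'page-1.jpg').write('image bytes')
  root.join('metadata', 'descMetadata.xml').write('<mods/>')
  # Without a group_id both subdirectories are read; with one, only data_dir itself.
  [group_listing(root), group_listing(root.join('content'), 'content')]
end
```

When no group_id is passed, the directory is expected to contain content and metadata subdirectories; with a group_id, data_dir should point directly at the files for that group.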
#non_empty_groups ⇒ Array<FileGroup>
Returns the set of data groups that contain files.
# File 'lib/moab/file_inventory.rb', line 103
def non_empty_groups
  @groups.select{|group| !group.files.empty?}
end
#package_id ⇒ String
Returns the concatenation of the objectId and versionId values.
# File 'lib/moab/file_inventory.rb', line 154
def package_id
  "#{@digital_object_id}-v#{@version_id}"
end
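package_id and composite_key both join the object id and the version, but they format the version differently. The sketch below compares them under one assumption: that StorageObject.version_dirname zero-pads the version to four digits. The local `version_dirname` helper and the druid are hypothetical.

```ruby
# Assumption: StorageObject.version_dirname zero-pads to four digits ("v0003").
def version_dirname(version_id)
  format('v%04d', version_id)
end

digital_object_id = 'druid:ab123cd4567' # hypothetical druid
version_id = 3

# Same concatenations as composite_key and package_id in the listings.
composite_key = digital_object_id + '-' + version_dirname(version_id)
package_id    = "#{digital_object_id}-v#{version_id}"
```

If that assumption holds, the two identifiers are not interchangeable as lookup keys, since only one of them carries the padded version.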
#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
Returns the fixity data present in the bag’s manifest files (manifest-md5.txt, manifest-sha1.txt, manifest-sha256.txt), keyed by file pathname.
# File 'lib/moab/file_inventory.rb', line 209
def signatures_from_bagit_manifests(bag_pathname)
  manifest_pathname = Hash.new
  checksum_types = [:md5, :sha1, :sha256]
  checksum_types.each do |type|
    manifest_pathname[type] = bag_pathname.join("manifest-#{type.to_s}.txt")
  end
  signatures = OrderedHash.new { |hash,path| hash[path] = FileSignature.new }
  checksum_types.each do |type|
    if manifest_pathname[type].exist?
      manifest_pathname[type].each_line do |line|
        line.chomp!
        checksum,data_path = line.split(/\s+\**/,2)
        if checksum && data_path
          file_pathname = bag_pathname.join(data_path)
          signature = signatures[file_pathname]
          signature.set_checksum(type, checksum)
        end
      end
    end
  end
  signatures.each {|file_pathname,signature| signature.size = file_pathname.size}
  signatures
end
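The line parsing and the merge of several manifests into one record per file can be tried without a bag on disk. In this sketch the manifest contents are inline sample lines, and a plain Hash of hashes stands in for the OrderedHash of FileSignature objects; those substitutions are assumptions for illustration.

```ruby
# Sample manifest lines keyed by checksum type (stand-ins for manifest-<type>.txt).
manifests = {
  md5:  ["0cc175b9c0f1b6a831c399e269772661  data/content/a.txt\n"],
  sha1: ["86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 *data/content/a.txt\n", "\n"]
}

# The default block creates an empty record the first time a path is seen,
# so checksums from different manifests accumulate on the same entry.
signatures = Hash.new { |hash, path| hash[path] = {} }

manifests.each do |type, lines|
  lines.each do |line|
    # Split on whitespace plus an optional "*" marker; limit 2 keeps spaces in paths.
    checksum, data_path = line.chomp.split(/\s+\**/, 2)
    next unless checksum && data_path # skips blank or malformed lines
    signatures[data_path][type] = checksum
  end
end
```

After the loop there is a single entry for data/content/a.txt holding both the md5 and sha1 values, and the blank line has been ignored.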
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
# File 'lib/moab/file_inventory.rb', line 128
def summary_fields
  %w{type digital_object_id version_id inventory_datetime file_count byte_count block_count groups}
end
#write_xml_file(parent_dir, type = nil) ⇒ void
This method returns an undefined value.
Writes the Moab::FileInventory instance to a file. If type is nil, the instance’s own type is used.
# File 'lib/moab/file_inventory.rb', line 272
def write_xml_file(parent_dir, type=nil)
  type = @type if type.nil?
  self.class.write_xml_file(self, parent_dir, type)
end