Class: Moab::FileInventory
- Inherits: Serializer::Manifest
  - Object
  - Serializer::Serializable
  - Serializer::Manifest
  - Moab::FileInventory
- Includes: HappyMapper
- Defined in: lib/moab/file_inventory.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
A structured container for recording information about a collection of related files.
The scope of the file collection depends on inventory type:
- version = full set of data files comprising a digital object’s version
- additions = subset of data files that were newly added in the specified version
- manifests = the fixity data for manifest files in the version’s root folder
- directory = set of files that were harvested from a filesystem directory
The inventory contains one or more FileGroup subsets, which are most commonly used to segregate a digital object version’s content files from its metadata files. Each group contains one or more FileManifestation entities, each of which represents a point-in-time snapshot of a given file’s filesystem characteristics. The fixity data for a file is stored in a FileSignature element, while the filename and modification data are stored in one or more FileInstance elements. (Copies of a given file may be present in multiple locations in a collection.)
Data Model
- FileInventory = container for recording information about a collection of related files
  - FileGroup [1..*] = subset allowing segregation of content and metadata files
    - FileManifestation [1..*] = snapshot of a file’s filesystem characteristics
      - FileSignature [1] = file fixity information
      - FileInstance [1..*] = filepath and timestamp of any physical file having that signature
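The containment hierarchy above can be pictured with plain Ruby structs. This is only an illustrative sketch: the `DataModelSketch` module, its struct fields, and the sample paths are hypothetical stand-ins, not the real Moab classes (which are HappyMapper-based). The checksum shown is the MD5 of an empty file.

```ruby
# Hypothetical stand-ins for the Moab data model; field names are assumptions.
module DataModelSketch
  FileSignature     = Struct.new(:size, :md5, keyword_init: true)
  FileInstance      = Struct.new(:path, :datetime, keyword_init: true)
  FileManifestation = Struct.new(:signature, :instances, keyword_init: true)
  FileGroup         = Struct.new(:group_id, :files, keyword_init: true)
  FileInventory     = Struct.new(:type, :groups, keyword_init: true)

  # Build an inventory with one group holding one manifestation.
  # The manifestation has a single signature but two instances,
  # because the same bytes exist at two paths in the collection.
  def self.example
    signature = FileSignature.new(size: 0, md5: 'd41d8cd98f00b204e9800998ecf8427e')
    instances = [
      FileInstance.new(path: 'empty.txt', datetime: '2012-01-01T00:00:00Z'),
      FileInstance.new(path: 'copies/empty.txt', datetime: '2012-01-02T00:00:00Z')
    ]
    manifestation = FileManifestation.new(signature: signature, instances: instances)
    group = FileGroup.new(group_id: 'content', files: [manifestation])
    FileInventory.new(type: 'version', groups: [group])
  end
end
```

Keeping one signature per manifestation and many instances under it is what lets duplicate copies of a file share a single fixity record.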
Instance Attribute Summary
- #block_count ⇒ Integer
  The total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).
- #digital_object_id ⇒ String
  The digital object identifier (druid).
- #file_count ⇒ Integer
  The total number of data files in the inventory (dynamically calculated).
- #groups ⇒ Array<FileGroup>
  The set of data groups comprising the version.
- #inventory_datetime ⇒ String
  The datetime at which the inventory was created.
- #type ⇒ String
  The type of inventory (version|additions|manifests|directory).
- #version_id ⇒ Integer
  The ordinal version number.
Class Method Summary
- .xml_filename(type = nil) ⇒ String
  The standard name for the serialized inventory file of the given type.
Instance Method Summary
- #byte_count ⇒ Integer
  The total size (in bytes) of all files in the inventory (dynamically calculated).
- #composite_key ⇒ String
  The unique identifier concatenating digital object id with version id.
- #copy_ids(other) ⇒ void
  Copy objectId and versionId values from another class instance into this instance.
- #data_source ⇒ String
  Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory.
- #file_signature(group_id, file_id) ⇒ FileSignature
  The signature of the specified file.
- #group(group_id) ⇒ FileGroup
  The file group in this inventory for the specified group_id.
- #group_empty?(group_id) ⇒ Boolean
  True if the group is missing or empty.
- #group_ids(non_empty = nil) ⇒ Array<String>
  Group identifiers contained in this file inventory.
- #human_size ⇒ String
  The total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value.
- #initialize(opts = {}) ⇒ FileInventory (constructor)
  A new instance of FileInventory.
- #inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
  Traverse a BagIt bag’s payload and return an inventory of the files it contains (using fixity from bag manifest files).
- #inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
  Traverse a directory and return an inventory of the files it contains.
- #non_empty_groups ⇒ Array<FileGroup>
  The set of data groups that contain files.
- #package_id ⇒ String
  Concatenation of the objectId and versionId values.
- #signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
  The fixity data present in the bag’s manifest files.
- #summary_fields ⇒ Array<String>
  The data fields to include in summary reports.
- #write_xml_file(parent_dir, type = nil) ⇒ void
  Write the FileInventory instance to a file.
Methods inherited from Serializer::Manifest
read_xml_file, write_xml_file, xml_pathname, xml_pathname_exist?
Methods inherited from Serializer::Serializable
#array_to_hash, deep_diff, #diff, #key, #key_name, #summary, #to_hash, #to_json, #to_yaml, #variable_names, #variables
Constructor Details
#initialize(opts = {}) ⇒ FileInventory
Returns a new instance of FileInventory.
    # File 'lib/moab/file_inventory.rb', line 37
    def initialize(opts = {})
      @groups = []
      @inventory_datetime = Time.now
      super(opts)
    end
Instance Attribute Details
#block_count ⇒ Integer
Returns the total disk usage (in 1 kB blocks) of all data files (estimating du -k result) (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 90
    attribute :block_count, Integer, tag: 'blockCount', on_save: proc { |t| t.to_s }
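The listing above only declares the attribute; how the value is computed is not shown on this page. One plausible way to estimate a `du -k` style figure, offered purely as an assumption, is to round each file up to a whole number of 1 kB blocks and sum the results. The `estimated_blocks` helper below is hypothetical and not part of Moab.

```ruby
# Assumption: each file occupies a whole number of 1024-byte blocks,
# so its size is rounded up before summing. Not taken from the Moab source.
SKETCH_BLOCK_SIZE = 1024

def estimated_blocks(file_sizes, block_size = SKETCH_BLOCK_SIZE)
  # Integer ceiling division: (size + block_size - 1) / block_size
  file_sizes.sum { |size| (size + block_size - 1) / block_size }
end

# A 1-byte file and a 1024-byte file each need one block; 1025 bytes need two.
estimated_blocks([1, 1024, 1025])
```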
#digital_object_id ⇒ String
Returns the digital object identifier (druid).
    # File 'lib/moab/file_inventory.rb', line 49
    attribute :digital_object_id, String, tag: 'objectId'
#file_count ⇒ Integer
Returns the total number of data files in the inventory (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 74
    attribute :file_count, Integer, tag: 'fileCount', on_save: proc { |t| t.to_s }
#groups ⇒ Array<FileGroup>
Returns the set of data groups comprising the version.
    # File 'lib/moab/file_inventory.rb', line 98
    has_many :groups, FileGroup, tag: 'fileGroup'
#inventory_datetime ⇒ String
Returns the datetime at which the inventory was created.
    # File 'lib/moab/file_inventory.rb', line 62
    attribute :inventory_datetime, String, tag: 'inventoryDatetime'
#type ⇒ String
Returns the type of inventory (version|additions|manifests|directory).
    # File 'lib/moab/file_inventory.rb', line 45
    attribute :type, String
#version_id ⇒ Integer
Returns the ordinal version number.
    # File 'lib/moab/file_inventory.rb', line 53
    attribute :version_id, Integer, tag: 'versionId', key: true, on_save: proc { |n| n.to_s }
Class Method Details
.xml_filename(type = nil) ⇒ String
Returns the standard name for the serialized inventory file of the given type. Raises ArgumentError if the type is not one of version, additions, manifests, or directory.
    # File 'lib/moab/file_inventory.rb', line 248
    def self.xml_filename(type = nil)
      case type
      when 'version'
        'versionInventory.xml'
      when 'additions'
        'versionAdditions.xml'
      when 'manifests'
        'manifestInventory.xml'
      when 'directory'
        'directoryInventory.xml'
      else
        raise ArgumentError, "unknown inventory type: #{type}"
      end
    end
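The type-to-filename mapping can be tried out without loading the library. The sketch below restates the same mapping as a hash lookup; `SKETCH_INVENTORY_FILENAMES` and `inventory_xml_filename` are hypothetical names used only for illustration. Note that the default argument of `nil` falls through to the error branch, as does a Symbol such as `:version`, because the comparison is against String values.

```ruby
# Stand-alone restatement of the mapping shown in the listing above.
# The constant and method names are illustrative, not part of Moab.
SKETCH_INVENTORY_FILENAMES = {
  'version' => 'versionInventory.xml',
  'additions' => 'versionAdditions.xml',
  'manifests' => 'manifestInventory.xml',
  'directory' => 'directoryInventory.xml'
}.freeze

def inventory_xml_filename(type = nil)
  SKETCH_INVENTORY_FILENAMES.fetch(type) do
    raise ArgumentError, "unknown inventory type: #{type}"
  end
end

inventory_xml_filename('additions') # looks up the additions filename

begin
  inventory_xml_filename # nil is not a known type
rescue ArgumentError => e
  warn e.message
end
```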
Instance Method Details
#byte_count ⇒ Integer
Returns the total size (in bytes) of all files in the inventory (dynamically calculated).
    # File 'lib/moab/file_inventory.rb', line 82
    attribute :byte_count, Integer, tag: 'byteCount', on_save: proc { |t| t.to_s }
#composite_key ⇒ String
Returns the unique identifier concatenating digital object id with version id.
    # File 'lib/moab/file_inventory.rb', line 56
    def composite_key
      "#{digital_object_id}-#{StorageObject.version_dirname(version_id)}"
    end
#copy_ids(other) ⇒ void
This method returns an undefined value.
Copies the objectId, versionId, and inventory datetime values from another class instance into this instance.
    # File 'lib/moab/file_inventory.rb', line 148
    def copy_ids(other)
      @digital_object_id = other.digital_object_id
      @version_id = other.version_id
      @inventory_datetime = other.inventory_datetime
    end
#data_source ⇒ String
Returns either the version ID (if inventory is a version manifest) or the name of the directory that was harvested to create the inventory.
    # File 'lib/moab/file_inventory.rb', line 163
    def data_source
      data_source = (groups.collect { |g| g.data_source.to_s }).join('|')
      if data_source.start_with?('contentMetadata')
        version_id ? "v#{version_id}-#{data_source}" : "new-#{data_source}"
      else
        version_id ? "v#{version_id}" : data_source
      end
    end
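The four possible outcomes of this method are easier to see with concrete inputs. The following sketch lifts the branching logic into a free function that takes the group data sources and the version id as arguments; `inventory_data_source` and the sample paths are hypothetical and exist only for illustration.

```ruby
# Same branching as the listing above, with the inventory state passed in
# explicitly. The function name and sample paths are illustrative only.
def inventory_data_source(group_sources, version_id)
  data_source = group_sources.map(&:to_s).join('|')
  if data_source.start_with?('contentMetadata')
    version_id ? "v#{version_id}-#{data_source}" : "new-#{data_source}"
  else
    version_id ? "v#{version_id}" : data_source
  end
end

dirs = ['/tmp/obj/content', '/tmp/obj/metadata']
inventory_data_source(dirs, nil)                # joined directory names
inventory_data_source(dirs, 3)                  # version label only
inventory_data_source(['contentMetadata'], nil) # "new-" prefix
inventory_data_source(['contentMetadata'], 2)   # version label plus source
```

When a version id is present and the source is not contentMetadata, the directory names are dropped entirely in favor of the version label.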
#file_signature(group_id, file_id) ⇒ FileSignature
Returns the signature of the specified file. Raises FileNotFoundException if the group or the file cannot be found.
    # File 'lib/moab/file_inventory.rb', line 133
    def file_signature(group_id, file_id)
      file_group = group(group_id)
      errmsg = "group #{group_id} not found for #{digital_object_id} - #{version_id}"
      raise FileNotFoundException, errmsg if file_group.nil?

      file_signature = file_group.path_hash[file_id]
      errmsg = "#{group_id} file #{file_id} not found for #{digital_object_id} - #{version_id}"
      raise FileNotFoundException, errmsg if file_signature.nil?

      file_signature
    end
#group(group_id) ⇒ FileGroup
Returns the file group in this inventory for the specified group_id, or nil if no group matches.
    # File 'lib/moab/file_inventory.rb', line 114
    def group(group_id)
      groups.find { |group| group.group_id == group_id }
    end
#group_empty?(group_id) ⇒ Boolean
Returns true if the group is missing or empty.
    # File 'lib/moab/file_inventory.rb', line 120
    def group_empty?(group_id)
      group = self.group(group_id)
      group.nil? || group.files.empty?
    end
#group_ids(non_empty = nil) ⇒ Array<String>
Returns group identifiers contained in this file inventory. If non_empty is truthy, only groups that contain files are included.
    # File 'lib/moab/file_inventory.rb', line 107
    def group_ids(non_empty = nil)
      my_groups = non_empty ? non_empty_groups : groups
      my_groups.map(&:group_id)
    end
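The group lookup helpers (`group`, `group_empty?`, `group_ids`, and `non_empty_groups`) work together, and their interaction can be sketched with minimal stand-in classes. `SketchGroup` and `SketchInventory` are hypothetical and carry only the fields these four methods touch; the method bodies follow the listings on this page.

```ruby
# Minimal stand-ins: a group is just an id plus a list of files.
SketchGroup = Struct.new(:group_id, :files)

class SketchInventory
  attr_reader :groups

  def initialize(groups)
    @groups = groups
  end

  def non_empty_groups
    groups.reject { |group| group.files.empty? }
  end

  def group(group_id)
    groups.find { |group| group.group_id == group_id }
  end

  # A group that is absent is treated the same as one with no files.
  def group_empty?(group_id)
    found = group(group_id)
    found.nil? || found.files.empty?
  end

  def group_ids(non_empty = nil)
    (non_empty ? non_empty_groups : groups).map(&:group_id)
  end
end

inventory = SketchInventory.new(
  [SketchGroup.new('content', ['page-1.jpg']), SketchGroup.new('metadata', [])]
)
inventory.group_ids        # every group id
inventory.group_ids(true)  # only ids of groups that hold files
inventory.group_empty?('missing') # true, because the group does not exist
```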
#human_size ⇒ String
Returns the total size of the inventory expressed in KB, MB, GB or TB, depending on the magnitude of the value. Sizes below 1024 bytes are expressed in B.
    # File 'lib/moab/file_inventory.rb', line 229
    def human_size
      count = 0
      size = byte_count
      while (size >= 1024) && (count < 4)
        size /= 1024.0
        count += 1
      end
      if count == 0
        format('%d B', size)
      else
        # rubocop:disable Style/FormatStringToken
        format('%.2f %s', size, %w[B KB MB GB TB][count])
        # rubocop:enable Style/FormatStringToken
      end
    end
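To see how the unit is chosen, the same loop can be run as a free function that takes the byte count as an argument. `human_size_of` is a hypothetical name; the body follows the listing above. Because the loop stops after four divisions, anything of a pebibyte or more is still reported in TB.

```ruby
# Same algorithm as the listing above, with byte_count passed in explicitly.
def human_size_of(byte_count)
  count = 0
  size = byte_count
  # Divide by 1024 until the value drops below 1024 or TB is reached.
  while size >= 1024 && count < 4
    size /= 1024.0
    count += 1
  end
  if count.zero?
    format('%d B', size)
  else
    format('%.2f %s', size, %w[B KB MB GB TB][count])
  end
end

human_size_of(1023)      # stays in bytes
human_size_of(1536)      # one division, reported in KB
human_size_of(1024**5)   # capped at TB, so the number exceeds 1024
```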
#inventory_from_bagit_bag(bag_dir) ⇒ FileInventory
Traverses a BagIt bag’s payload and returns an inventory of the files it contains (using fixity from the bag’s manifest files). One FileGroup is created for each child of the bag’s data directory, and the receiver itself is returned.
    # File 'lib/moab/file_inventory.rb', line 192
    def inventory_from_bagit_bag(bag_dir)
      bag_pathname = Pathname(bag_dir)
      signatures_from_bag = signatures_from_bagit_manifests(bag_pathname)
      bag_data_subdirs = bag_pathname.join('data').children
      bag_data_subdirs.each do |subdir|
        groups << FileGroup.new(group_id: subdir.basename.to_s).group_from_bagit_subdir(subdir, signatures_from_bag)
      end
      self
    end
#inventory_from_directory(data_dir, group_id = nil) ⇒ FileInventory
Traverses a directory and returns an inventory of the files it contains. If group_id is given, data_dir is harvested as that single group; otherwise the content and metadata subdirectories of data_dir are each harvested as a group. The receiver itself is returned.
    # File 'lib/moab/file_inventory.rb', line 178
    def inventory_from_directory(data_dir, group_id = nil)
      if group_id
        groups << FileGroup.new(group_id: group_id).group_from_directory(data_dir)
      else
        %w[content metadata].each do |gid|
          groups << FileGroup.new(group_id: gid).group_from_directory(Pathname(data_dir).join(gid))
        end
      end
      self
    end
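The difference between the two call styles can be sketched without the library. In the example below, `list_group_files` is a hypothetical stand-in for `FileGroup#group_from_directory` (whose implementation is not shown on this page) and simply lists relative file paths; `sketch_inventory_from_directory` mirrors only the group-selection logic of the listing above.

```ruby
require 'pathname'

# Hypothetical stand-in for FileGroup#group_from_directory:
# returns the sorted relative paths of all regular files under group_dir.
def list_group_files(group_dir)
  root = Pathname(group_dir)
  root.find.select(&:file?).map { |path| path.relative_path_from(root).to_s }.sort
end

# Mirrors the branching above: one named group, or content plus metadata.
def sketch_inventory_from_directory(data_dir, group_id = nil)
  if group_id
    { group_id => list_group_files(data_dir) }
  else
    %w[content metadata].to_h do |gid|
      [gid, list_group_files(Pathname(data_dir).join(gid))]
    end
  end
end
```

With no group_id, the directory is expected to contain both a content and a metadata subdirectory; in this sketch a missing subdirectory raises Errno::ENOENT.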
#non_empty_groups ⇒ Array<FileGroup>
Returns the set of data groups that contain files.
    # File 'lib/moab/file_inventory.rb', line 101
    def non_empty_groups
      groups.reject { |group| group.files.empty? }
    end
#package_id ⇒ String
Returns the concatenation of the objectId and versionId values.
    # File 'lib/moab/file_inventory.rb', line 156
    def package_id
      "#{digital_object_id}-v#{version_id}"
    end
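`package_id` and `composite_key` both join the object id and the version id, but they format the version differently. The sketch below contrasts them under one loudly stated assumption: that `StorageObject.version_dirname` zero-pads the version to four digits with a `v` prefix. That helper is not shown on this page, so `sketch_version_dirname` is a guess at its behavior, and the druid is a made-up example.

```ruby
# ASSUMPTION: StorageObject.version_dirname formats as 'v' plus a
# four-digit zero-padded number. Not confirmed by this page.
def sketch_version_dirname(version_id)
  format('v%04d', version_id)
end

def sketch_composite_key(digital_object_id, version_id)
  "#{digital_object_id}-#{sketch_version_dirname(version_id)}"
end

def sketch_package_id(digital_object_id, version_id)
  "#{digital_object_id}-v#{version_id}"
end

druid = 'druid:ab123cd4567' # illustrative identifier
sketch_composite_key(druid, 1) # padded version under the stated assumption
sketch_package_id(druid, 1)    # unpadded version, as in the listing above
```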
#signatures_from_bagit_manifests(bag_pathname) ⇒ Hash<Pathname,FileSignature>
Returns the fixity data present in the bag’s manifest files.
    # File 'lib/moab/file_inventory.rb', line 204
    def signatures_from_bagit_manifests(bag_pathname)
      manifest_pathname = {}
      DEFAULT_CHECKSUM_TYPES.each do |type|
        manifest_pathname[type] = bag_pathname.join("manifest-#{type}.txt")
      end
      signatures = Hash.new { |hash, path| hash[path] = FileSignature.new }
      DEFAULT_CHECKSUM_TYPES.each do |type|
        if manifest_pathname[type].exist?
          manifest_pathname[type].each_line do |line|
            line.chomp!
            checksum, data_path = line.split(/\s+\**/, 2)
            if checksum && data_path
              file_pathname = bag_pathname.join(data_path)
              signature = signatures[file_pathname]
              signature.set_checksum(type, checksum)
            end
          end
        end
      end
      signatures.each { |file_pathname, signature| signature.size = file_pathname.size }
      signatures
    end
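Two details of this method are worth illustrating: the `split(/\s+\**/, 2)` call that separates a checksum from its path, and the default-block Hash that merges checksums of different types under one path. The sketch below works on in-memory strings instead of manifest files and stores checksums in plain hashes rather than FileSignature objects; `parse_manifest_line` and `merge_manifest_checksums` are hypothetical names, and the checksums are placeholders.

```ruby
# Split one manifest line into [checksum, path]; returns nil for lines that
# lack either part. The limit of 2 keeps spaces inside the path intact, and
# the optional '*' (binary-mode marker) is consumed with the separator.
def parse_manifest_line(line)
  checksum, data_path = line.chomp.split(/\s+\**/, 2)
  [checksum, data_path] if checksum && data_path
end

# manifests maps a checksum type to the text of that manifest.
# A plain Hash stands in for FileSignature in this sketch.
def merge_manifest_checksums(manifests)
  signatures = Hash.new { |hash, path| hash[path] = {} }
  manifests.each do |type, text|
    text.each_line do |line|
      checksum, data_path = parse_manifest_line(line)
      signatures[data_path][type] = checksum if data_path
    end
  end
  signatures
end

merge_manifest_checksums(
  md5: "aaa  data/content/a.txt\nbbb *data/metadata/m.xml\n",
  sha1: "ccc  data/content/a.txt\n"
)
```

Because the Hash creates an entry on first access, a file listed in several manifests ends up with one record holding every checksum type found for it.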
#summary_fields ⇒ Array<String>
Returns the data fields to include in summary reports.
    # File 'lib/moab/file_inventory.rb', line 126
    def summary_fields
      %w[type digital_object_id version_id inventory_datetime file_count byte_count block_count groups]
    end
#write_xml_file(parent_dir, type = nil) ⇒ void
This method returns an undefined value.
Writes the Moab::FileInventory instance to a file. If type is nil, the instance’s own type is used.
    # File 'lib/moab/file_inventory.rb', line 268
    def write_xml_file(parent_dir, type = nil)
      type = @type if type.nil?
      self.class.write_xml_file(self, parent_dir, type)
    end