Class: Moab::FileSignature
- Inherits:
- Serializable (inheritance chain: Object, Serializable, Moab::FileSignature)
- Includes:
- HappyMapper
- Defined in:
- lib/moab/file_signature.rb
Overview
Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.
The fixity properties of a file, used to determine file content equivalence regardless of filename. Placing this data in a class by itself facilitates using file size together with the MD5 and SHA1 checksums as a single key when doing comparisons against other file instances. The Moab design assumes that this file signature is sufficiently unique to act as a comparator for determining file equality and eliminating file redundancy.
The use of signatures for a compare-by-hash mechanism introduces a minuscule (but non-zero) risk that two non-identical files will have the same checksum. While this risk is only about 1 in 10^48 when using the SHA1 checksum alone, it can be reduced even further (to about 1 in 10^86) if we use the MD5 and SHA1 checksums together. We gain a bit more comfort by also comparing file sizes.
Finally, the “collision” risk is reduced by isolation of each digital object’s file pool within an object folder, instead of in a common storage area shared by the whole repository.
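The two orders of magnitude quoted above can be reproduced with a little arithmetic. This is a minimal sketch, not part of Moab: it assumes the risk for a random pair of files is roughly 1 over the number of possible digest values, and that MD5 (128 bits) and SHA1 (160 bits) outputs are independent. The helper name digest_space_exponent is hypothetical.

```ruby
# Order of magnitude (power of ten) of the number of possible digest values
# for a digest of the given bit length. Assumption: digest output is
# uniformly distributed, so the pairwise collision risk is about 1 in 2**bits.
def digest_space_exponent(bits)
  Math.log10(2**bits).floor
end

SHA1_BITS = 160
MD5_BITS  = 128

sha1_alone   = digest_space_exponent(SHA1_BITS)
md5_and_sha1 = digest_space_exponent(MD5_BITS + SHA1_BITS)

puts "SHA1 alone:   about 1 in 10^#{sha1_alone}"
puts "MD5 and SHA1: about 1 in 10^#{md5_and_sha1}"
```

These figures describe accidental collisions only; they say nothing about deliberately constructed ones.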
Data Model
- FileInventory = container for recording information about a collection of related files
  - FileGroup [1..*] = subsets allow segregation of content and metadata files
    - FileManifestation [1..*] = snapshot of a file’s filesystem characteristics
      - FileSignature [1] = file fixity information
      - FileInstance [1..*] = filepath and timestamp of any physical file having that signature
- SignatureCatalog = lookup table containing a cumulative collection of all files ever ingested
  - SignatureCatalogEntry [1..*] = a row in the lookup table containing storage information about a single file
    - FileSignature [1] = file fixity information
- FileInventoryDifference = compares two FileInventory instances based on file signatures and pathnames
  - FileGroupDifference [1..*] = performs analysis and reports differences between two matching FileGroup objects
    - FileGroupDifferenceSubset [1..5] = collects a set of file-level differences of a given change type
      - FileInstanceDifference [1..*] = contains difference information at the file level
        - FileSignature [1..2] = contains the file signature(s) of two file instances being compared
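To make the containment in the first branch of the data model concrete, here is a rough sketch using plain Ruby Structs. It does not use the Moab classes: every Struct name, every field name (group_id, files, instances, datetime), and the placeholder checksum strings are assumptions chosen for illustration only.

```ruby
# Hypothetical stand-ins for FileInventory > FileGroup > FileManifestation,
# where each manifestation pairs one signature with one or more instances.
SignatureSketch     = Struct.new(:size, :md5, :sha1, :sha256, keyword_init: true)
InstanceSketch      = Struct.new(:path, :datetime, keyword_init: true)
ManifestationSketch = Struct.new(:signature, :instances, keyword_init: true)
GroupSketch         = Struct.new(:group_id, :files, keyword_init: true)
InventorySketch     = Struct.new(:groups, keyword_init: true)

# Build a tiny inventory: one group, one manifestation, two physical copies.
def sample_inventory
  signature = SignatureSketch.new(size: 11, sha1: 'sha1-placeholder-1')
  instances = [
    InstanceSketch.new(path: 'page-1.tif', datetime: '2012-01-01T00:00:00Z'),
    InstanceSketch.new(path: 'copy/page-1.tif', datetime: '2012-01-02T00:00:00Z')
  ]
  files = [ManifestationSketch.new(signature: signature, instances: instances)]
  InventorySketch.new(groups: [GroupSketch.new(group_id: 'content', files: files)])
end

# Walk the hierarchy and list every path that shares a given SHA1 value.
def paths_by_sha1(inventory)
  inventory.groups.each_with_object({}) do |group, index|
    group.files.each do |manifestation|
      paths = manifestation.instances.map(&:path)
      (index[manifestation.signature.sha1] ||= []).concat(paths)
    end
  end
end

p paths_by_sha1(sample_inventory)
```

The point of the shape is that a signature is recorded once per manifestation, while any number of file instances can hang off it.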
Instance Attribute Summary
-
#md5 ⇒ String
The MD5 checksum value of the file.
-
#sha1 ⇒ String
The SHA1 checksum value of the file.
-
#sha256 ⇒ String
The SHA256 checksum value of the file.
-
#size ⇒ Integer
The size of the file in bytes.
Class Method Summary
-
.checksum_names_for_type ⇒ Hash<Symbol,String>
Key is type (e.g. :sha1), value is checksum names (e.g. [‘SHA-1’, ‘SHA1’]).
-
.checksum_type_for_name ⇒ Hash<String, Symbol>
Key is checksum name (e.g. MD5), value is checksum type (e.g. :md5).
Instance Method Summary
-
#==(other) ⇒ Object
(see #eql?).
-
#checksums ⇒ Hash<Symbol,String>
A hash of the checksum data.
-
#complete? ⇒ Boolean
True if the signature contains all three desired checksums.
-
#eql?(other) ⇒ Boolean
Returns true if self and other have comparable fixity data.
-
#fixity ⇒ Hash<Symbol,String>
A hash of fixity data from this signature object.
-
#hash ⇒ Fixnum
Compute a hash-code for the fixity value array.
-
#initialize(opts = {}) ⇒ FileSignature
constructor
A new instance of FileSignature.
-
#normalized_signature(pathname) ⇒ FileSignature
The full signature derived from the file, unless the fixity is inconsistent with current values.
-
#set_checksum(type, value) ⇒ void
Set the value of the specified checksum type.
-
#signature_from_file(pathname) ⇒ FileSignature
Generate a FileSignature instance containing size and checksums for a physical file.
Constructor Details
#initialize(opts = {}) ⇒ FileSignature
Returns a new instance of FileSignature.
# File 'lib/moab/file_signature.rb', line 50
def initialize(opts={})
  super(opts)
end
Instance Attribute Details
#md5 ⇒ String
Returns the MD5 checksum value of the file.
# File 'lib/moab/file_signature.rb', line 60
attribute :md5, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }
#sha1 ⇒ String
Returns the SHA1 checksum value of the file.
# File 'lib/moab/file_signature.rb', line 64
attribute :sha1, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }
#sha256 ⇒ String
Returns the SHA256 checksum value of the file.
# File 'lib/moab/file_signature.rb', line 68
attribute :sha256, String, :on_save => Proc.new { |n| n.nil? ? "" : n.to_s }
#size ⇒ Integer
Returns the size of the file in bytes.
# File 'lib/moab/file_signature.rb', line 56
attribute :size, Integer, :on_save => Proc.new { |n| n.to_s }
Class Method Details
.checksum_names_for_type ⇒ Hash<Symbol,String>
Returns a hash in which each key is a checksum type (e.g. :sha1) and each value is the list of accepted checksum names (e.g. ['SHA-1', 'SHA1']).
# File 'lib/moab/file_signature.rb', line 179
def FileSignature.checksum_names_for_type
  names_for_type = OrderedHash.new
  names_for_type[:md5] = ['MD5']
  names_for_type[:sha1] = ['SHA-1', 'SHA1']
  names_for_type[:sha256] = ['SHA-256', 'SHA256']
  names_for_type
end
.checksum_type_for_name ⇒ Hash<String, Symbol>
Returns a hash in which each key is a checksum name (e.g. MD5) and each value is the corresponding checksum type (e.g. :md5).
# File 'lib/moab/file_signature.rb', line 188
def FileSignature.checksum_type_for_name
  type_for_name = OrderedHash.new
  self.checksum_names_for_type.each do |type, names|
    names.each do |name|
      type_for_name[name] = type
    end
  end
  type_for_name
end
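The relationship between the two class methods is a simple inversion of a one-to-many table. The sketch below reproduces it with plain Ruby hashes rather than the Moab class and its OrderedHash; the constant names NAMES_FOR_TYPE and TYPE_FOR_NAME are hypothetical.

```ruby
# Type => accepted names, with the same entries as checksum_names_for_type.
NAMES_FOR_TYPE = {
  md5:    ['MD5'],
  sha1:   ['SHA-1', 'SHA1'],
  sha256: ['SHA-256', 'SHA256']
}.freeze

# Invert the table so that each accepted name maps back to its type.
TYPE_FOR_NAME = NAMES_FOR_TYPE.each_with_object({}) do |(type, names), lookup|
  names.each { |name| lookup[name] = type }
end.freeze

# Both spellings of a checksum name resolve to the same type symbol.
p TYPE_FOR_NAME['SHA-1']
p TYPE_FOR_NAME['SHA1']
```

A lookup like this is useful when checksum labels come from external metadata that spells the algorithm name in more than one way.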
Instance Method Details
#==(other) ⇒ Object
(see #eql?)
# File 'lib/moab/file_signature.rb', line 127
def ==(other)
  eql?(other)
end
#checksums ⇒ Hash<Symbol,String>
Returns a hash of the checksum data, omitting any checksum whose value is nil or empty.
# File 'lib/moab/file_signature.rb', line 87
def checksums
  checksum_hash = OrderedHash.new
  checksum_hash[:md5] = @md5
  checksum_hash[:sha1] = @sha1
  checksum_hash[:sha256] = @sha256
  checksum_hash.delete_if { |key,value| value.nil? or value.empty?}
  checksum_hash
end
#complete? ⇒ Boolean
Returns true if the signature contains all three desired checksums.
# File 'lib/moab/file_signature.rb', line 97
def complete?
  checksums.size == 3
end
#eql?(other) ⇒ Boolean
Returns true if self and other have comparable fixity data: the sizes are equal, at least one checksum type is present in both signatures, and every shared checksum matches.
# File 'lib/moab/file_signature.rb', line 113
def eql?(other)
  return false if self.size.to_i != other.size.to_i
  self_checksums = self.checksums
  other_checksums = other.checksums
  matching_keys = self_checksums.keys & other_checksums.keys
  return false if matching_keys.size == 0
  matching_keys.each do |key|
    return false if self_checksums[key] != other_checksums[key]
  end
  true
end
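The comparison rule in eql? is easiest to see with a signature that carries only some of the checksums. The following is a standalone sketch of the same logic over plain hashes; it does not call Moab, and the helper names present_checksums and signatures_match? as well as the short placeholder checksum strings are hypothetical.

```ruby
# Keep only checksum entries that actually have a value (drops :size, nil, "").
def present_checksums(signature)
  signature.reject { |key, value| key == :size || value.nil? || value.empty? }
end

# Mirrors the rule above: sizes must agree, at least one checksum type must be
# present on both sides, and every shared checksum must be identical.
def signatures_match?(left, right)
  return false if left[:size].to_i != right[:size].to_i

  left_sums  = present_checksums(left)
  right_sums = present_checksums(right)
  shared     = left_sums.keys & right_sums.keys
  return false if shared.empty?

  shared.all? { |type| left_sums[type] == right_sums[type] }
end

full      = { size: 11, md5: 'aa', sha1: 'bb', sha256: 'cc' }
sha1_only = { size: 11, sha1: 'bb' }
md5_only  = { size: 11, md5: 'aa' }

p signatures_match?(full, sha1_only)     # shared :sha1 agrees
p signatures_match?(sha1_only, md5_only) # no checksum type in common
```

One consequence worth noting is that two partial signatures with no checksum type in common are treated as unequal even when their sizes agree.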
#fixity ⇒ Hash<Symbol,String>
Returns a hash of fixity data from this signature object.
# File 'lib/moab/file_signature.rb', line 103
def fixity
  fixity_hash = OrderedHash.new
  fixity_hash[:size] = @size.to_s
  fixity_hash.merge!(checksums)
  fixity_hash
end
#hash ⇒ Fixnum
Returns a hash code for the fixity value array. Two file instances with the same content will have the same hash code (and will compare using eql?). As implemented, the hash code is simply the file size.
# File 'lib/moab/file_signature.rb', line 139
def hash
  @size.to_i
end
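Defining hash alongside eql? is what lets a signature act as a Hash key: Ruby first groups keys by hash code and then confirms equality with eql?. The sketch below illustrates that pairing with a simplified, hypothetical SketchSignature class (size and SHA1 only) and a hypothetical group_paths_by_signature helper; it is not the Moab implementation.

```ruby
# Simplified stand-in for a file signature: size plus a single checksum.
class SketchSignature
  attr_reader :size, :sha1

  def initialize(size, sha1)
    @size = size
    @sha1 = sha1
  end

  def eql?(other)
    other.is_a?(SketchSignature) && size == other.size && sha1 == other.sha1
  end

  def ==(other)
    eql?(other)
  end

  # Coarse hash code: objects that are eql? always share it, and eql?
  # separates different files that happen to have the same size.
  def hash
    size.to_i
  end
end

# Group file paths under their signature, so duplicates share one entry.
def group_paths_by_signature(entries)
  entries.each_with_object({}) do |(path, signature), catalog|
    (catalog[signature] ||= []) << path
  end
end

entries = [
  ['a.txt',      SketchSignature.new(5, 'x')],
  ['copy/a.txt', SketchSignature.new(5, 'x')],
  ['b.txt',      SketchSignature.new(5, 'y')]
]
p group_paths_by_signature(entries).values
```

Using the size as the hash code keeps the contract valid even when two signatures carry different subsets of checksums, at the cost of more eql? calls for files of equal size.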
#normalized_signature(pathname) ⇒ FileSignature
Returns the full signature derived from the file. If the fixity computed from the file is inconsistent with the current values, an exception is raised instead.
# File 'lib/moab/file_signature.rb', line 167
def normalized_signature(pathname)
  sig_from_file = FileSignature.new.signature_from_file(pathname)
  if self.eql?(sig_from_file)
    # The full signature from file is consistent with current values
    return sig_from_file
  else
    # One or more of the fixity values is inconsistent, so raise an exception
    raise "Signature inconsistent between inventory and file for #{pathname}: #{self.diff(sig_from_file).inspect}"
  end
end
#set_checksum(type, value) ⇒ void
Sets the value of the specified checksum type and raises an exception for an unknown type. This method returns an undefined value.
# File 'lib/moab/file_signature.rb', line 73
def set_checksum(type,value)
  case type.to_s.downcase.to_sym
  when :md5
    @md5 = value
  when :sha1
    @sha1 = value
  when :sha256
    @sha256 = value
  else
    raise "Unknown checksum type '#{type.to_s}'"
  end
end
#signature_from_file(pathname) ⇒ FileSignature
Returns a FileSignature instance (self) populated with the size and checksums of a physical file.
# File 'lib/moab/file_signature.rb', line 146
def signature_from_file(pathname)
  @size = pathname.size
  md5_digest = Digest::MD5.new
  sha1_digest = Digest::SHA1.new
  sha256_digest = Digest::SHA2.new(256)
  pathname.open("r") do |stream|
    while buffer = stream.read(8192)
      md5_digest.update(buffer)
      sha1_digest.update(buffer)
      sha256_digest.update(buffer)
    end
  end
  @md5 = md5_digest.hexdigest
  @sha1 = sha1_digest.hexdigest
  @sha256 = sha256_digest.hexdigest
  self
end
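The same single-pass, chunked digest technique can be tried outside of Moab with only the Ruby standard library. This is a sketch under stated assumptions: the function name fixity_for and its chunk_size parameter are hypothetical, the result is a plain hash rather than a FileSignature, and the file is opened in binary mode ('rb'), which differs from the "r" mode used above.

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'pathname'
require 'tempfile'

# Read the file once in fixed-size chunks, feeding every chunk to all three
# digests, so large files never have to be held in memory.
def fixity_for(pathname, chunk_size = 8192)
  digests = {
    md5:    Digest::MD5.new,
    sha1:   Digest::SHA1.new,
    sha256: Digest::SHA256.new
  }
  pathname.open('rb') do |stream|
    while (buffer = stream.read(chunk_size))
      digests.each_value { |digest| digest.update(buffer) }
    end
  end
  { size: pathname.size }.merge(digests.transform_values(&:hexdigest))
end

# Usage: compute the fixity of a small temporary file.
Tempfile.create('fixity-sketch') do |file|
  file.write('hello world')
  file.flush
  p fixity_for(Pathname.new(file.path))
end
```

Updating all digests inside one read loop means the file is read from disk only once, however many checksum types are wanted.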