Class: Moab::FileGroupDifference

Inherits:
Serializable
  • Object
Includes:
HappyMapper
Defined in:
lib/moab/file_group_difference.rb

Overview

Note:

Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University. All rights reserved. See LICENSE for details.

Performs analysis and reports the differences between two matching FileGroup objects. The descendant elements of the report hold a detailed breakdown of file-level differences, organized by change type. This stanza is a child element of FileInventoryDifference, the documentation of which contains a full example.

In order to determine the detailed nature of the differences that are present between the two manifests, this algorithm first compares the sets of file signatures present in the groups being compared, then uses the result of that operation for subsequent analysis of filename correspondences.

For the first step, a Ruby Hash is extracted from each of the two groups, with FileSignature objects used as hash keys and the corresponding FileInstance arrays as the hash values. The set of keys from the basis hash can be compared against the keys from the other hash using Array operators:

  • matching = basis_array & other_array

  • basis_only = basis_array - other_array

  • other_only = other_array - basis_array
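As a minimal sketch of this first step, the snippet below applies the three Array operators to the keys of two plain hashes. The string keys stand in for FileSignature objects and the path arrays for the FileInstance arrays; both are illustrative assumptions, not data taken from the library.

```ruby
# Hypothetical stand-ins: strings play the role of FileSignature keys,
# arrays of paths play the role of the FileInstance values.
basis_hash = { 'sig-a' => ['page-1.jpg'], 'sig-b' => ['page-2.jpg'] }
other_hash = { 'sig-b' => ['page-2.jpg'], 'sig-c' => ['page-3.jpg'] }

basis_array = basis_hash.keys
other_array = other_hash.keys

matching   = basis_array & other_array   # signatures present in both groups
basis_only = basis_array - other_array   # signatures only in the basis group
other_only = other_array - basis_array   # signatures only in the other group
```

With real objects as keys, these Array operators compare elements using `hash` and `eql?`, so the key class needs to define both consistently.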

For the second step of the comparison, the matching and non-matching sets of hash entries are further categorized as follows:

  • identical = signature and file path are the same in both the basis and other file groups

  • renamed = signature is unchanged, but the path has changed

  • copyadded = duplicate copy of file was added

  • copydeleted = duplicate copy of file was deleted

  • modified = path is same in both groups, but the signature has changed

  • added = signature and path are only in the other inventory

  • deleted = signature and path are only in the basis inventory
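The two steps can be sketched end to end on plain data. In the example below, each group is a hypothetical Hash of path => signature string, and `paths_by_signature` and `classify` are illustrative helpers, not part of the Moab API. The sketch follows the categories listed above under those assumptions.

```ruby
# Hypothetical helper: invert path => signature into signature => [paths]
def paths_by_signature(path_hash)
  path_hash.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(path, sig), index|
    index[sig] << path
  end
end

# Hypothetical helper: group paths (or path pairs) by change type
def classify(basis, other)
  report = Hash.new { |hash, key| hash[key] = [] }
  basis_by_sig = paths_by_signature(basis)
  other_by_sig = paths_by_signature(other)

  # Step 1: signatures present in both groups
  (basis_by_sig.keys & other_by_sig.keys).each do |sig|
    basis_paths = basis_by_sig[sig]
    other_paths = other_by_sig[sig]
    (basis_paths & other_paths).each { |path| report[:identical] << path }
    basis_only = basis_paths - other_paths
    other_only = other_paths - basis_paths
    [basis_only.size, other_only.size].max.times do |n|
      if basis_only[n].nil?
        report[:copyadded] << [basis_paths[0], other_only[n]]
      elsif other_only[n].nil?
        report[:copydeleted] << basis_only[n]
      else
        report[:renamed] << [basis_only[n], other_only[n]]
      end
    end
  end

  # Step 2: paths whose signatures appear in only one group
  basis_rest = basis.reject { |_path, sig| other_by_sig.key?(sig) }
  other_rest = other.reject { |_path, sig| basis_by_sig.key?(sig) }
  report[:modified].concat(basis_rest.keys & other_rest.keys)
  report[:added].concat(other_rest.keys - basis_rest.keys)
  report[:deleted].concat(basis_rest.keys - other_rest.keys)
  report
end

basis = { 'a.txt' => 's1', 'b.txt' => 's2', 'c.txt' => 's3', 'd.txt' => 's4' }
other = { 'a.txt' => 's1', 'b2.txt' => 's2', 'c.txt' => 's3x', 'e.txt' => 's5' }
report = classify(basis, other)
```

Here `a.txt` is identical, `b.txt` is renamed to `b2.txt`, `c.txt` is modified, `d.txt` is deleted, and `e.txt` is added.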

Data Model

Constructor Details

#initialize(opts = {}) ⇒ FileGroupDifference

Returns a new instance of FileGroupDifference.



# File 'lib/moab/file_group_difference.rb', line 56

def initialize(opts={})
  @subset_hash = OrderedHash.new {|hash, key| hash[key] = FileGroupDifferenceSubset.new(:change => key.to_s)}
  super(opts)
end
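The default block given to the hash means a subset container is created the first time a change type is requested. The sketch below shows the same pattern with a plain Ruby Hash and a hypothetical `Subset` struct standing in for OrderedHash and FileGroupDifferenceSubset.

```ruby
# Hypothetical stand-in for FileGroupDifferenceSubset
Subset = Struct.new(:change, :files)

# The default block runs only when a missing key is read, then stores the new value
subset_hash = Hash.new { |hash, key| hash[key] = Subset.new(key.to_s, []) }

subset_hash[:added].files << 'page-3.jpg'   # creates the :added subset on first access
subset_hash[:added].files << 'page-4.jpg'   # reuses the same subset
```

Because the block assigns into the hash, only change types that were actually touched end up as keys.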

Instance Attribute Details

#added ⇒ Integer

Returns how many files were added.

Returns:

  • (Integer)

    How many files were added



# File 'lib/moab/file_group_difference.rb', line 114

attribute :added, Integer, :on_save => Proc.new { |n| n.to_s }

#copyadded ⇒ Integer

Returns how many duplicate copies of files were added.

Returns:

  • (Integer)

    How many duplicate copies of files were added



# File 'lib/moab/file_group_difference.rb', line 86

attribute :copyadded, Integer, :on_save => Proc.new { |n| n.to_s }

#copydeleted ⇒ Integer

Returns how many duplicate copies of files were deleted.

Returns:

  • (Integer)

    How many duplicate copies of files were deleted



# File 'lib/moab/file_group_difference.rb', line 93

attribute :copydeleted, Integer, :on_save => Proc.new { |n| n.to_s }

#deleted ⇒ Integer

Returns how many files were deleted.

Returns:

  • (Integer)

    How many files were deleted



# File 'lib/moab/file_group_difference.rb', line 121

attribute :deleted, Integer, :on_save => Proc.new { |n| n.to_s }

#difference_count ⇒ Integer

Returns the total number of differences found between the two inventories that were compared (dynamically calculated).

Returns:

  • (Integer)

    the total number of differences found between the two inventories that were compared (dynamically calculated)



# File 'lib/moab/file_group_difference.rb', line 67

attribute :difference_count, Integer, :tag => 'differenceCount', :on_save => Proc.new { |i| i.to_s }

#group_id ⇒ String

Returns the name of the file group.

Returns:

  • (String)

    The name of the file group



# File 'lib/moab/file_group_difference.rb', line 63

attribute :group_id, String, :tag => 'groupId', :key => true

#identical ⇒ Integer

Returns how many files were unchanged.

Returns:

  • (Integer)

    How many files were unchanged



# File 'lib/moab/file_group_difference.rb', line 79

attribute :identical, Integer, :on_save => Proc.new { |n| n.to_s }

#modified ⇒ Integer

Returns how many files were modified.

Returns:

  • (Integer)

    How many files were modified



# File 'lib/moab/file_group_difference.rb', line 107

attribute :modified, Integer, :on_save => Proc.new { |n| n.to_s }

#renamed ⇒ Integer

Returns how many files were renamed.

Returns:

  • (Integer)

    How many files were renamed



# File 'lib/moab/file_group_difference.rb', line 100

attribute :renamed, Integer, :on_save => Proc.new { |n| n.to_s }

#subset_hash ⇒ Hash<Symbol,FileGroupDifferenceSubset>

Returns a set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Hash<Symbol,FileGroupDifferenceSubset>)

    A set of containers (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 47

def subset_hash
  @subset_hash
end

#subsets ⇒ Array<FileGroupDifferenceSubset>

Returns a set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.

Returns:

  • (Array<FileGroupDifferenceSubset>)

    A set of Arrays (one for each change type), each of which contains a collection of file-level differences having that change type.



# File 'lib/moab/file_group_difference.rb', line 129

has_many :subsets, FileGroupDifferenceSubset, :tag => 'subset'

Instance Method Details

#basis_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the first hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys unique to the first hash



# File 'lib/moab/file_group_difference.rb', line 173

def basis_only_keys(basis_hash, other_hash)
  basis_hash.keys - other_hash.keys
end

#compare_file_groups(basis_group, other_group) ⇒ FileGroupDifference

Compares two file groups and returns a differences report.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    A report of the differences between the two file groups



# File 'lib/moab/file_group_difference.rb', line 188

def compare_file_groups(basis_group, other_group)
  @group_id = basis_group.group_id
  compare_matching_signatures(basis_group, other_group)
  compare_non_matching_signatures(basis_group, other_group)
  self
end

#compare_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in both groups, reports which file instances are identical or renamed.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    A report of which file instances are identical or renamed, for signatures that are present in both groups



# File 'lib/moab/file_group_difference.rb', line 199

def compare_matching_signatures(basis_group, other_group)
  matching_signatures = matching_keys(basis_group.signature_hash, other_group.signature_hash)
  tabulate_unchanged_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  tabulate_renamed_files(matching_signatures, basis_group.signature_hash, other_group.signature_hash)
  self
end

#compare_non_matching_signatures(basis_group, other_group) ⇒ FileGroupDifference

For signatures that are present in only one group or the other, reports which file instances are modified, deleted, or added.

Parameters:

  • basis_group (FileGroup)

    The file group that is the basis of the comparison

  • other_group (FileGroup)

    The file group that is compared against the basis group

Returns:

  • (FileGroupDifference)

    A report of which file instances are modified, deleted, or added, for signatures that are present in only one group or the other



# File 'lib/moab/file_group_difference.rb', line 210

def compare_non_matching_signatures(basis_group, other_group)
  basis_only_signatures = basis_only_keys(basis_group.signature_hash, other_group.signature_hash)
  other_only_signatures = other_only_keys(basis_group.signature_hash, other_group.signature_hash)
  basis_path_hash = basis_group.path_hash_subset(basis_only_signatures)
  other_path_hash = other_group.path_hash_subset(other_only_signatures)
  tabulate_modified_files(basis_path_hash, other_path_hash)
  tabulate_added_files(basis_path_hash, other_path_hash)
  tabulate_deleted_files(basis_path_hash, other_path_hash)
  self
end

#file_deltas ⇒ Hash<Symbol,Array>

Returns sets of filenames grouped by change type for use in performing file or metadata operations.

Returns:

  • (Hash<Symbol,Array>)

    Sets of filenames grouped by change type for use in performing file or metadata operations



# File 'lib/moab/file_group_difference.rb', line 338

def file_deltas()
  # The hash to be returned
  deltas = Hash.new {|hash, key| hash[key] = []}
  # case where other_path is empty or 'same'.  (create array of strings)
  [:identical, :modified, :deleted, :copydeleted].each do |change|
    deltas[change].concat @subset_hash[change].files.collect{|file| file.basis_path}
  end
  # case where basis_path and other_path are both present.  (create array of arrays)
  [:copyadded, :renamed].each do |change|
    deltas[change].concat @subset_hash[change].files.collect{|file| [file.basis_path,file.other_path]}
  end
  # case where basis_path is empty.  (create array of strings)
  [:added].each do |change|
    deltas[change].concat @subset_hash[change].files.collect{|file| file.other_path}
  end
  deltas
end
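To illustrate the shape of the returned hash, the sketch below repeats the three collection rules on hand-made data. The `FileDiff` struct and the sample paths are hypothetical stand-ins for FileInstanceDifference objects; only the grouping logic follows the method above.

```ruby
# Hypothetical stand-in for FileInstanceDifference
FileDiff = Struct.new(:basis_path, :other_path)

# Hypothetical subsets keyed by change type
subsets = Hash.new { |hash, key| hash[key] = [] }
subsets[:deleted] << FileDiff.new('old.txt', '')
subsets[:renamed] << FileDiff.new('page-1.jpg', 'page-2.jpg')
subsets[:added]   << FileDiff.new('', 'new.txt')

deltas = Hash.new { |hash, key| hash[key] = [] }
# Only the basis path matters: plain strings
[:identical, :modified, :deleted, :copydeleted].each do |change|
  deltas[change].concat(subsets[change].map(&:basis_path))
end
# Both paths matter: [basis_path, other_path] pairs
[:copyadded, :renamed].each do |change|
  deltas[change].concat(subsets[change].map { |file| [file.basis_path, file.other_path] })
end
# Only the other path matters: plain strings
deltas[:added].concat(subsets[:added].map(&:other_path))
```

A caller can then treat `deltas[:renamed]` and `deltas[:copyadded]` as lists of pairs and every other entry as a flat list of paths.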

#matching_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the intersection.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys present in both hashes



# File 'lib/moab/file_group_difference.rb', line 166

def matching_keys(basis_hash, other_hash)
  basis_hash.keys & other_hash.keys
end

#other_only_keys(basis_hash, other_hash) ⇒ Array

Compares the keys of two hashes and returns the keys unique to the second hash.

Parameters:

  • basis_hash (Hash)

    The first hash being compared

  • other_hash (Hash)

    The second hash being compared

Returns:

  • (Array)

    The keys unique to the second hash



# File 'lib/moab/file_group_difference.rb', line 180

def other_only_keys(basis_hash, other_hash)
  other_hash.keys - basis_hash.keys
end

#rename_require_temp_files(filepairs) ⇒ Boolean

Tests whether any of the new names are the same as one of the old names, as happens when a new file is inserted into a page sequence or in a circular rename. Returns true in that case, indicating that intermediate temporary files are required when updating a copy of an object’s files at a given location.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Boolean)

    true if any new name is the same as one of the old names, meaning that intermediate temporary files are required when updating a copy of an object’s files at a given location; false otherwise



# File 'lib/moab/file_group_difference.rb', line 361

def rename_require_temp_files(filepairs)
  # Split the filepairs into two arrays
  oldnames = []
  newnames = []
  filepairs.each do |old,new|
    oldnames << old
    newnames << new
  end
  # Are any of the filenames the same in set of oldnames and set of newnames?
  intersection = oldnames & newnames
  intersection.count > 0
end
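The two situations named in the description can be checked with a small standalone version of the same test. `needs_temp_files?` is a hypothetical name used for this sketch, and the file names are made up.

```ruby
# Hypothetical standalone version of the check: true when some new name is also an old name
def needs_temp_files?(filepairs)
  oldnames = filepairs.map(&:first)
  newnames = filepairs.map(&:last)
  !(oldnames & newnames).empty?
end

# Inserting a page shifts later pages down, so page-3.jpg is both an old and a new name
page_insert = [['page-2.jpg', 'page-3.jpg'], ['page-3.jpg', 'page-4.jpg']]
# Two files swap names
circular = [['a.txt', 'b.txt'], ['b.txt', 'a.txt']]
# No overlap between old and new names
simple = [['draft.txt', 'final.txt']]

results = [page_insert, circular, simple].map { |pairs| needs_temp_files?(pairs) }
```

In the first two cases a direct rename could overwrite a file that has not been moved yet, which is why temporary names are needed.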

#rename_tempfile_triplets(filepairs) ⇒ Array<Array<String>>

Returns a set of file triples containing oldname, newname, tempname.

Parameters:

  • filepairs (Array<Array<String>>)

    The set of oldname, newname pairs for all files being renamed

Returns:

  • (Array<Array<String>>)

    a set of file triples containing oldname, newname, tempname



# File 'lib/moab/file_group_difference.rb', line 376

def rename_tempfile_triplets(filepairs)
  # Each triple is [oldname, newname, tempname]; the temp name appends a timestamp to the new name
  filepairs.collect{|old,new| [old, new, "#{new}-#{Time.now.strftime('%Y%m%d%H%M%S')}-tmp"]}
end
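One way such triples might be consumed is a two-phase rename: every file first moves to its temporary name, then every temporary file moves to its final name. The sketch below assumes the triple order produced by the code above (old, new, temp); `two_phase_rename` is a hypothetical helper, and the fixed temp names replace the timestamped ones only to keep the example deterministic.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: rename through temporary names so that no move
# overwrites a file that is still waiting to be renamed
def two_phase_rename(dir, triplets)
  # Phase 1: move each old file to its temporary name
  triplets.each { |old, _new, temp| FileUtils.mv(File.join(dir, old), File.join(dir, temp)) }
  # Phase 2: move each temporary file to its final name
  triplets.each { |_old, new, temp| FileUtils.mv(File.join(dir, temp), File.join(dir, new)) }
end

result = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'A')
  File.write(File.join(dir, 'b.txt'), 'B')
  # Circular rename: a.txt and b.txt swap names
  triplets = [['a.txt', 'b.txt', 'b.txt-tmp'], ['b.txt', 'a.txt', 'a.txt-tmp']]
  two_phase_rename(dir, triplets)
  result = {
    'a.txt' => File.read(File.join(dir, 'a.txt')),
    'b.txt' => File.read(File.join(dir, 'b.txt'))
  }
end
```

After the swap, `a.txt` holds the content that was in `b.txt` and vice versa.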

#subset(change) ⇒ FileGroupDifferenceSubset

Finds a specified subset of changes.

Parameters:

  • change (String)

    the change type to search for

Returns:

  • (FileGroupDifferenceSubset)

    The subset of changes having the specified change type



# File 'lib/moab/file_group_difference.rb', line 51

def subset(change)
  @subset_hash[change.to_sym]
end

#summary ⇒ FileGroupDifference

Returns a clone of just this element for inclusion in a versionMetadata structure.

Returns:

  • (FileGroupDifference)

    A clone of just this element for inclusion in a versionMetadata structure



# File 'lib/moab/file_group_difference.rb', line 149

def summary()
  FileGroupDifference.new(
      :group_id => group_id,
      :identical => identical,
      :copyadded => copyadded,
      :copydeleted => copydeleted,
      :renamed => renamed,
      :modified => modified,
      :added => added,
      :deleted => deleted
  )
end

#summary_fields ⇒ Array<String>

Returns the data fields to include in summary reports.

Returns:

  • (Array<String>)

    The data fields to include in summary reports



# File 'lib/moab/file_group_difference.rb', line 142

def summary_fields
  %w{group_id difference_count identical copyadded copydeleted renamed modified deleted added}
end

#tabulate_added_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘added’.

Parameters:

  • basis_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘added’



# File 'lib/moab/file_group_difference.rb', line 308

def tabulate_added_files(basis_path_hash, other_path_hash)
  other_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'added')
    fid.basis_path = ""
    fid.other_path = path
    fid.signatures << other_path_hash[path]
    @subset_hash[:added].files << fid
  end
  self
end

#tabulate_deleted_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘deleted’.

Parameters:

  • basis_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘deleted’



# File 'lib/moab/file_group_difference.rb', line 326

def tabulate_deleted_files(basis_path_hash, other_path_hash)
  basis_only_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'deleted')
    fid.basis_path = path
    fid.other_path = ""
    fid.signatures << basis_path_hash[path]
    @subset_hash[:deleted].files << fid
  end
  self
end

#tabulate_modified_files(basis_path_hash, other_path_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘modified’.

Parameters:

  • basis_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the basis group

  • other_path_hash (OrderedHash<String,FileSignature>)

    The file paths and associated signatures for manifestations appearing only in the other group

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘modified’



# File 'lib/moab/file_group_difference.rb', line 289

def tabulate_modified_files(basis_path_hash, other_path_hash)
  matching_keys(basis_path_hash, other_path_hash).each do |path|
    fid = FileInstanceDifference.new(:change => 'modified')
    fid.basis_path = path
    fid.other_path = "same"
    fid.signatures << basis_path_hash[path]
    fid.signatures << other_path_hash[path]
    @subset_hash[:modified].files << fid
  end
  self
end

#tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’.

Parameters:

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘renamed’, ‘copyadded’, or ‘copydeleted’



# File 'lib/moab/file_group_difference.rb', line 254

def tabulate_renamed_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    basis_only_paths = basis_paths - other_paths
    other_only_paths = other_paths - basis_paths
    maxsize = [basis_only_paths.size, other_only_paths.size].max
    (0..maxsize-1).each do |n|
      fid = FileInstanceDifference.new()
      fid.basis_path = basis_only_paths[n]
      fid.other_path = other_only_paths[n]
      fid.signatures << signature
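      if fid.basis_path.nil?
        # No unmatched basis path is left: the other group gained an extra copy
        fid.change = 'copyadded'
        fid.basis_path = basis_paths[0]
      elsif fid.other_path.nil?
        # No unmatched other path is left: an extra copy in the basis group was removed
        fid.change = 'copydeleted'
      else
        fid.change = 'renamed'
      end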
      @subset_hash[fid.change.to_sym].files << fid
    end
  end
  self
end

#tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash) ⇒ FileGroupDifference

Returns the container for reporting the set of file-level differences of type ‘identical’.

Parameters:

Returns:

  • (FileGroupDifference)

    Container for reporting the set of file-level differences of type ‘identical’



# File 'lib/moab/file_group_difference.rb', line 229

def tabulate_unchanged_files(matching_signatures, basis_signature_hash, other_signature_hash)
  matching_signatures.each do |signature|
    basis_paths = basis_signature_hash[signature].paths
    other_paths = other_signature_hash[signature].paths
    matching_paths = basis_paths & other_paths
    matching_paths.each do |path|
      fid = FileInstanceDifference.new(:change => 'identical')
      fid.basis_path = path
      fid.other_path = "same"
      fid.signatures << signature
      @subset_hash[:identical].files << fid
    end
  end
  self
end