Class: ZipTricks::PathSet

Inherits:

Object


Defined in: lib/zip_tricks/path_set.rb

Overview

A ZIP archive contains a flat list of entries. These entries can implicitly create directories when the archive is expanded. For example, an entry with the filename "some folder/file.docx" will make the unarchiving application create a directory called "some folder" automatically, and then deposit the file "file.docx" in that directory. These "implicit" directories can be arbitrarily nested, and form a tree structure of directories. That structure, however, exists only implicitly, because the archive itself contains a flat list.
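
To make the idea of implicit directories concrete, here is a minimal sketch that derives them from a flat list of entry names. The helper name `implicit_directories` is hypothetical and is not part of ZipTricks; it only illustrates the expansion described above.

```ruby
require 'set'

# Hypothetical helper (not part of ZipTricks): lists every directory that an
# unarchiver would have to create implicitly for a flat list of entry names.
def implicit_directories(entry_names)
  directories = Set.new
  entry_names.each do |name|
    # Drop empty components so doubled or trailing slashes do not matter
    components = name.split('/').reject(&:empty?)
    # Everything except the last component is a containing directory
    components[0...-1].each_index do |i|
      directories << components[0..i].join('/')
    end
  end
  directories.to_a.sort
end

entries = ['some folder/file.docx', 'a/b/c/report.pdf', 'readme.txt']
p implicit_directories(entries)
```

Note that "readme.txt" contributes nothing, since it sits at the root of the archive.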

This creates opportunities for conflicts. For example, imagine the following structure:

something/ - specifies an empty directory with the name "something"
something - specifies a file, creates a conflict

This can be prevented with filename uniqueness checks. It gets trickier, however, with deeper nesting:

dir/subdir/another_subdir/yet_another_subdir/file.bin - declares a file and directories
dir/subdir/another_subdir/yet_another_subdir - declares a file at one of the levels, creates a conflict
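
Both conflicts above can be detected by tracking which paths are used as files and which as directories. The following is a rough sketch under the assumption that an entry name ending in "/" denotes a directory; `conflicting_paths` is a hypothetical name, not a ZipTricks method.

```ruby
require 'set'

# Hypothetical helper (not part of ZipTricks): returns the paths that a flat
# list of entry names uses both as a file and as a directory.
def conflicting_paths(entry_names)
  files = Set.new
  directories = Set.new
  entry_names.each do |name|
    components = name.split('/').reject(&:empty?)
    next if components.empty?
    if name.end_with?('/')
      # Explicit directory entry: the path and all its ancestors are directories
      components.each_index { |i| directories << components[0..i].join('/') }
    else
      # File entry: the full path is a file, its ancestors are directories
      files << components.join('/')
      components[0...-1].each_index { |i| directories << components[0..i].join('/') }
    end
  end
  (files & directories).to_a.sort
end

p conflicting_paths(['something/', 'something'])
p conflicting_paths([
  'dir/subdir/another_subdir/yet_another_subdir/file.bin',
  'dir/subdir/another_subdir/yet_another_subdir'
])
```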

The results of this ZIP structure aren't very easy to predict as they depend on the application that opens the archive. For example, BOMArchiveHelper on macOS will expand files as they are declared in the ZIP, but once a conflict occurs it will fail with "error -21". It is not very transparent to the user why unarchiving fails, and the failure can only be prevented reliably at the time the archive is created.

Unfortunately that conflicts with another "magical" feature of ZipTricks which automatically "fixes" duplicate filenames, that is, filenames (paths) which have already been added to the archive. This fix is performed by appending (1), then (2) and so forth to the filename so that the conflict is avoided. The same fix cannot be applied to directories, because when a path component is reused in multiple filenames it means those entities should end up in the same directory (subdirectory) once the archive is opened.
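
As an illustration of the duplicate-filename fix, here is a minimal sketch of counter-based renaming. The helper `uniquify` is hypothetical, and the exact naming scheme ZipTricks uses is not shown on this page; placing the counter before the file extension is an assumption made for the example.

```ruby
require 'set'

# Hypothetical helper (not the ZipTricks implementation): returns `path`
# unchanged if it is not taken yet, otherwise the first variant with a
# " (N)" counter that is still free.
def uniquify(path, taken)
  return path unless taken.include?(path)

  ext = File.extname(path)                    # ".pdf", or "" if there is none
  base = path[0...(path.length - ext.length)] # everything before the extension
  counter = 1
  loop do
    candidate = "#{base} (#{counter})#{ext}"
    return candidate unless taken.include?(candidate)
    counter += 1
  end
end

taken = Set.new
names = ['report.pdf', 'report.pdf', 'report.pdf'].map do |name|
  unique = uniquify(name, taken)
  taken << unique
  unique
end
p names
```

Renaming a file this way is harmless, whereas renaming a directory component would move every entry beneath it into a different directory, which is why the fix stops at files.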

Defined Under Namespace

Classes: Conflict, DirectoryClobbersFile, FileClobbersDirectory

Instance Method Summary

#add_directory_path(path) ⇒ void
Adds a directory path to the set of known paths, including all the directories that contain it.
#add_file_path(file_path) ⇒ void
Adds a file path to the set of known paths, including all the directories that contain it.
#clear ⇒ void
Clears the contained sets.
#include?(path_in_archive) ⇒ Boolean
Tells whether a specific full path is already known to the PathSet.
#initialize ⇒ PathSet constructor
A new instance of PathSet.

Constructor Details

#initialize ⇒ `PathSet`

Returns a new instance of PathSet.

# File 'lib/zip_tricks/path_set.rb', line 45

def initialize
  @known_directories = Set.new
  @known_files = Set.new
end

Instance Method Details

#add_directory_path(path) ⇒ `void`

This method returns an undefined value.

Adds a directory path to the set of known paths, including all the directories that contain it. So, calling add_directory_path("dir/dir2/dir3") will add "dir", "dir/dir2", "dir/dir2/dir3".

Parameters:

path (String) —
the path to the directory to add

# File 'lib/zip_tricks/path_set.rb', line 57

def add_directory_path(path)
  path_and_ancestors(path).each do |parent_directory_path|
    if @known_files.include?(parent_directory_path)
      # Have to use the old-fashioned heredocs because ZipTricks
      # aims to be compatible with MRI 2.1+ syntax, and squiggly
      # heredoc is only available starting 2.3+
      error_message = <<ERR
The path #{parent_directory_path.inspect} which has to be added
as a directory is already used for a file.

The directory at this path would get created implicitly
to produce #{path.inspect} during decompression.

This would make some archive utilities refuse to open
the ZIP.
ERR
      raise DirectoryClobbersFile, error_message
    end
    @known_directories << parent_directory_path
  end
end
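
The listing above relies on a private helper, path_and_ancestors, whose body is not shown on this page. The following is only a guess at how such an expansion could work, consistent with the documented behaviour; the `sketch_` names are hypothetical.

```ruby
# Assumption: path components are separated by "/" and empty components
# (from doubled or trailing slashes) are ignored.
def sketch_path_components(path)
  path.split('/').reject(&:empty?)
end

# Returns the path itself preceded by all of its ancestors, shortest first.
def sketch_path_and_ancestors(path)
  components = sketch_path_components(path)
  components.each_index.map { |i| components[0..i].join('/') }
end

p sketch_path_and_ancestors('dir/dir2/dir3')
```

Under this sketch an empty path expands to an empty list, so adding a file located at the root of the archive would register no directories at all.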

#add_file_path(file_path) ⇒ `void`

This method returns an undefined value.

Adds a file path to the set of known paths, including all the directories that contain it. Once a file has been added, it is no longer possible to add a directory having the same path as this would cause conflict.

The operation also adds all the containing directories for the file, so add_file_path("dir/dir2/file.doc") will add "dir" and "dir/dir2" as directories, and "dir/dir2/file.doc" as a file.

Parameters:

file_path (String) —
the path to the file to add

# File 'lib/zip_tricks/path_set.rb', line 90

def add_file_path(file_path)
  if @known_files.include?(file_path)
    error_message = <<ERR
The file at #{file_path.inspect} has already been included
in the archive. Adding it the second time would cause
the first file to be overwritten during unarchiving, and
could also get the archive flagged as invalid.
ERR
    raise Conflict, error_message
  end

  if @known_directories.include?(file_path)
    error_message = <<ERR
The path #{file_path.inspect} is already used for
a directory, but you are trying to add it as a file.

This would make some archive utilities refuse
to open the ZIP.
ERR
    raise FileClobbersDirectory, error_message
  end

  # Add all the directories which this file is contained in
  *dir_components, _file_name = non_empty_path_components(file_path)
  add_directory_path(dir_components.join('/'))

  # ...and then the file itself
  @known_files << file_path
end

#clear ⇒ `void`

This method returns an undefined value.

Clears the contained sets.

# File 'lib/zip_tricks/path_set.rb', line 131

def clear
  @known_files.clear
  @known_directories.clear
end

#include?(path_in_archive) ⇒ `Boolean`

Tells whether a specific full path is already known to the PathSet. Can be a path for a directory or for a file.

Parameters:

path_in_archive (String) —
the path to check for inclusion

Returns:

(Boolean)



# File 'lib/zip_tricks/path_set.rb', line 125

def include?(path_in_archive)
  @known_files.include?(path_in_archive) || @known_directories.include?(path_in_archive)
end
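
Taken together, the methods above can be exercised as in the following sketch. To keep it runnable without the gem, it uses a condensed stand-in class, SketchPathSet, which mirrors the listings on this page with shortened error messages. The stand-in, its private helpers, and the assumption that both clobber errors descend from Conflict are illustrative and not confirmed by this page.

```ruby
require 'set'

# Condensed stand-in mirroring the listings above (not the real class).
class SketchPathSet
  class Conflict < StandardError; end
  # Assumption: both specific errors are subclasses of Conflict
  class FileClobbersDirectory < Conflict; end
  class DirectoryClobbersFile < Conflict; end

  def initialize
    @known_directories = Set.new
    @known_files = Set.new
  end

  def add_directory_path(path)
    ancestors(path).each do |dir|
      raise DirectoryClobbersFile, "#{dir.inspect} is already a file" if @known_files.include?(dir)
      @known_directories << dir
    end
  end

  def add_file_path(file_path)
    raise Conflict, "#{file_path.inspect} was already added" if @known_files.include?(file_path)
    raise FileClobbersDirectory, "#{file_path.inspect} is already a directory" if @known_directories.include?(file_path)
    *dirs, _name = components(file_path)
    add_directory_path(dirs.join('/'))
    @known_files << file_path
  end

  def include?(path)
    @known_files.include?(path) || @known_directories.include?(path)
  end

  private

  def components(path)
    path.split('/').reject(&:empty?)
  end

  def ancestors(path)
    parts = components(path)
    parts.each_index.map { |i| parts[0..i].join('/') }
  end
end

set = SketchPathSet.new
set.add_file_path('dir/dir2/file.doc')
puts set.include?('dir/dir2') # the containing directory is now known

begin
  set.add_file_path('dir/dir2') # this path is already used as a directory
rescue SketchPathSet::FileClobbersDirectory => e
  puts "rejected: #{e.message}"
end

begin
  set.add_directory_path('dir/dir2/file.doc/nested') # an ancestor is a file
rescue SketchPathSet::DirectoryClobbersFile => e
  puts "rejected: #{e.message}"
end
```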
