Class: ZipTricks::FileReader

Inherits:
Object
Defined in:
lib/zip_tricks/file_reader.rb

Overview

A very barebones ZIP file reader. It is made for maximum interoperability, but at the same time we attempt to keep it somewhat concise.

REALLY CRAZY IMPORTANT STUFF: SECURITY IMPLICATIONS

Please BEWARE: using this is a security risk if you are reading files that have been supplied by users. This implementation has not been formally verified for correctness. As ZIP files contain relative offsets in lots of places, a maliciously crafted ZIP file might be able to put the decode procedure in an endless loop, make it attempt huge reads from the input file, and so on. Additionally, the reader for deflated data has no ZIP bomb protection. So either limit FileReader usage to files you trust, or triple-check all the inputs upfront. Patches to make this reader more secure are, of course, welcome.
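A minimal sketch of what "checking the inputs upfront" might look like, run on the entries before extracting anything. It assumes each entry exposes a filename plus declared compressed and uncompressed sizes; the EntryInfo struct, the SuspiciousArchive error and the check_entries! helper are hypothetical and not part of ZipTricks. The declared sizes come from the archive itself and can be forged, so this is only a first filter: a real extractor should also cap the number of bytes it actually writes.

```ruby
# Hypothetical stand-in for the entries returned by read_zip_structure;
# only the fields this check needs are modelled here.
EntryInfo = Struct.new(:filename, :compressed_size, :uncompressed_size)

class SuspiciousArchive < StandardError; end

# Rejects archives whose *declared* metadata exceeds the given budgets.
# Returns true if every check passes, raises SuspiciousArchive otherwise.
def check_entries!(entries, max_entries:, max_total_uncompressed:, max_ratio:)
  raise SuspiciousArchive, "too many entries (#{entries.size})" if entries.size > max_entries

  total = entries.sum(&:uncompressed_size)
  raise SuspiciousArchive, "declared total size #{total} is too large" if total > max_total_uncompressed

  entries.each do |e|
    # Absolute paths and ".." segments could write outside the target directory
    if e.filename.start_with?('/') || e.filename.split('/').include?('..')
      raise SuspiciousArchive, "unsafe path #{e.filename.inspect}"
    end

    if e.compressed_size.zero?
      # Data cannot expand out of nothing
      raise SuspiciousArchive, "#{e.filename} expands from 0 bytes" if e.uncompressed_size.positive?
      next
    end

    ratio = e.uncompressed_size.fdiv(e.compressed_size)
    raise SuspiciousArchive, "ratio #{ratio.round} for #{e.filename}" if ratio > max_ratio
  end
  true
end
```

The limits are deliberately left to the caller, since a sensible compression-ratio ceiling depends on what kind of files you expect to receive.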

Usage

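require 'zip_tricks'

File.open('zipfile.zip', 'rb') do |f|
  entries = ZipTricks::FileReader.read_zip_structure(f)
  entries.each do |e|
    next if e.filename.end_with?('/') # skip directory entries
    # Filenames come from the archive and may contain "../", so keep only the basename
    File.open(File.basename(e.filename), 'wb') do |extracted_file|
      ex = e.extractor_from(f)
      # Copy the entry out in 1 MiB chunks until its data is exhausted
      extracted_file << ex.extract(1024 * 1024) until ex.eof?
    end
  end
end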

Supported features

  • Deflate and stored storage modes
  • Zip64 (extra fields and offsets)
  • Data descriptors
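To illustrate the Zip64 extra-field support listed above: when a size or offset does not fit in its 32-bit central directory field, that field holds 0xFFFFFFFF and the real 64-bit value is stored in the Zip64 extended information extra field (header ID 0x0001). Only the overflowed values are present there, in a fixed order. The sketch below shows that decoding; apply_zip64_extra is a hypothetical helper, not the library's own parser, and it ignores the optional disk-number field.

```ruby
ZIP64_EXTRA_FIELD_ID = 0x0001
MARKER_32 = 0xFFFFFFFF # a 32-bit field set to this means "see the Zip64 extra field"

# Hypothetical helper: returns [uncompressed_size, compressed_size, local_header_offset],
# replacing each 32-bit placeholder with its 64-bit value from the extra field.
def apply_zip64_extra(extra, uncompressed_size, compressed_size, local_header_offset)
  pos = 0
  # The extra data is a sequence of (id: uint16 LE, length: uint16 LE, data) records
  while pos + 4 <= extra.bytesize
    id, len = extra.byteslice(pos, 4).unpack('vv')
    if id == ZIP64_EXTRA_FIELD_ID
      data = extra.byteslice(pos + 4, len)
      cursor = 0
      next_u64 = lambda do
        chunk = data.byteslice(cursor, 8)
        raise ArgumentError, 'truncated Zip64 extra field' unless chunk && chunk.bytesize == 8
        cursor += 8
        chunk.unpack1('Q<') # unsigned 64-bit little-endian
      end
      # Only the overflowed fields are present, always in this order
      uncompressed_size = next_u64.call if uncompressed_size == MARKER_32
      compressed_size = next_u64.call if compressed_size == MARKER_32
      local_header_offset = next_u64.call if local_header_offset == MARKER_32
    end
    pos += 4 + len
  end
  [uncompressed_size, compressed_size, local_header_offset]
end
```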

Unsupported features

  • Archives split over multiple disks/files
  • Any ZIP encryption
  • EFS language flag and InfoZIP filename extra field
  • CRC32 checksums are not verified
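Since FileReader does not verify CRC32 checksums, a caller who needs that check can compute it while extracting. A minimal sketch using Ruby's Zlib.crc32, which accepts a running checksum as its second argument so the value can be updated chunk by chunk. The CRC32Verifier class is hypothetical, and the expected value is assumed to be the CRC32 recorded for the entry in the archive.

```ruby
require 'zlib'

# Hypothetical helper: accumulates a CRC32 over streamed chunks and
# compares it with the checksum recorded in the archive.
class CRC32Verifier
  class Mismatch < StandardError; end

  def initialize(expected_crc32)
    @expected = expected_crc32
    @crc = 0 # CRC32 of the empty string
  end

  # Feeds one chunk of uncompressed data; returns self so calls can be chained.
  def <<(chunk)
    @crc = Zlib.crc32(chunk, @crc)
    self
  end

  # Call once all data has been fed; raises if the checksums differ.
  def verify!
    return true if @crc == @expected

    raise Mismatch, format('CRC32 mismatch: got %08x, expected %08x', @crc, @expected)
  end
end
```

In the extraction loop from the usage example you would feed each chunk returned by extract to the verifier before writing it, and call verify! once eof? is true.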

Mode of operation

Basically, FileReader ignores the data in local file headers (as it is often unreliable), reading a local header only to find where that entry's compressed data begins. It reads the ZIP file "from the tail": it finds the end-of-central-directory record (and the Zip64 end-of-central-directory locator, if present), reads the central directory entries, reconstitutes the entries with their filenames, attributes and so on, and sets these entries up with absolute offsets into the source file/IO object. These offsets can then be used to extract the actual compressed data of each file and to expand it.
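The "read from the tail" step can be sketched as follows. The end-of-central-directory (EOCD) record is 22 bytes plus a comment of up to 65535 bytes, so it must start within the last 65557 bytes of the file; the sketch scans that window backwards for the "PK\x05\x06" signature and accepts a candidate only if its comment length reaches exactly to the end of the file. find_eocd_offset and read_eocd are hypothetical names, not the library's private methods, and unlike FileReader this sketch does not look for a Zip64 EOCD locator.

```ruby
require 'stringio'

EOCD_SIGNATURE = [0x06054b50].pack('V') # "PK\x05\x06"
EOCD_FIXED_SIZE = 22                    # record size without the comment
MAX_COMMENT_SIZE = 0xFFFF               # the comment length is a 16-bit field

# Hypothetical helper: returns the absolute offset of the EOCD record, or nil.
def find_eocd_offset(io)
  size = io.size
  scan_len = [size, EOCD_FIXED_SIZE + MAX_COMMENT_SIZE].min
  io.seek(size - scan_len)
  tail = io.read(scan_len).to_s.b

  pos = tail.bytesize - EOCD_FIXED_SIZE
  while pos >= 0
    idx = tail.rindex(EOCD_SIGNATURE, pos)
    break unless idx

    # The comment length lives in the last 2 bytes of the fixed part
    comment_len = tail.byteslice(idx + 20, 2).unpack1('v')
    return size - scan_len + idx if idx + EOCD_FIXED_SIZE + comment_len == tail.bytesize

    pos = idx - 1 # keep searching further back
  end
  nil
end

# Unpacks the fixed part of the EOCD record (all fields little-endian).
def read_eocd(io, offset)
  io.seek(offset)
  fields = io.read(EOCD_FIXED_SIZE).unpack('VvvvvVVv')
  { num_files: fields[4], cdir_size: fields[5], cdir_offset: fields[6] }
end
```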

Defined Under Namespace

Classes: ZipEntry

Constant Summary

ReadError = Class.new(StandardError)
UnsupportedFeature = Class.new(StandardError)
InvalidStructure = Class.new(ReadError)
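The hierarchy matters when rescuing: InvalidStructure is a ReadError, but UnsupportedFeature is not, so rescuing ReadError alone will not catch an archive that uses an unsupported feature. The sketch below redefines the three constants at top level so it runs without the gem (in real code they are ZipTricks::FileReader::ReadError and so on); the classify method is a hypothetical illustration.

```ruby
# Local copies of the constants above, defined at top level for this example only.
ReadError = Class.new(StandardError)
UnsupportedFeature = Class.new(StandardError)
InvalidStructure = Class.new(ReadError)

# Hypothetical: maps a reader error to a user-facing message.
def classify(error)
  raise error
rescue ReadError => e
  # Also catches InvalidStructure, since it subclasses ReadError
  "unreadable archive: #{e.class}"
rescue UnsupportedFeature
  'uses a feature FileReader cannot handle'
end
```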

Class Method Details

.read_zip_structure(io) ⇒ Array<Entry>

Parse an IO handle to a ZIP archive into an array of Entry objects.

Parameters:

  • io (#tell, #seek, #read, #size)

    an IO-ish object

Returns:

  • (Array<Entry>)

    an array of entries within the ZIP being parsed



# File 'lib/zip_tricks/file_reader.rb', line 228

def self.read_zip_structure(io)
  new.read_zip_structure(io)
end

Instance Method Details

#read_zip_structure(io, read_local_headers: true) ⇒ Array<Entry>

Parse an IO handle to a ZIP archive into an array of Entry objects.

Parameters:

  • io (#tell, #seek, #read, #size)

    an IO-ish object

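  • read_local_headers (Boolean) (defaults to: true)

    whether to also read each entry's local file header in order to determine where its compressed data starts (setting compressed_data_offset); when false, only the central directory is read and compressed_data_offset is not computed by this method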

Returns:

  • (Array<Entry>)

    an array of entries within the ZIP being parsed



# File 'lib/zip_tricks/file_reader.rb', line 193

def read_zip_structure(io, read_local_headers: true)
  zip_file_size = io.size
  eocd_offset = get_eocd_offset(io, zip_file_size)

  zip64_end_of_cdir_location = get_zip64_eocd_location(io, eocd_offset)
  num_files, cdir_location, cdir_size = if zip64_end_of_cdir_location
    num_files_and_central_directory_offset_zip64(io, zip64_end_of_cdir_location)
  else
    num_files_and_central_directory_offset(io, eocd_offset)
  end
  log { 'Located the central directory start at %d' % cdir_location }
  seek(io, cdir_location)

  # Read the entire central directory in one fell swoop
  central_directory_str = read_n(io, cdir_size)
  central_directory_io = StringIO.new(central_directory_str)
  log { 'Read %d bytes with central directory entries' % cdir_size }

  entries = (0...num_files).map do |entry_n|
    log { 'Reading the central directory entry %d starting at offset %d' % [entry_n, cdir_location + central_directory_io.tell] }
    read_cdir_entry(central_directory_io)
  end
  
  entries.each_with_index do |entry, i|
    if read_local_headers
      log { 'Reading the local header for entry %d at offset %d' % [i, entry.local_file_header_offset] }
      entry.compressed_data_offset = find_compressed_data_start_offset(io, entry.local_file_header_offset)
    end
  end
end
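The local-header step at the end of the method is needed because an entry's data offset cannot be computed from the central directory alone: the local header's filename and extra-field lengths can differ from those recorded centrally. A local file header has a 30-byte fixed part, with the filename length at byte 26 and the extra-field length at byte 28, followed by those two variable fields and then the compressed data. A minimal sketch of that computation; compressed_data_start is a hypothetical name, not the library's find_compressed_data_start_offset.

```ruby
require 'stringio'

LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50 # "PK\x03\x04"
LOCAL_FILE_HEADER_FIXED_SIZE = 30

# Hypothetical helper: returns the absolute offset where the entry's data begins.
def compressed_data_start(io, local_header_offset)
  io.seek(local_header_offset)
  header = io.read(LOCAL_FILE_HEADER_FIXED_SIZE)
  unless header && header.bytesize == LOCAL_FILE_HEADER_FIXED_SIZE
    raise ArgumentError, 'truncated local file header'
  end
  unless header.unpack1('V') == LOCAL_FILE_HEADER_SIGNATURE
    raise ArgumentError, 'bad local file header signature'
  end

  # Filename and extra-field lengths are little-endian uint16s at bytes 26 and 28
  filename_len, extra_len = header.byteslice(26, 4).unpack('vv')
  local_header_offset + LOCAL_FILE_HEADER_FIXED_SIZE + filename_len + extra_len
end
```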