Class: ZipTricks::FileReader
Inherits: Object
Hierarchy: Object > ZipTricks::FileReader
Defined in: lib/zip_tricks/file_reader.rb
Overview
A very barebones ZIP file reader. It is made for maximum interoperability, while we also try to keep it reasonably concise.
REALLY CRAZY IMPORTANT STUFF: SECURITY IMPLICATIONS
Please BEWARE: using this is a security risk if you are reading files that have been
supplied by users. This implementation has not been formally verified for correctness. As
ZIP files contain relative offsets in lots of places, it might be possible for a maliciously
crafted ZIP file to put the decode procedure in an endless loop, make it attempt huge reads
from the input file and so on. Additionally, the reader module for deflated data has
no support for ZIP bomb protection. So either limit the FileReader usage to the files you
trust, or triple-check all the inputs upfront. Patches to make this reader more secure
are welcome, of course.
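One way to reduce the ZIP bomb exposure described above is to cap how many bytes you are willing to expand per entry. The sketch below is not part of ZipTricks: `extract_with_limit`, `LimitExceeded`, `max_bytes` and `chunk_size` are hypothetical names, and the helper assumes only that the extractor responds to `extract(n)` and `eof?`, as the object returned by `extractor_from` does in the Usage example.

```ruby
# Hypothetical error class, not defined by ZipTricks.
class LimitExceeded < StandardError; end

# Copies data from an extractor-like object into `out`, aborting as soon as
# more than `max_bytes` have been produced. The extractor is assumed to
# respond to `extract(n)` and `eof?`.
def extract_with_limit(extractor, out, max_bytes:, chunk_size: 64 * 1024)
  written = 0
  until extractor.eof?
    chunk = extractor.extract(chunk_size)
    break if chunk.nil? # defensive: nothing more to read
    written += chunk.bytesize
    raise LimitExceeded, "entry expands past #{max_bytes} bytes" if written > max_bytes
    out << chunk
  end
  written
end
```

A caller would pass `e.extractor_from(f)` as the extractor and pick `max_bytes` from its own policy. This only bounds the output size; it does not address the endless-loop or huge-read concerns mentioned above.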
Usage
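File.open('zipfile.zip', 'rb') do |f|
  entries = ZipTricks::FileReader.read_zip_structure(f)
  entries.each do |e|
    # NOTE: e.filename comes from the archive, so only do this with trusted files
    File.open(e.filename, 'wb') do |extracted_file|
      ex = e.extractor_from(f)
      # Pull the data out in chunks of up to 1 MB until the entry is exhausted
      extracted_file << ex.extract(1024 * 1024) until ex.eof?
    end
  end
end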
Supported features
- Deflate and stored storage modes
- Zip64 (extra fields and offsets)
- Data descriptors
Unsupported features
- Archives split over multiple disks/files
- Any ZIP encryption
- EFS language flag and InfoZIP filename extra field
- CRC32 checksums are not verified
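Because CRC32 checksums are not verified by the reader, a caller that needs integrity checking can compute the checksum while extracting. The following is a minimal sketch using `Zlib.crc32` from the Ruby standard library. `extract_and_check_crc32` is a hypothetical helper, and it assumes that the expected checksum is available to the caller (for instance from a `crc32` attribute on the entry, which you should confirm against the version of the gem you use).

```ruby
require 'zlib'

# Hypothetical helper, not part of ZipTricks. Streams chunks from an
# extractor-like object (responding to `extract(n)` and `eof?`) into `out`
# while maintaining a running CRC32. Returns true if the checksum matches.
def extract_and_check_crc32(extractor, out, expected_crc32, chunk_size: 64 * 1024)
  crc = 0
  until extractor.eof?
    chunk = extractor.extract(chunk_size)
    break if chunk.nil? # defensive: nothing more to read
    crc = Zlib.crc32(chunk, crc) # fold this chunk into the running checksum
    out << chunk
  end
  crc == expected_crc32
end
```

Note that the data has already been written to `out` by the time a mismatch is detected, so the caller should discard the output when the method returns false.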
Mode of operation
Basically, FileReader ignores the data in local file headers (as it is often unreliable).
It reads the ZIP file "from the tail", finds the end-of-central-directory signatures, then
reads the central directory entries, reconstitutes the entries with their filenames, attributes
and so on, and sets these entries up with the absolute offsets into the source file/IO object.
These offsets can then be used to extract the actual compressed data of the files and to expand it.
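To illustrate the "from the tail" step, here is a simplified sketch of locating the end-of-central-directory (EOCD) record in an IO object. It is not the ZipTricks implementation: `locate_eocd` and the constants are hypothetical names, Zip64 is ignored, and the checks a hardened reader would want (for example, that the comment length agrees with the file size) are omitted. The record layout follows the ZIP format: a 4-byte signature, four 2-byte fields, two 4-byte fields and a 2-byte comment length.

```ruby
EOCD_SIGNATURE = [0x06054b50].pack('V') # the bytes "PK\x05\x06"
EOCD_MIN_SIZE = 22                      # fixed fields, without the comment
MAX_COMMENT_SIZE = 0xFFFF               # the comment length is a 16-bit field

# Returns [eocd_offset, num_entries, cdir_size, cdir_offset], or nil when no
# EOCD signature is found. `io` must respond to `size`, `seek` and `read`.
def locate_eocd(io)
  file_size = io.size
  return nil if file_size < EOCD_MIN_SIZE

  # The EOCD record sits at the very end, followed only by an optional comment,
  # so it is enough to look at the last 22 + 65535 bytes.
  tail_size = [file_size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE].min
  tail_start = file_size - tail_size
  io.seek(tail_start)
  tail = io.read(tail_size)

  # Search backwards for the last position where a full record could start.
  idx = tail.rindex(EOCD_SIGNATURE, tail_size - EOCD_MIN_SIZE)
  return nil unless idx

  _sig, _disk, _cdir_disk, _entries_on_disk, num_entries, cdir_size, cdir_offset, _comment_len =
    tail.byteslice(idx, EOCD_MIN_SIZE).unpack('VvvvvVVv')
  [tail_start + idx, num_entries, cdir_size, cdir_offset]
end
```

With `cdir_offset` and `cdir_size` known, the central directory can be read in a single call and parsed entry by entry, which is the approach the method source further down takes.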
Defined Under Namespace
Classes: ZipEntry
Constant Summary
- ReadError = Class.new(StandardError)
- UnsupportedFeature = Class.new(StandardError)
- InvalidStructure = Class.new(ReadError)
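The definitions above imply that rescuing ReadError also catches InvalidStructure, while UnsupportedFeature has to be rescued separately. The sketch below demonstrates this with a stand-in module so that it runs without the gem; `ReaderErrorsDemo` and `classify_failure` are hypothetical names. With ZipTricks loaded, you would rescue `ZipTricks::FileReader::ReadError` and `ZipTricks::FileReader::UnsupportedFeature` instead.

```ruby
# Stand-in module mirroring the constant definitions listed above.
module ReaderErrorsDemo
  ReadError = Class.new(StandardError)
  UnsupportedFeature = Class.new(StandardError)
  InvalidStructure = Class.new(ReadError)
end

# Hypothetical helper: runs the block and reports which kind of failure,
# if any, it raised.
def classify_failure
  yield
  :ok
rescue ReaderErrorsDemo::ReadError
  :unreadable # also reached for InvalidStructure, a subclass of ReadError
rescue ReaderErrorsDemo::UnsupportedFeature
  :unsupported
end
```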
Class Method Summary
- .read_zip_structure(io) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
Instance Method Summary
- #read_zip_structure(io, read_local_headers: true) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
Class Method Details
.read_zip_structure(io) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
# File 'lib/zip_tricks/file_reader.rb', line 228
def self.read_zip_structure(io)
  new.read_zip_structure(io)
end
Instance Method Details
#read_zip_structure(io, read_local_headers: true) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
# File 'lib/zip_tricks/file_reader.rb', line 193
def read_zip_structure(io, read_local_headers: true)
  zip_file_size = io.size
  eocd_offset = get_eocd_offset(io, zip_file_size)

  zip64_end_of_cdir_location = get_zip64_eocd_location(io, eocd_offset)
  num_files, cdir_location, cdir_size = if zip64_end_of_cdir_location
    num_files_and_central_directory_offset_zip64(io, zip64_end_of_cdir_location)
  else
    num_files_and_central_directory_offset(io, eocd_offset)
  end

  log { 'Located the central directory start at %d' % cdir_location }
  seek(io, cdir_location)

  # Read the entire central directory in one fell swoop
  central_directory_str = read_n(io, cdir_size)
  central_directory_io = StringIO.new(central_directory_str)
  log { 'Read %d bytes with central directory entries' % cdir_size }

  entries = (0...num_files).map do |entry_n|
    log { 'Reading the central directory entry %d starting at offset %d' % [entry_n, cdir_location + central_directory_io.tell] }
    read_cdir_entry(central_directory_io)
  end

  entries.each_with_index do |entry, i|
    if read_local_headers
      log { 'Reading the local header for entry %d at offset %d' % [i, entry.local_file_header_offset] }
      entry.compressed_data_offset = find_compressed_data_start_offset(io, entry.local_file_header_offset)
    end
  end
end