Introduction
Cabriolet extracts and creates Microsoft Cabinet (.CAB) files and related compression formats using pure Ruby.
This gem fully covers the features of libmspack and cabextract, implementing all Microsoft compression formats for both extraction (decompression) and creation (compression).
Note: No C extensions are required; Cabriolet works on any platform where Ruby runs.
Features
-
Full format support for all 7 Microsoft compression formats
-
CAB (Microsoft Cabinet)
-
CHM (Compiled HTML Help)
-
SZDD (Single-file LZSS compression)
-
KWAJ (Installation file compression)
-
HLP (Windows Help)
-
LIT (Microsoft Reader eBooks)
-
OAB (Offline Address Book)
-
Bidirectional operations (compress and decompress)
-
All compression algorithms
-
None (uncompressed storage)
-
LZSS (4KB sliding window, 3 modes)
-
MSZIP (DEFLATE/RFC 1951)
-
LZX (advanced with Intel E8 preprocessing)
-
Quantum (adaptive arithmetic coding)
-
Advanced features
-
Multi-part cabinet sets (spanning, merging)
-
Embedded cabinet search
-
Salvage mode for corrupted files
-
Custom I/O handlers
-
Progress callbacks
-
Checksum verification
-
Metadata preservation (timestamps, attributes)
-
Pure Ruby - No compilation needed, works everywhere
-
Comprehensive testing - 914 test examples, 0 failures
-
Complete CLI - 30+ commands for all operations
Architecture
Application Layer (CLI/API)
↓
Format Layer (CAB, CHM, SZDD, KWAJ, HLP, LIT, OAB)
↓
Algorithm Layer (None, LZSS, MSZIP, LZX, Quantum)
↓
Binary I/O Layer (BinData structures, Bitstreams)
↓
System Layer (I/O abstraction, file/memory handles)
For complete architecture, see Architecture Documentation.
Installation
Add to your Gemfile:
gem "cabriolet"
Or install directly:
gem install cabriolet
For detailed installation instructions, see Installation Guide.
System requirements
-
Ruby 2.7 or higher
-
Operating Systems: Linux, macOS, Windows
-
Dependencies: bindata (~> 2.5), thor (~> 1.3)
Usage
Command line interface
CAB (Cabinet) operations
List contents
cabriolet list example.cab
Cabinet: example.cab (Set ID: 12345, Index: 0)
Folders: 1, Files: 2
Files:
README.txt (1,234 bytes)
data.bin (45,678 bytes)
Extract all files
cabriolet extract example.cab
Extract to specific directory
cabriolet extract example.cab --output /path/to/output
Test cabinet integrity
cabriolet test example.cab
Show detailed information
cabriolet info example.cab
Cabinet Information
==================================================
Filename: example.cab
Set ID: 12345
Set Index: 0
Size: 100,000 bytes
Folders: 2
Files: 15
Folders:
[0] MSZIP (5 blocks)
[1] LZX (3 blocks)
Files:
README.txt
Size: 1,234 bytes
Modified: 2024-01-15 10:30:00
Attributes: archive
...
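The set ID, set index, folder count, and file count reported above are stored in the fixed 36-byte header at the start of a cabinet. The following is a minimal sketch of decoding those fields with `String#unpack` from the standard library rather than the gem's BinData structures. `parse_cab_header` is a hypothetical helper written for illustration; it is not part of Cabriolet's API, and the sample header is synthetic.

```ruby
# Hypothetical helper (not Cabriolet's API): decode the fixed part of a CAB header.
CAB_HEADER_SIZE = 36

def parse_cab_header(bytes)
  raise ArgumentError, "header too short" if bytes.bytesize < CAB_HEADER_SIZE

  # a4 = 4-byte signature, V = 32-bit little-endian, C = 8-bit, v = 16-bit little-endian
  sig, _res1, size, _res2, files_offset, _res3,
    minor, major, folders, files, flags, set_id, index =
    bytes.unpack("a4 V5 C2 v5")
  raise ArgumentError, "not a cabinet (missing MSCF signature)" unless sig == "MSCF"

  { size: size, files_offset: files_offset, version: "#{major}.#{minor}",
    folders: folders, files: files, flags: flags,
    set_id: set_id, set_index: index }
end

# Build a synthetic header whose values match the sample output above.
sample = ["MSCF", 0, 100_000, 0, 44, 0, 3, 1, 2, 15, 0, 12_345, 0].pack("a4 V5 C2 v5")
info = parse_cab_header(sample)
puts "Set ID: #{info[:set_id]}, Index: #{info[:set_index]}"
puts "Folders: #{info[:folders]}, Files: #{info[:files]}"
```

Only the fixed header is covered here; optional reserved areas and the names of previous or next cabinets follow it when the corresponding flags are set.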
Search for embedded CABs
cabriolet search installer.exe --verbose
Cabinet found at offset 1024
Files: 50, Folders: 1
Cabinet found at offset 524288
Files: 20, Folders: 1
Total: 2 cabinet(s) found
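Searching for embedded cabinets starts by locating the 4-byte `MSCF` signature inside a larger file. The sketch below shows only that first step on an in-memory byte string. `find_cab_signatures` is a hypothetical helper, not the gem's search implementation, and a real search also has to validate the header that follows each hit, because the four bytes can occur by chance.

```ruby
# Hypothetical helper: return every byte offset where the "MSCF" signature occurs.
def find_cab_signatures(data)
  bin = data.b # work on raw bytes so character offsets equal byte offsets
  offsets = []
  pos = 0
  while (hit = bin.index("MSCF", pos))
    offsets << hit
    pos = hit + 1
  end
  offsets
end

# Synthetic blob: padding, a signature, more padding, a second signature.
blob = ("\x00".b * 1024) + "MSCF".b + ("\x00".b * 100) + "MSCF".b
find_cab_signatures(blob).each { |off| puts "Candidate cabinet at offset #{off}" }
```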
Create CAB file
cabriolet create output.cab file1.txt file2.txt
cabriolet create output.cab *.txt --compression mszip
cabriolet create output.cab files/ --compression lzx
Compression options:
-
none - Uncompressed storage
-
lzss - LZSS compression (default for small files)
-
mszip - MSZIP/DEFLATE compression (recommended)
-
lzx - LZX compression (best ratio, slower)
-
quantum - Quantum compression (experimental)
CHM (HTML Help) operations
List CHM contents
cabriolet chm-list help.chm
Extract CHM files
cabriolet chm-extract help.chm output/
Show CHM information
cabriolet chm-info help.chm
Create CHM file
cabriolet chm-create help.chm index.html page1.html page2.html
cabriolet chm-create help.chm docs/*.html --window-bits 16
Options:
-
--window-bits - LZX window size (15-21, default: 16)
-
--verbose - Enable verbose output
SZDD operations
Expand SZDD file
cabriolet expand file.tx_
cabriolet expand file.tx_ output.txt
Compress to SZDD
cabriolet compress file.txt
cabriolet compress file.txt --missing-char t
cabriolet compress file.txt --format qbasic
Options:
-
--missing-char - Last character of original filename
-
--format - Format type (normal or qbasic)
Show SZDD information
cabriolet szdd-info file.tx_
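An SZDD file such as `file.tx_` stores the last character of the original name in its header, which is what `--missing-char` sets on compression. The following is a minimal sketch of reading the 14-byte header of the standard SZDD variant and restoring the name. Both helpers are hypothetical and not Cabriolet's API; treating a zero byte as "no character recorded" is an assumption of this sketch.

```ruby
SZDD_SIGNATURE = "SZDD\x88\xF0\x27\x33".b

# Hypothetical helper: decode the 14-byte header of a standard SZDD file.
def parse_szdd_header(bytes)
  raise ArgumentError, "header too short" if bytes.bytesize < 14

  # 8-byte signature, 1-byte mode, 1-byte missing character, 32-bit LE length
  sig, mode, missing, length = bytes.unpack("a8 a1 a1 V")
  raise ArgumentError, "not an SZDD file" unless sig == SZDD_SIGNATURE

  # Assumption: a zero byte means no missing character was recorded.
  { mode: mode, missing_char: (missing == "\0" ? nil : missing), length: length }
end

# Hypothetical helper: the trailing "_" stands in for the missing character.
def restore_name(compressed_name, missing_char)
  return compressed_name unless missing_char && compressed_name.end_with?("_")

  compressed_name.chomp("_") + missing_char
end

header = SZDD_SIGNATURE + ["A", "t", 1234].pack("a1 a1 V")
info = parse_szdd_header(header)
puts "Length: #{info[:length]} bytes"
puts "Original name: #{restore_name('file.tx_', info[:missing_char])}"
```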
KWAJ operations
Extract KWAJ file
cabriolet kwaj-extract setup.kwj
cabriolet kwaj-extract setup.kwj output.exe
Compress to KWAJ
cabriolet kwaj-compress file.exe
cabriolet kwaj-compress file.exe --compression szdd --include-length
cabriolet kwaj-compress file.exe --filename original.exe
Compression options:
-
none - Uncompressed
-
xor - XOR encryption (0xFF)
-
szdd - LZSS compression (default)
-
mszip - MSZIP compression
Other options:
-
--include-length - Include uncompressed length in header
-
--filename - Embed original filename
Show KWAJ information
cabriolet kwaj-info setup.kwj
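The `xor` option listed above transforms each byte by XOR with 0xFF. The operation is its own inverse, so the same routine both encodes and decodes. A minimal sketch, where `xor_ff` is a hypothetical helper rather than a Cabriolet method:

```ruby
# Hypothetical helper: XOR every byte with 0xFF.
# Applying it twice returns the original bytes.
def xor_ff(data)
  data.bytes.map { |b| b ^ 0xFF }.pack("C*")
end

encoded = xor_ff("Hello")
puts encoded.unpack1("H*")   # hex view of the transformed bytes
puts xor_ff(encoded)         # round-trips back to the input
```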
HLP (Windows Help) operations
Extract HLP file
cabriolet hlp-extract help.hlp output/
Create HLP file
cabriolet hlp-create output.hlp topic1.txt topic2.txt
Show HLP information
cabriolet hlp-info help.hlp
LIT (eBook) operations
Extract LIT file
cabriolet lit-extract book.lit output/
Note: DES-encrypted (DRM-protected) LIT files are not supported. For encrypted files, use Microsoft Reader or convert to another format first.
Create LIT file
cabriolet lit-create book.lit chapter1.html chapter2.html
Show LIT information
cabriolet lit-info book.lit
OAB (Address Book) operations
Extract OAB file
cabriolet oab-extract contacts.lzx output.oab
cabriolet oab-extract patch.lzx output.oab --base contacts.oab
Options:
-
--base - Base file for incremental patch application
Create OAB file
cabriolet oab-create contacts.oab output.lzx
cabriolet oab-create new.oab patch.lzx --base old.oab
Options:
-
--base - Create incremental patch
-
--block-size - LZX block size (default: 32768)
Show OAB information
cabriolet oab-info contacts.lzx
Global Options
All commands support:
-
--verbose, -v - Enable verbose output
-
--help, -h - Show command help
Ruby API
CAB operations
Basic extraction
require "cabriolet"
# Open and extract
decompressor = Cabriolet::CAB::Decompressor.new
cabinet = decompressor.open("example.cab")
# List files
cabinet.files.each do |file|
  puts "#{file.filename}: #{file.length} bytes"
end
# Extract single file
file = cabinet.files.first
decompressor.extract_file(file, "output.txt")
# Extract all files
decompressor.extract_all(cabinet, "output/")
Advanced extraction options
decompressor = Cabriolet::CAB::Decompressor.new
decompressor.salvage = true # Enable salvage mode
decompressor.fix_mszip = true # Enable MSZIP error recovery
decompressor.buffer_size = 8192 # Set buffer size
cabinet = decompressor.open("example.cab")
decompressor.extract_all(cabinet, "output/")
Multi-part cabinets
decompressor = Cabriolet::CAB::Decompressor.new
# Open first cabinet
cab1 = decompressor.open("disk1.cab")
# Open and append subsequent parts
cab2 = decompressor.open("disk2.cab")
decompressor.append(cab1, cab2)
cab3 = decompressor.open("disk3.cab")
decompressor.append(cab2, cab3)
# Extract from merged cabinet set
decompressor.extract_all(cab1, "output/")
Search for embedded cabinets
decompressor = Cabriolet::CAB::Decompressor.new
cabinet = decompressor.search("installer.exe")
while cabinet
  puts "Cabinet at offset #{cabinet.base_offset}"
  puts " Files: #{cabinet.file_count}"
  # Extract this cabinet
  decompressor.extract_all(cabinet, "output_#{cabinet.base_offset}/")
  # Move to next found cabinet
  cabinet = cabinet.next
end
Create CAB file
compressor = Cabriolet::CAB::Compressor.new
# Add files
compressor.add_file("README.txt")
compressor.add_file("data.bin", "custom/path.bin")
# Generate cabinet
bytes = compressor.generate("output.cab",
                            compression: :mszip,
                            set_id: 12345,
                            cabinet_index: 0)
puts "Created output.cab (#{bytes} bytes)"
Compression options:
-
:none - No compression
-
:lzss - LZSS compression
-
:mszip - MSZIP/DEFLATE compression (recommended)
-
:lzx - LZX compression (best ratio)
-
:quantum - Quantum compression (experimental)
CHM operations
Extract CHM files
require "fileutils" # needed for FileUtils.mkdir_p below

decompressor = Cabriolet::CHM::Decompressor.new
chm = decompressor.open("help.chm")
# List files
chm.files&.each do |file|
  puts file.filename
end
# Extract single file
file = chm.files.first
decompressor.extract(file, "output.html") if file
# Extract all files
chm.files&.each do |file|
  output_path = File.join("output", file.filename)
  FileUtils.mkdir_p(File.dirname(output_path))
  decompressor.extract(file, output_path)
end
Fast CHM parsing
decompressor = Cabriolet::CHM::Decompressor.new
# Quick open (headers only, no file enumeration)
chm = decompressor.fast_open("help.chm")
# Find specific file quickly
file = Cabriolet::Models::CHMFile.new
result = decompressor.fast_find(chm, "/index.html", file)
if file.length > 0
  decompressor.extract(file, "index.html")
end
Create CHM file
compressor = Cabriolet::CHM::Compressor.new
# Add files
compressor.add_file("index.html", "/index.html", section: :compressed)
compressor.add_file("image.png", "/images/image.png", section: :uncompressed)
# Generate CHM
bytes = compressor.generate("help.chm",
                            window_bits: 16,
                            language_id: 0x0409)
puts "Created help.chm (#{bytes} bytes)"
Options:
-
window_bits - LZX window size (15-21, default: 16)
-
language_id - Language identifier (default: 0x0409 for English US)
-
timestamp - Custom timestamp (default: current time)
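The `window_bits` range of 15 to 21 lines up with the 32KB-2MB LZX window quoted elsewhere in this document if the window size is taken to be 2 raised to `window_bits`. The sketch below makes that arithmetic explicit under that assumption. `lzx_window_size` is a hypothetical helper, not a Cabriolet method.

```ruby
LZX_WINDOW_BITS = (15..21).freeze

# Hypothetical helper: translate a window_bits value into a window size in bytes,
# assuming the window holds 2**window_bits bytes.
def lzx_window_size(window_bits)
  unless LZX_WINDOW_BITS.cover?(window_bits)
    raise ArgumentError, "window_bits must be in #{LZX_WINDOW_BITS}"
  end

  1 << window_bits
end

LZX_WINDOW_BITS.each do |bits|
  size = lzx_window_size(bits)
  puts format("window_bits %2d -> %7d bytes (%d KB)", bits, size, size / 1024)
end
```

A larger window lets the compressor reference matches further back in the input, at the cost of more memory on both the compressing and decompressing side.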
SZDD operations
Expand SZDD file
decompressor = Cabriolet::SZDD::Decompressor.new
# Open and get header
header = decompressor.open("file.tx_")
puts "Format: #{header.format_name}"
puts "Length: #{header.length} bytes"
puts "Missing char: #{header.missing_char}" if header.missing_char
# Extract
decompressor.extract(header, "file.txt")
# Or one-shot
decompressor.decompress("file.tx_", "file.txt")
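Expanding an SZDD payload means decoding LZSS with a 4 KB sliding window. The following is a minimal sketch of such a decoder for the layout commonly described for standard SZDD files: flag bits are read least-significant first, a set bit means a literal byte, and a clear bit means a two-byte match holding a 12-bit window position and a 4-bit length. `lzss_decode` is a hypothetical helper and not the gem's implementation; the space-filled window and the start position of `4096 - 16` are assumptions taken from how libmspack's decoder is usually described.

```ruby
WINDOW_SIZE = 4096
THRESHOLD = 3 # shortest match length that gets encoded

# Hypothetical helper: decode an LZSS stream (payload only, no SZDD header).
def lzss_decode(input, window_start: WINDOW_SIZE - 16)
  window = Array.new(WINDOW_SIZE, 0x20) # assumption: window pre-filled with spaces
  pos = window_start
  out = []
  bytes = input.bytes
  i = 0

  while i < bytes.length
    flags = bytes[i]
    i += 1
    8.times do |bit|
      break if i >= bytes.length

      if flags[bit] == 1
        # Literal: emit the byte and record it in the window.
        byte = bytes[i]
        i += 1
        out << byte
        window[pos] = byte
        pos = (pos + 1) % WINDOW_SIZE
      else
        # Match: 12-bit window position plus 4-bit length.
        raise ArgumentError, "truncated match" if i + 1 >= bytes.length

        lo = bytes[i]
        hi = bytes[i + 1]
        i += 2
        match_pos = lo | ((hi & 0xF0) << 4)
        length = (hi & 0x0F) + THRESHOLD
        length.times do
          byte = window[match_pos]
          match_pos = (match_pos + 1) % WINDOW_SIZE
          out << byte
          window[pos] = byte
          pos = (pos + 1) % WINDOW_SIZE
        end
      end
    end
  end

  out.pack("C*")
end

# Three literals ("abc"), then a 3-byte match pointing back at window position 0xFF0.
stream = [0x07, 0x61, 0x62, 0x63, 0xF0, 0xF0].pack("C*")
puts lzss_decode(stream) # prints "abcabc"
```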
Compress to SZDD
compressor = Cabriolet::SZDD::Compressor.new
# Compress file
bytes = compressor.compress("file.txt", "file.tx_",
                            missing_char: "t",
                            format: :normal)
# Or compress data from memory
bytes = compressor.compress_data("Hello, world!", "output.tx_")
Format options:
-
:normal - Standard SZDD format (MS-DOS compatible)
-
:qbasic - QBasic SZDD format
KWAJ operations
Extract KWAJ file
decompressor = Cabriolet::KWAJ::Decompressor.new
# Open and get header
header = decompressor.open("setup.kwj")
puts "Compression: #{header.compression_name}"
puts "Length: #{header.length} bytes" if header.length
puts "Filename: #{header.filename}" if header.filename
# Extract
decompressor.extract(header, "setup.kwj", "output.exe")
# Or one-shot
decompressor.decompress("setup.kwj", "setup.exe")
Compress to KWAJ
compressor = Cabriolet::KWAJ::Compressor.new
# Compress file
bytes = compressor.compress("file.exe", "file.kwj",
                            compression: :szdd,
                            include_length: true,
                            filename: "original.exe")
# Compression options: :none, :xor, :szdd, :mszip
HLP (Windows Help) operations
Extract HLP file
decompressor = Cabriolet::HLP::Decompressor.new
hlp = decompressor.open("help.hlp")
# Extract files
hlp.files.each do |file|
  decompressor.extract_file(file, "output/#{file.filename}")
end
Create HLP file
compressor = Cabriolet::HLP::Compressor.new
# Add files
compressor.add_file("topic1.txt", "topic1")
compressor.add_file("topic2.txt", "topic2")
# Generate HLP
bytes = compressor.generate("help.hlp")
Note: HLP format has no public specification. Implementation is based on libmspack source code.
LIT (eBook) operations
Extract LIT file
decompressor = Cabriolet::LIT::Decompressor.new
begin
  lit = decompressor.open("book.lit")
  if lit.encrypted
    # Raise the class rescued below so the message is actually reported
    raise NotImplementedError, "LIT file is DRM-encrypted. Decryption not supported."
  end
  # Extract files
  lit.files.each do |file|
    decompressor.extract_file(file, "output/#{file.filename}")
  end
rescue NotImplementedError => e
  puts "Error: #{e.message}"
end
Create LIT file
compressor = Cabriolet::LIT::Compressor.new
compressor.add_file("content.html", "/content.html")
bytes = compressor.generate("book.lit")
Limitations:
-
DES encryption (DRM) is intentionally not supported
-
For encrypted LIT files, decrypt with Microsoft Reader first
OAB (Offline Address Book) operations
Extract OAB file
decompressor = Cabriolet::OAB::Decompressor.new
# Extract full file
decompressor.decompress("contacts.lzx", "contacts.oab")
# Apply incremental patch
decompressor.decompress_incremental("patch.lzx", "base.oab", "new.oab")
Create OAB file
compressor = Cabriolet::OAB::Compressor.new
# Compress full file
compressor.compress("contacts.oab", "contacts.lzx")
# Create incremental patch
compressor.compress_incremental("new.oab", "old.oab", "patch.lzx")
Custom I/O Handlers
In-memory operations
# Create custom I/O system
memory_io = Cabriolet::System::IOSystem.new
# Process entirely in memory
decompressor = Cabriolet::CAB::Decompressor.new(memory_io)
# Load CAB data
cab_data = File.binread("example.cab")
input = Cabriolet::System::MemoryHandle.new(cab_data)
cabinet = decompressor.parser.parse_handle(input, "example.cab")
# Extract to memory
file = cabinet.files.first
output = Cabriolet::System::MemoryHandle.new("", Cabriolet::Constants::MODE_WRITE)
# ... extract to memory handle
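To make the idea of a memory handle concrete, here is a minimal sketch of an in-memory object offering the usual read, write, seek, and tell operations on top of `StringIO`. `SketchMemoryHandle` is a hypothetical class for illustration only. It is not `Cabriolet::System::MemoryHandle`, and the method set a real Cabriolet I/O system expects is an assumption here.

```ruby
require "stringio"

# Hypothetical class (NOT Cabriolet's MemoryHandle): a file-like handle backed by memory.
class SketchMemoryHandle
  def initialize(data = "")
    # Copy into a mutable binary string so reads and writes operate on raw bytes.
    @io = StringIO.new(String.new(data, encoding: Encoding::BINARY))
  end

  # Returns up to `length` bytes, or an empty string at end of data.
  def read(length)
    @io.read(length) || "".b
  end

  # Returns the number of bytes written.
  def write(bytes)
    @io.write(bytes)
  end

  def seek(offset, whence = IO::SEEK_SET)
    @io.seek(offset, whence)
  end

  def tell
    @io.pos
  end

  # The full buffer contents, e.g. to inspect extracted output.
  def data
    @io.string
  end
end

handle = SketchMemoryHandle.new
handle.write("MSCF")
handle.seek(0)
puts handle.read(4)
puts handle.tell
```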
Custom I/O system
Error Handling
Common errors
begin
  decompressor = Cabriolet::CAB::Decompressor.new
  cabinet = decompressor.open("example.cab")
  decompressor.extract_all(cabinet, "output/")
rescue Cabriolet::IOError => e
  puts "I/O error: #{e.message}"
rescue Cabriolet::ParseError => e
  puts "Parse error: #{e.message}"
rescue Cabriolet::ChecksumError => e
  puts "Checksum failed: #{e.message}"
rescue Cabriolet::DecompressionError => e
  puts "Decompression error: #{e.message}"
rescue Cabriolet::Error => e
  puts "General error: #{e.message}"
end
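The order of the `rescue` clauses above matters: Ruby tries them top to bottom, so a clause for a base class would swallow its subclasses if it came first. The sketch below demonstrates this with a hypothetical `Sketch` hierarchy. It assumes, as the ordering above suggests but the source does not state, that `Cabriolet::Error` is the common base class of the more specific errors.

```ruby
# Hypothetical hierarchy for illustration; the real classes live in Cabriolet.
module Sketch
  class Error < StandardError; end
  class ParseError < Error; end
  class ChecksumError < Error; end
end

# Specific clauses come first, the base class last as a catch-all.
def classify
  yield
  :ok
rescue Sketch::ParseError
  :parse_error
rescue Sketch::ChecksumError
  :checksum_error
rescue Sketch::Error
  :general_error
end

puts classify { raise Sketch::ParseError, "bad signature" }
puts classify { raise Sketch::Error, "something else" }
puts classify { 1 + 1 }
```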
Salvage mode for corrupted files
decompressor = Cabriolet::CAB::Decompressor.new
decompressor.salvage = true # Enable error recovery
# Will skip bad files and continue
cabinet = decompressor.open("corrupted.cab")
decompressor.extract_all(cabinet, "output/")
Fix MSZIP errors
decompressor = Cabriolet::CAB::Decompressor.new
decompressor.fix_mszip = true # Ignore MSZIP checksums, recover from errors
cabinet = decompressor.open("example.cab")
decompressor.extract_all(cabinet, "output/")
API Reference
Cabriolet::CAB::Decompressor
Main class for CAB file operations.
Class methods
new(io_system = nil)-
Creates a new decompressor instance.
- Parameters
io_system-
Optional custom I/O system implementation
- Returns
Cabriolet::CAB::Decompressor-
New decompressor instance
Instance methods
open(filename)-
Opens and parses a CAB file.
- Parameters
filename-
Path to CAB file
- Returns
Cabriolet::Models::Cabinet-
Parsed cabinet object
- Raises
Cabriolet::ParseError-
If file is not valid CAB format
Cabriolet::IOError-
If file cannot be opened
extract_file(file, output_path, **options)-
Extracts a single file from the cabinet.
- Parameters
file-
Cabriolet::Models::File object
output_path-
Where to write the file
options-
Optional hash (salvage, overwrite, etc.)
- Returns
Integer-
Number of bytes extracted
extract_all(cabinet, output_dir, **options)-
Extracts all files from the cabinet.
- Parameters
cabinet-
Cabriolet::Models::Cabinet object
output_dir-
Directory to extract to
options-
Optional hash
- Returns
Integer-
Number of files extracted
search(filename)-
Searches for embedded cabinets in a file.
- Parameters
filename-
File to search
- Returns
Cabriolet::Models::Cabinet-
First found cabinet (use .next for others)
nil-
If no cabinets found
append(cabinet, next_cabinet)-
Merges two cabinets in a multi-part set.
- Parameters
cabinet-
First cabinet
next_cabinet-
Next cabinet in sequence
- Returns
-
void
Attributes
buffer_size-
I/O buffer size in bytes (default: 4096)
salvage-
Enable salvage mode for corrupted files (default: false)
fix_mszip-
Enable MSZIP error recovery (default: false)
Cabriolet::CAB::Compressor
Class for creating CAB files.
Instance methods
add_file(source_path, cab_path = nil)-
Adds a file to the cabinet.
- Parameters
source_path-
Path to source file
cab_path-
Path within cabinet (optional, defaults to basename)
generate(output_file, **options)-
Generates the cabinet file.
- Parameters
output_file-
Path to output CAB file
options-
Hash with compression, set_id, etc.
- Returns
Integer-
Bytes written
Example:
compressor = Cabriolet::CAB::Compressor.new
compressor.add_file("file1.txt")
compressor.add_file("file2.txt")
bytes = compressor.generate("output.cab", compression: :mszip)
Compression Algorithm Status
| Algorithm | Decompression | Compression | Notes |
|---|---|---|---|
| None | ✅ Working | ✅ Working | Uncompressed storage |
| LZSS | ✅ Working | ✅ Working | 4KB sliding window, 3 modes (EXPAND, MSHELP, QBASIC) |
| MSZIP | ✅ Working | ✅ Working | DEFLATE/RFC 1951, fixed Huffman |
| LZX | ✅ Working | ✅ Working | UNCOMPRESSED blocks, 32KB-2MB window |
| Quantum | ✅ Working | ⚠️ Functional | Literals + short matches work. Complex patterns pending. |
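Because MSZIP is raw DEFLATE (RFC 1951) with a small wrapper, a single block can be approximated with Ruby's bundled `zlib`. The sketch below prefixes a raw DEFLATE stream with the two-byte `CK` signature and caps the input at 32,768 bytes per block. Both helpers are hypothetical and stand in for the gem's pure-Ruby codec; `zlib` chooses its own Huffman coding, and the sketch treats the block as independent, without any history carried over from earlier blocks in the same folder.

```ruby
require "zlib"

MSZIP_SIGNATURE = "CK".b
MSZIP_BLOCK_SIZE = 32_768 # maximum uncompressed bytes per block

# Hypothetical helper: wrap one chunk of data as a single MSZIP-style block.
def mszip_pack_block(chunk)
  raise ArgumentError, "chunk too large" if chunk.bytesize > MSZIP_BLOCK_SIZE

  # Negative window bits select raw DEFLATE (no zlib header or trailer).
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  body = deflater.deflate(chunk, Zlib::FINISH)
  deflater.close
  MSZIP_SIGNATURE + body
end

# Hypothetical helper: check the signature and inflate the raw DEFLATE body.
def mszip_unpack_block(block)
  raise ArgumentError, "missing CK signature" unless block.start_with?(MSZIP_SIGNATURE)

  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  data = inflater.inflate(block.byteslice(2, block.bytesize - 2))
  inflater.close
  data
end

original = "cabinet data " * 200
block = mszip_pack_block(original)
puts "#{original.bytesize} bytes -> #{block.bytesize} bytes"
puts "round trip ok: #{mszip_unpack_block(block) == original}"
```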
Configuration Options
Buffer Sizes
# Set default buffer size globally
Cabriolet.default_buffer_size = 8192
# Or per decompressor
decompressor.buffer_size = 16384
Compression Algorithm Selection Guide
| Algorithm | Ratio | Speed | Complexity | Use Case |
|---|---|---|---|---|
| None | 1:1 | Fastest | Trivial | Already compressed data, testing |
| LZSS | 2-3:1 | Fast | Low | Small files, compatibility |
| MSZIP | 3-5:1 | Medium | Medium | Recommended for most uses |
| LZX | 5-10:1 | Slow | High | Large files, best compression |
| Quantum | 4-8:1 | Medium | Very High | Experimental, use with caution |
Return values
All methods return appropriate values or raise exceptions:
-
Decompression methods: Return bytes extracted or raise error
-
Compression methods: Return bytes written or raise error
-
Parse methods: Return model objects or raise ParseError
-
File operations: Return file handles or raise IOError
Development
Building from source
git clone https://github.com/omnizip/cabriolet.git
cd cabriolet
bundle install
bundle exec rake
Running tests
bundle exec rspec
Running RuboCop
bundle exec rubocop
bundle exec rubocop -A # Auto-correct
Known limitations
Quantum compression
Quantum compression is functional but experimental:
-
✅ Decompression: Fully working, production ready
-
✅ Compression: Working for:
-
Simple literals
-
Short matches (3-4 bytes)
-
Basic patterns
-
⚠️ Limitations:
-
Complex repeated patterns may fail
-
Very long matches (14+ bytes) have encoding issues
-
Recommended: Use LZSS, MSZIP, or LZX instead
LIT Format
-
DES encryption (DRM) intentionally not supported
-
For DRM-protected LIT files, decrypt with Microsoft Reader first
HLP/LIT/OAB Formats
-
No public format specifications available
-
Implementation based on libmspack source code
-
Cannot be fully validated without real test files
-
Basic functionality working, edge cases may exist
Troubleshooting
Extraction failures
- Problem
-
Invalid CAB signature
- Solution
-
File may not be a CAB, or is corrupted. Try salvage mode:
cabriolet extract --salvage corrupted.cab
- Problem
-
Checksum mismatch
- Solution
-
Enable error recovery:
decompressor.fix_mszip = true
decompressor.salvage = true
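A checksum mismatch means the value stored with a data block disagrees with one recomputed from the bytes that were read. The sketch below is modeled on the checksum routine commonly described for the CAB format: XOR together the little-endian 32-bit words of the data, then fold in any one to three trailing bytes. `cab_checksum` is a hypothetical helper, not Cabriolet's implementation, and how the result is combined with the block header fields is left out.

```ruby
# Hypothetical helper modeled on the commonly described CAB checksum routine.
def cab_checksum(data, seed = 0)
  bytes = data.b
  full = bytes.bytesize / 4 * 4
  sum = seed

  # XOR in each complete little-endian 32-bit word.
  bytes.byteslice(0, full).unpack("V*").each { |word| sum ^= word }

  # Fold in the 1-3 trailing bytes; the first remaining byte lands highest.
  tail = bytes.byteslice(full, 3).bytes
  extra = 0
  tail.each_with_index { |b, i| extra |= b << (8 * (tail.length - 1 - i)) }
  sum ^ extra
end

puts format("0x%08X", cab_checksum("example block payload"))
```

Flipping any single bit of the input changes the result, which is how a damaged block is detected before its contents are trusted.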
Performance issues
- Problem
-
Slow extraction
- Solution
-
Increase buffer size:
decompressor.buffer_size = 16384
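A larger buffer helps mainly by reducing the number of read calls needed to move the same amount of data. The sketch below counts those calls for two buffer sizes using in-memory streams. `buffered_copy` is a hypothetical helper, not part of Cabriolet, and the actual speedup depends on the platform and on where the time is really spent.

```ruby
require "stringio"

# Hypothetical helper: copy a stream in fixed-size chunks and count the reads.
def buffered_copy(input, output, buffer_size)
  reads = 0
  # IO#read(n) returns nil at end of stream, which ends the loop.
  while (chunk = input.read(buffer_size))
    output.write(chunk)
    reads += 1
  end
  reads
end

payload = "x" * 100_000
[4096, 16_384].each do |size|
  reads = buffered_copy(StringIO.new(payload), StringIO.new, size)
  puts "buffer_size=#{size}: #{reads} reads"
end
```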
Specifications
Acknowledgments
A special thank you to Stuart Caie (aka Kyzer) who created the original libmspack and cabextract projects, and their contributors for:
-
Comprehensive CAB format implementation
-
Excellent test coverage and test fixtures
-
Clear format documentation
Link to the libmspack/cabextract project: https://www.cabextract.org.uk/libmspack/
Cabriolet is inspired by and builds upon the foundation laid by these projects.
If performance is critical, Cabriolet is not the best choice. Consider using libmspack via FFI for optimized speed.
License
BSD 3-Clause License. See LICENSE file for details.
Some test fixtures are from third-party projects. Test fixtures are NOT distributed with the gem and are only used for development and testing purposes.
These fixtures are sourced from the respective projects and retain their original licenses:
-
Test fixtures in spec/fixtures/libmspack/ are from the libmspack project (LGPL 2.1).
-
Test fixtures in spec/fixtures/cabextract/ are from cabextract (GPL 2.0+).
See fixture directories for individual attribution files.