Pure Ruby implementation for extracting and creating Microsoft compression format files.

Introduction

Cabriolet extracts and creates Microsoft Cabinet (.CAB) files and related compression formats using pure Ruby.

This gem fully covers the features of libmspack and cabextract, implementing all Microsoft compression formats for both extraction (decompression) and creation (compression).

Note
No C extensions are required; Cabriolet works on any platform where Ruby runs.

Features

  • Full format support for all 7 Microsoft compression formats

    • CAB (Microsoft Cabinet)

    • CHM (Compiled HTML Help)

    • SZDD (Single-file LZSS compression)

    • KWAJ (Installation file compression)

    • HLP (Windows Help)

    • LIT (Microsoft Reader eBooks)

    • OAB (Offline Address Book)

  • Bidirectional operations (compress and decompress)

  • All compression algorithms

    • None (uncompressed storage)

    • LZSS (4KB sliding window, 3 modes)

    • MSZIP (DEFLATE/RFC 1951)

    • LZX (advanced with Intel E8 preprocessing)

    • Quantum (adaptive arithmetic coding)

  • Advanced features

    • Multi-part cabinet sets (spanning, merging)

    • Embedded cabinet search

    • Salvage mode for corrupted files

    • Custom I/O handlers

    • Progress callbacks

    • Checksum verification

    • Metadata preservation (timestamps, attributes)

  • Pure Ruby - No compilation needed, works everywhere

  • Comprehensive testing - 914 test examples, 0 failures

  • Complete CLI - 30+ commands for all operations

Architecture

High-level architecture
Application Layer (CLI/API)
         ↓
  Format Layer (CAB, CHM, SZDD, KWAJ, HLP, LIT, OAB)
         ↓
  Algorithm Layer (None, LZSS, MSZIP, LZX, Quantum)
         ↓
  Binary I/O Layer (BinData structures, Bitstreams)
         ↓
  System Layer (I/O abstraction, file/memory handles)

For complete architecture, see Architecture Documentation.
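
To make the Binary I/O Layer more concrete, here is a minimal sketch that decodes the fixed 36-byte cabinet header with plain `String#unpack`. It is an illustration only: the diagram above says Cabriolet uses BinData structures, and the names `CabHeader` and `parse_cab_header` are hypothetical, not part of the gem. The field layout follows the published CAB header (CFHEADER) structure.

```ruby
# Illustrative only: field layout of the fixed part of a CAB header.
# CabHeader and parse_cab_header are hypothetical names, not Cabriolet API.
CAB_HEADER_FORMAT = "a4VVVVVCCvvvvv" # 4 + 5*4 + 2*1 + 5*2 = 36 bytes
CAB_HEADER_FIELDS = %i[
  signature reserved1 cabinet_size reserved2 files_offset reserved3
  version_minor version_major folder_count file_count flags
  set_id cabinet_index
].freeze

CabHeader = Struct.new(*CAB_HEADER_FIELDS)

def parse_cab_header(bytes)
  raise ArgumentError, "header too short" if bytes.bytesize < 36

  header = CabHeader.new(*bytes.unpack(CAB_HEADER_FORMAT))
  raise ArgumentError, "not a CAB header" unless header.signature == "MSCF"

  header
end

# Build a sample header in memory (set ID 12345, index 0, 1 folder, 2 files).
sample = ["MSCF", 0, 100_000, 0, 44, 0, 3, 1, 1, 2, 0, 12_345, 0]
         .pack(CAB_HEADER_FORMAT)
header = parse_cab_header(sample)
puts "Set ID: #{header.set_id}, Index: #{header.cabinet_index}"
puts "Folders: #{header.folder_count}, Files: #{header.file_count}"
```

All multi-byte fields are little-endian, which is why the format string uses `V` (32-bit) and `v` (16-bit).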

Installation

Add to your Gemfile:

gem "cabriolet"

Or install directly:

gem install cabriolet

For detailed installation instructions, see Installation Guide.

System requirements

  • Ruby 2.7 or higher

  • Operating Systems: Linux, macOS, Windows

  • Dependencies: bindata (~> 2.5), thor (~> 1.3)

Usage

Command line interface

CAB (Cabinet) operations

List contents
cabriolet list example.cab
Example 1. Example output
Cabinet: example.cab (Set ID: 12345, Index: 0)
Folders: 1, Files: 2
Files:
  README.txt (1,234 bytes)
  data.bin (45,678 bytes)
Extract all files
cabriolet extract example.cab
Extract to specific directory
cabriolet extract example.cab --output /path/to/output
Test cabinet integrity
cabriolet test example.cab
Show detailed information
cabriolet info example.cab
Example 2. Example output
Cabinet Information
==================================================
Filename: example.cab
Set ID: 12345
Set Index: 0
Size: 100,000 bytes
Folders: 2
Files: 15

Folders:
  [0] MSZIP (5 blocks)
  [1] LZX (3 blocks)

Files:
  README.txt
    Size: 1,234 bytes
    Modified: 2024-01-15 10:30:00
    Attributes: archive
  ...
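
The `Modified` value above comes from the MS-DOS date and time words stored with each CAB file entry. The following sketch shows how those two 16-bit words unpack into calendar fields. `decode_dos_datetime` is a hypothetical helper written for this README, not a Cabriolet method.

```ruby
# Decodes MS-DOS packed date/time words.
# Returns [year, month, day, hour, minute, second].
def decode_dos_datetime(date_word, time_word)
  year   = ((date_word >> 9) & 0x7F) + 1980 # years are counted from 1980
  month  = (date_word >> 5) & 0x0F
  day    = date_word & 0x1F
  hour   = (time_word >> 11) & 0x1F
  minute = (time_word >> 5) & 0x3F
  second = (time_word & 0x1F) * 2           # stored with 2-second resolution
  [year, month, day, hour, minute, second]
end

# Encode 2024-01-15 10:30:00 by hand, then decode it again.
date_word = ((2024 - 1980) << 9) | (1 << 5) | 15
time_word = (10 << 11) | (30 << 5) | 0
puts format("%04d-%02d-%02d %02d:%02d:%02d",
            *decode_dos_datetime(date_word, time_word))
# prints "2024-01-15 10:30:00"
```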
Search for embedded CABs
cabriolet search installer.exe --verbose
Example 3. Example output
Cabinet found at offset 1024
  Files: 50, Folders: 1
Cabinet found at offset 524288
  Files: 20, Folders: 1

Total: 2 cabinet(s) found
Create CAB file
cabriolet create output.cab file1.txt file2.txt
cabriolet create output.cab *.txt --compression mszip
cabriolet create output.cab files/ --compression lzx

Compression options:

  • none - Uncompressed storage

  • lzss - LZSS compression (default for small files)

  • mszip - MSZIP/DEFLATE compression (recommended)

  • lzx - LZX compression (best ratio, slower)

  • quantum - Quantum compression (experimental)

CHM (HTML Help) operations

List CHM contents
cabriolet chm-list help.chm
Extract CHM files
cabriolet chm-extract help.chm output/
Show CHM information
cabriolet chm-info help.chm
Create CHM file
cabriolet chm-create help.chm index.html page1.html page2.html
cabriolet chm-create help.chm docs/*.html --window-bits 16

Options:

  • --window-bits - LZX window size (15-21, default: 16)

  • --verbose - Enable verbose output

SZDD operations

Expand SZDD file
cabriolet expand file.tx_
cabriolet expand file.tx_ output.txt
Compress to SZDD
cabriolet compress file.txt
cabriolet compress file.txt --missing-char t
cabriolet compress file.txt --format qbasic

Options:

  • --missing-char - Last character of original filename

  • --format - Format type (normal or qbasic)
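
As a rough illustration of what `--missing-char` records: SZDD-compressed files conventionally replace the last character of the original name with an underscore (`file.txt` becomes `file.tx_`), and the stored character lets the name be restored on expansion. The helper below is a hypothetical sketch of that renaming rule only; it is not how Cabriolet names its output.

```ruby
# Hypothetical helper: rebuilds the original filename from the compressed
# name and the stored "missing" character.
def restore_szdd_name(compressed_name, missing_char)
  return compressed_name unless compressed_name.end_with?("_")
  return compressed_name if missing_char.nil? || missing_char.empty?

  compressed_name[0...-1] + missing_char
end

puts restore_szdd_name("file.tx_", "t")   # prints "file.txt"
puts restore_szdd_name("setup.ex_", "e")  # prints "setup.exe"
```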

Show SZDD information
cabriolet szdd-info file.tx_

KWAJ operations

Extract KWAJ file
cabriolet kwaj-extract setup.kwj
cabriolet kwaj-extract setup.kwj output.exe
Compress to KWAJ
cabriolet kwaj-compress file.exe
cabriolet kwaj-compress file.exe --compression szdd --include-length
cabriolet kwaj-compress file.exe --filename original.exe

Compression options:

  • none - Uncompressed

  • xor - XOR obfuscation (each byte XORed with 0xFF; not real encryption)

  • szdd - LZSS compression (default)

  • mszip - MSZIP compression
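
The `xor` option is the simplest of these. A minimal sketch of the transform, assuming it is a plain byte-wise XOR with 0xFF as the option text states (`xor_ff` is a hypothetical name, not a Cabriolet method):

```ruby
# Byte-wise XOR with 0xFF. Applying it twice returns the original data,
# so the same function both "compresses" and "decompresses".
def xor_ff(data)
  data.each_byte.map { |byte| byte ^ 0xFF }.pack("C*")
end

scrambled = xor_ff("Hello".b)
puts scrambled.unpack1("H*")   # hex dump of the scrambled bytes
puts xor_ff(scrambled)         # prints "Hello"
```

Because the transform is its own inverse and uses a fixed key, it hides content from casual inspection but provides no security.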

Other options:

  • --include-length - Include uncompressed length in header

  • --filename - Embed original filename

Show KWAJ information
cabriolet kwaj-info setup.kwj

HLP (Windows Help) operations

Extract HLP file
cabriolet hlp-extract help.hlp output/
Create HLP file
cabriolet hlp-create output.hlp topic1.txt topic2.txt
Show HLP information
cabriolet hlp-info help.hlp

LIT (eBook) operations

Extract LIT file
cabriolet lit-extract book.lit output/
Note
DES-encrypted (DRM-protected) LIT files are not supported. For encrypted files, use Microsoft Reader or convert to another format first.
Create LIT file
cabriolet lit-create book.lit chapter1.html chapter2.html
Show LIT information
cabriolet lit-info book.lit

OAB (Address Book) operations

Extract OAB file
cabriolet oab-extract contacts.lzx output.oab
cabriolet oab-extract patch.lzx output.oab --base contacts.oab

Options:

  • --base - Base file for incremental patch application

Create OAB file
cabriolet oab-create contacts.oab output.lzx
cabriolet oab-create new.oab patch.lzx --base old.oab

Options:

  • --base - Create incremental patch

  • --block-size - LZX block size (default: 32768)

Show OAB information
cabriolet oab-info contacts.lzx

Global Options

All commands support:

  • --verbose, -v - Enable verbose output

  • --help, -h - Show command help

Ruby API

CAB operations

Basic extraction
require "cabriolet"

# Open and extract
decompressor = Cabriolet::CAB::Decompressor.new
cabinet = decompressor.open("example.cab")

# List files
cabinet.files.each do |file|
  puts "#{file.filename}: #{file.length} bytes"
end

# Extract single file
file = cabinet.files.first
decompressor.extract_file(file, "output.txt")

# Extract all files
decompressor.extract_all(cabinet, "output/")
Advanced extraction options
decompressor = Cabriolet::CAB::Decompressor.new
decompressor.salvage = true  # Enable salvage mode
decompressor.fix_mszip = true  # Enable MSZIP error recovery
decompressor.buffer_size = 8192  # Set buffer size

cabinet = decompressor.open("example.cab")
decompressor.extract_all(cabinet, "output/")
Multi-part cabinets
decompressor = Cabriolet::CAB::Decompressor.new

# Open first cabinet
cab1 = decompressor.open("disk1.cab")

# Open and append subsequent parts
cab2 = decompressor.open("disk2.cab")
decompressor.append(cab1, cab2)

cab3 = decompressor.open("disk3.cab")
decompressor.append(cab2, cab3)

# Extract from merged cabinet set
decompressor.extract_all(cab1, "output/")
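
For sets with many parts, the open-and-append steps above can be wrapped in a small helper. This is a sketch, not part of the gem: `open_cabinet_set` is a hypothetical name, and it relies only on the `open` and `append` calls shown above. Unlike the example above, it opens every part first and links them afterwards.

```ruby
# Hypothetical helper. `decompressor` is any object that responds to
# open(path) and append(cabinet, next_cabinet).
def open_cabinet_set(decompressor, paths)
  raise ArgumentError, "no cabinet paths given" if paths.empty?

  cabinets = paths.map { |path| decompressor.open(path) }

  # Link each cabinet to the one that follows it: (1,2), (2,3), ...
  cabinets.each_cons(2) do |current, following|
    decompressor.append(current, following)
  end

  cabinets.first
end

# Intended use (requires the gem and the cabinet files, so left commented):
#   decompressor = Cabriolet::CAB::Decompressor.new
#   first = open_cabinet_set(decompressor, %w[disk1.cab disk2.cab disk3.cab])
#   decompressor.extract_all(first, "output/")
```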
Search for embedded cabinets
decompressor = Cabriolet::CAB::Decompressor.new
cabinet = decompressor.search("installer.exe")

while cabinet
  puts "Cabinet at offset #{cabinet.base_offset}"
  puts "  Files: #{cabinet.file_count}"

  # Extract this cabinet
  decompressor.extract_all(cabinet, "output_#{cabinet.base_offset}/")

  # Move to next found cabinet
  cabinet = cabinet.next
end
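
To give a feel for what an embedded-cabinet search involves, the sketch below scans a binary blob for the 4-byte CAB signature `MSCF` and reports candidate offsets. It is a simplification and not Cabriolet's search code: `find_cab_signatures` is a hypothetical name, and a real search would still need to validate the header that follows each hit, since the four bytes can occur by chance.

```ruby
CAB_SIGNATURE = "MSCF".b

# Returns every byte offset at which the CAB signature appears.
def find_cab_signatures(data)
  data = data.b # treat the input as raw bytes so offsets are byte offsets
  offsets = []
  position = 0
  while (found = data.index(CAB_SIGNATURE, position))
    offsets << found
    position = found + 1
  end
  offsets
end

# A fake "installer": 1024 bytes of padding followed by a signature.
blob = ("\x00" * 1024) + "MSCF" + ("\x00" * 16)
p find_cab_signatures(blob) # prints [1024]
```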
Create CAB file
compressor = Cabriolet::CAB::Compressor.new

# Add files
compressor.add_file("README.txt")
compressor.add_file("data.bin", "custom/path.bin")

# Generate cabinet
bytes = compressor.generate("output.cab",
  compression: :mszip,
  set_id: 12345,
  cabinet_index: 0)

puts "Created output.cab (#{bytes} bytes)"

Compression options:

  • :none - No compression

  • :lzss - LZSS compression

  • :mszip - MSZIP/DEFLATE compression (recommended)

  • :lzx - LZX compression (best ratio)

  • :quantum - Quantum compression (experimental)

CHM operations

Extract CHM files
require "fileutils" # FileUtils.mkdir_p is used in the extract-all loop below

decompressor = Cabriolet::CHM::Decompressor.new
chm = decompressor.open("help.chm")

# List files
chm.files&.each do |file|
  puts file.filename
end

# Extract single file
file = chm.files&.first
decompressor.extract(file, "output.html") if file

# Extract all files
chm.files&.each do |file|
  output_path = File.join("output", file.filename)
  FileUtils.mkdir_p(File.dirname(output_path))
  decompressor.extract(file, output_path)
end
Fast CHM parsing
decompressor = Cabriolet::CHM::Decompressor.new

# Quick open (headers only, no file enumeration)
chm = decompressor.fast_open("help.chm")

# Find specific file quickly
file = Cabriolet::Models::CHMFile.new
result = decompressor.fast_find(chm, "/index.html", file)

if file.length > 0
  decompressor.extract(file, "index.html")
end
Create CHM file
compressor = Cabriolet::CHM::Compressor.new

# Add files
compressor.add_file("index.html", "/index.html", section: :compressed)
compressor.add_file("image.png", "/images/image.png", section: :uncompressed)

# Generate CHM
bytes = compressor.generate("help.chm",
  window_bits: 16,
  language_id: 0x0409)

puts "Created help.chm (#{bytes} bytes)"

Options:

  • window_bits - LZX window size (15-21, default: 16)

  • language_id - Language identifier (default: 0x0409 for English US)

  • timestamp - Custom timestamp (default: current time)
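
The `window_bits` value is an exponent: the LZX window is 2 to the power of `window_bits` bytes, which is where the 32KB-2MB range quoted elsewhere in this README comes from. A small sketch (`lzx_window_size` is a hypothetical helper, not a Cabriolet method):

```ruby
# Converts an LZX window_bits value into a window size in bytes.
def lzx_window_size(window_bits)
  unless (15..21).cover?(window_bits)
    raise ArgumentError, "window_bits must be between 15 and 21"
  end

  1 << window_bits
end

(15..21).each do |bits|
  puts "window_bits=#{bits} -> #{lzx_window_size(bits) / 1024} KB"
end
```

The default of 16 therefore corresponds to a 64 KB window.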

SZDD operations

Expand SZDD file
decompressor = Cabriolet::SZDD::Decompressor.new

# Open and get header
header = decompressor.open("file.tx_")

puts "Format: #{header.format_name}"
puts "Length: #{header.length} bytes"
puts "Missing char: #{header.missing_char}" if header.missing_char

# Extract
decompressor.extract(header, "file.txt")

# Or one-shot
decompressor.decompress("file.tx_", "file.txt")
Compress to SZDD
compressor = Cabriolet::SZDD::Compressor.new

# Compress file
bytes = compressor.compress("file.txt", "file.tx_",
  missing_char: "t",
  format: :normal)

# Or compress data from memory
bytes = compressor.compress_data("Hello, world!", "output.tx_")

Format options:

  • :normal - Standard SZDD format (MS-DOS compatible)

  • :qbasic - QBasic SZDD format
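
For readers curious about what the LZSS payload inside an SZDD file looks like, here is a minimal decoder sketch for the normal format. It is not Cabriolet's implementation, and it makes several assumptions based on the commonly documented layout: a 4096-byte window pre-filled with spaces, an initial write position of 4096 - 16, control bytes read least-significant bit first (1 = literal, 0 = match), and 2-byte matches holding a 12-bit window position and a 4-bit length (plus 3). It expects the compressed payload only, with the SZDD header already stripped.

```ruby
LZSS_WINDOW_SIZE = 4096

# Hypothetical helper: expands a raw LZSS payload (no SZDD header).
def lzss_expand(compressed)
  input = compressed.bytes
  window = Array.new(LZSS_WINDOW_SIZE, 0x20) # assumed: window starts as spaces
  write_pos = LZSS_WINDOW_SIZE - 16          # assumed start position
  output = []
  index = 0

  while index < input.length
    control = input[index]
    index += 1
    8.times do |bit|
      break if index >= input.length

      if control[bit] == 1
        bytes = [input[index]] # literal byte
        index += 1
      else
        break if index + 1 >= input.length # truncated match, stop here

        low, high = input[index], input[index + 1]
        index += 2
        match_pos = low | ((high & 0xF0) << 4)
        # Resolved lazily below so overlapping matches see fresh bytes.
        bytes = Array.new((high & 0x0F) + 3) { |i| (match_pos + i) % LZSS_WINDOW_SIZE }
        bytes = bytes.map { |pos| :"w#{pos}" }
      end

      bytes.each do |item|
        byte = item.is_a?(Symbol) ? window[item.to_s[1..-1].to_i] : item
        output << byte
        window[write_pos] = byte
        write_pos = (write_pos + 1) % LZSS_WINDOW_SIZE
      end
    end
  end

  output.pack("C*")
end

# Two literals ("a", "b") followed by a 3-byte match pointing back at them.
puts lzss_expand([0x03, 0x61, 0x62, 0xF0, 0xF0].pack("C*")) # prints "ababa"
```

The example match overlaps its own output: the third copied byte is one the match itself just wrote, which is why bytes are resolved one at a time rather than sliced from the window up front.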

KWAJ operations

Extract KWAJ file
decompressor = Cabriolet::KWAJ::Decompressor.new

# Open and get header
header = decompressor.open("setup.kwj")

puts "Compression: #{header.compression_name}"
puts "Length: #{header.length} bytes" if header.length
puts "Filename: #{header.filename}" if header.filename

# Extract
decompressor.extract(header, "setup.kwj", "output.exe")

# Or one-shot
decompressor.decompress("setup.kwj", "setup.exe")
Compress to KWAJ
compressor = Cabriolet::KWAJ::Compressor.new

# Compress file
bytes = compressor.compress("file.exe", "file.kwj",
  compression: :szdd,
  include_length: true,
  filename: "original.exe")

# Compression options: :none, :xor, :szdd, :mszip

HLP (Windows Help) operations

Extract HLP file
decompressor = Cabriolet::HLP::Decompressor.new
hlp = decompressor.open("help.hlp")

# Extract files
hlp.files.each do |file|
  decompressor.extract_file(file, "output/#{file.filename}")
end
Create HLP file
compressor = Cabriolet::HLP::Compressor.new

# Add files
compressor.add_file("topic1.txt", "topic1")
compressor.add_file("topic2.txt", "topic2")

# Generate HLP
bytes = compressor.generate("help.hlp")
Note
The HLP format has no public specification; the implementation is based on the libmspack source code.

LIT (eBook) operations

Extract LIT file
decompressor = Cabriolet::LIT::Decompressor.new

begin
  lit = decompressor.open("book.lit")

  if lit.encrypted
    # Raise the same class the rescue clause below handles
    raise NotImplementedError, "LIT file is DRM-encrypted. Decryption not supported."
  end

  # Extract files
  lit.files.each do |file|
    decompressor.extract_file(file, "output/#{file.filename}")
  end
rescue NotImplementedError => e
  puts "Error: #{e.message}"
end
Create LIT file
compressor = Cabriolet::LIT::Compressor.new

compressor.add_file("content.html", "/content.html")
bytes = compressor.generate("book.lit")

Limitations:

  • DES encryption (DRM) is intentionally not supported

  • For encrypted LIT files, decrypt with Microsoft Reader first

OAB (Offline Address Book) operations

Extract OAB file
decompressor = Cabriolet::OAB::Decompressor.new

# Extract full file
decompressor.decompress("contacts.lzx", "contacts.oab")

# Apply incremental patch
decompressor.decompress_incremental("patch.lzx", "base.oab", "new.oab")
Create OAB file
compressor = Cabriolet::OAB::Compressor.new

# Compress full file
compressor.compress("contacts.oab", "contacts.lzx")

# Create incremental patch
compressor.compress_incremental("new.oab", "old.oab", "patch.lzx")

Custom I/O Handlers

In-memory operations

# Create custom I/O system
memory_io = Cabriolet::System::IOSystem.new

# Process entirely in memory
decompressor = Cabriolet::CAB::Decompressor.new(memory_io)

# Load CAB data
cab_data = File.binread("example.cab")
input = Cabriolet::System::MemoryHandle.new(cab_data)
cabinet = decompressor.parser.parse_handle(input, "example.cab")

# Extract to memory
file = cabinet.files.first
output = Cabriolet::System::MemoryHandle.new("", Cabriolet::Constants::MODE_WRITE)
# ... extract to memory handle

Custom I/O system

class CustomIOSystem < Cabriolet::System::IOSystem
  def open(filename, mode)
    # Custom open logic
  end

  def read(handle, bytes)
    # Custom read logic
  end

  # ... implement other methods
end

# Use custom I/O
custom_io = CustomIOSystem.new
decompressor = Cabriolet::CAB::Decompressor.new(custom_io)
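
As a rough idea of what such a class might contain, here is a self-contained sketch backed by `StringIO`. It deliberately does not subclass `Cabriolet::System::IOSystem`, so it runs without the gem. The `open` and `read` signatures mirror the skeleton above; `seek`, `close`, and the constructor argument are assumptions made for this illustration, and the real interface may require different or additional methods.

```ruby
require "stringio"

# Hypothetical stand-in for a custom I/O system that serves "files"
# from a Hash of name => binary String instead of the filesystem.
class InMemoryIO
  def initialize(files = {})
    @files = files
  end

  # Returns a handle (a StringIO) positioned at the start of the data.
  def open(filename, _mode = :read)
    data = @files.fetch(filename) { raise IOError, "no such entry: #{filename}" }
    StringIO.new(data.b)
  end

  # Reads up to `bytes` bytes; returns an empty string at end of data.
  def read(handle, bytes)
    handle.read(bytes) || "".b
  end

  def seek(handle, offset)
    handle.seek(offset)
  end

  def close(handle)
    handle.close
  end
end

io = InMemoryIO.new("example.cab" => "MSCF\x00\x00\x00\x00")
handle = io.open("example.cab")
puts io.read(handle, 4) # prints "MSCF"
io.close(handle)
```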

Error Handling

Common errors

begin
  decompressor = Cabriolet::CAB::Decompressor.new
  cabinet = decompressor.open("example.cab")
  decompressor.extract_all(cabinet, "output/")
rescue Cabriolet::IOError => e
  puts "I/O error: #{e.message}"
rescue Cabriolet::ParseError => e
  puts "Parse error: #{e.message}"
rescue Cabriolet::ChecksumError => e
  puts "Checksum failed: #{e.message}"
rescue Cabriolet::DecompressionError => e
  puts "Decompression error: #{e.message}"
rescue Cabriolet::Error => e
  puts "General error: #{e.message}"
end
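
The order of the `rescue` clauses matters: Ruby uses the first clause that matches, so specific classes must come before the general one. The sketch below demonstrates this with a stand-in hierarchy. It assumes the specific errors inherit from a common base class, which the rescue order above suggests but this README does not state; the `Demo` module and `describe_failure` are hypothetical.

```ruby
# Stand-in hierarchy for illustration; not Cabriolet's actual classes.
module Demo
  class Error < StandardError; end
  class ParseError < Error; end
  class ChecksumError < Error; end
end

def describe_failure
  yield
  "ok"
rescue Demo::ChecksumError => e
  "checksum: #{e.message}"  # specific handler, listed first
rescue Demo::Error => e
  "general: #{e.message}"   # catch-all for the rest of the hierarchy
end

puts describe_failure { raise Demo::ChecksumError, "block 3" }
puts describe_failure { raise Demo::ParseError, "bad header" }
puts describe_failure { :nothing_raised }
```

If the `Demo::Error` clause were listed first, it would also catch `Demo::ChecksumError` and the specific handler would never run.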

Salvage mode for corrupted files

decompressor = Cabriolet::CAB::Decompressor.new
decompressor.salvage = true  # Enable error recovery

# Will skip bad files and continue
cabinet = decompressor.open("corrupted.cab")
decompressor.extract_all(cabinet, "output/")
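
Conceptually, "skip bad files and continue" is a loop that records each failure instead of aborting. The following sketch shows that pattern in isolation. It is not Cabriolet's salvage logic, which also has to recover inside damaged compressed streams; `extract_each` is a hypothetical helper.

```ruby
# Runs the block for every item, collecting results and failures separately
# so that one bad item does not stop the rest.
def extract_each(items)
  extracted = []
  failures = {}
  items.each do |item|
    begin
      extracted << yield(item)
    rescue StandardError => e
      failures[item] = e.message
    end
  end
  [extracted, failures]
end

done, failed = extract_each(%w[readme.txt broken.bin data.bin]) do |name|
  raise "corrupt block" if name == "broken.bin"

  name.upcase
end

puts "Extracted: #{done.join(', ')}"
failed.each { |name, reason| puts "Skipped #{name}: #{reason}" }
```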

Fix MSZIP errors

decompressor = Cabriolet::CAB::Decompressor.new
decompressor.fix_mszip = true  # Ignore MSZIP checksums, recover from errors

cabinet = decompressor.open("example.cab")
decompressor.extract_all(cabinet, "output/")

API Reference

Cabriolet::CAB::Decompressor

Main class for CAB file operations.

Class methods
new(io_system = nil)

Creates a new decompressor instance.

Parameters
io_system

Optional custom I/O system implementation

Returns
Cabriolet::CAB::Decompressor

New decompressor instance

Instance methods
open(filename)

Opens and parses a CAB file.

Parameters
filename

Path to CAB file

Returns
Cabriolet::Models::Cabinet

Parsed cabinet object

Raises
Cabriolet::ParseError

If file is not valid CAB format

Cabriolet::IOError

If file cannot be opened

extract_file(file, output_path, **options)

Extracts a single file from the cabinet.

Parameters
file

Cabriolet::Models::File object

output_path

Where to write the file

options

Optional hash (salvage, overwrite, etc.)

Returns
Integer

Number of bytes extracted

extract_all(cabinet, output_dir, **options)

Extracts all files from the cabinet.

Parameters
cabinet

Cabriolet::Models::Cabinet object

output_dir

Directory to extract to

options

Optional hash

Returns
Integer

Number of files extracted

search(filename)

Searches for embedded cabinets in a file.

Parameters
filename

File to search

Returns
Cabriolet::Models::Cabinet

First found cabinet (use .next for others)

nil

If no cabinets found

append(cabinet, next_cabinet)

Merges two cabinets in a multi-part set.

Parameters
cabinet

First cabinet

next_cabinet

Next cabinet in sequence

Returns

void

Attributes
buffer_size

I/O buffer size in bytes (default: 4096)

salvage

Enable salvage mode for corrupted files (default: false)

fix_mszip

Enable MSZIP error recovery (default: false)

Cabriolet::CAB::Compressor

Class for creating CAB files.

Instance methods
add_file(source_path, cab_path = nil)

Adds a file to the cabinet.

Parameters
source_path

Path to source file

cab_path

Path within cabinet (optional, defaults to basename)

generate(output_file, **options)

Generates the cabinet file.

Parameters
output_file

Path to output CAB file

options

Hash with compression, set_id, etc.

Returns
Integer

Bytes written

Example:

compressor = Cabriolet::CAB::Compressor.new
compressor.add_file("file1.txt")
compressor.add_file("file2.txt")
bytes = compressor.generate("output.cab", compression: :mszip)

Compression Algorithm Status

Algorithm | Decompression | Compression   | Notes
None      | ✅ Working    | ✅ Working    | Uncompressed storage
LZSS      | ✅ Working    | ✅ Working    | 4KB sliding window, 3 modes (EXPAND, MSHELP, QBASIC)
MSZIP     | ✅ Working    | ✅ Working    | DEFLATE/RFC 1951, fixed Huffman
LZX       | ✅ Working    | ✅ Working    | UNCOMPRESSED blocks, 32KB-2MB window
Quantum   | ✅ Working    | ⚠️ Functional | Literals + short matches work. Complex patterns pending.
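
Since MSZIP is DEFLATE in a thin wrapper, a single block can be approximated with Ruby's bundled `zlib`. The sketch below assumes the usual MSZIP block layout: a 2-byte `CK` signature followed by a raw DEFLATE stream, with at most 32768 bytes of uncompressed data per block. It uses zlib's encoder rather than Cabriolet's, handles one block only, and ignores the fact that later blocks in a real cabinet reuse earlier output as history. `mszip_pack_block` and `mszip_unpack_block` are hypothetical names.

```ruby
require "zlib"

MSZIP_SIGNATURE = "CK".b
MSZIP_BLOCK_SIZE = 32_768

# Compresses one block: "CK" + raw DEFLATE (negative window bits = no zlib header).
def mszip_pack_block(data)
  raise ArgumentError, "block too large" if data.bytesize > MSZIP_BLOCK_SIZE

  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  MSZIP_SIGNATURE + deflater.deflate(data, Zlib::FINISH)
ensure
  deflater&.close
end

# Checks the signature, then inflates the raw DEFLATE stream after it.
def mszip_unpack_block(block)
  block = block.b
  raise ArgumentError, "missing CK signature" unless block.start_with?(MSZIP_SIGNATURE)

  inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  inflater.inflate(block.byteslice(2..-1))
ensure
  inflater&.close
end

original = "hello world " * 100
packed = mszip_pack_block(original)
puts "#{original.bytesize} bytes -> #{packed.bytesize} bytes"
puts mszip_unpack_block(packed) == original # prints "true"
```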

Configuration Options

Buffer Sizes

# Set default buffer size globally
Cabriolet.default_buffer_size = 8192

# Or per decompressor
decompressor.buffer_size = 16384

Verbose Output

# Enable verbose output globally
Cabriolet.verbose = true

# Or use --verbose flag in CLI
# cabriolet extract file.cab --verbose

Compression Algorithm Selection Guide

Algorithm | Ratio  | Speed   | Complexity | Use Case
None      | 1:1    | Fastest | Trivial    | Already compressed data, testing
LZSS      | 2-3:1  | Fast    | Low        | Small files, compatibility
MSZIP     | 3-5:1  | Medium  | Medium     | Recommended for most uses
LZX       | 5-10:1 | Slow    | High       | Large files, best compression
Quantum   | 4-8:1  | Medium  | Very High  | Experimental, use with caution

Return values

Methods either return a value or raise an exception:

  • Decompression methods: Return the number of bytes extracted (extract_all returns the number of files extracted) or raise an error

  • Compression methods: Return bytes written or raise error

  • Parse methods: Return model objects or raise ParseError

  • File operations: Return file handles or raise IOError

Development

Building from source

git clone https://github.com/omnizip/cabriolet.git
cd cabriolet
bundle install
bundle exec rake

Running tests

bundle exec rspec

Running RuboCop

bundle exec rubocop
bundle exec rubocop -A  # Auto-correct

Known limitations

Quantum compression

Quantum compression is functional but experimental:

  • Decompression: Fully working, production ready

  • Compression: Working for:

    • Simple literals

    • Short matches (3-4 bytes)

    • Basic patterns

  • ⚠️ Limitations:

    • Complex repeated patterns may fail

    • Very long matches (14+ bytes) have encoding issues

    • Recommended: Use LZSS, MSZIP, or LZX instead

LIT Format

  • DES encryption (DRM) intentionally not supported

  • For DRM-protected LIT files, decrypt with Microsoft Reader first

HLP/LIT/OAB Formats

  • No public format specifications available

  • Implementation based on libmspack source code

  • Cannot be fully validated without real test files

  • Basic functionality working, edge cases may exist

Troubleshooting

Extraction failures

Problem

Invalid CAB signature

Solution

The file may not be a CAB file, or it may be corrupted. Try salvage mode:

cabriolet extract --salvage corrupted.cab
Problem

Checksum mismatch

Solution

Enable error recovery:

decompressor.fix_mszip = true
decompressor.salvage = true

Performance issues

Problem

Slow extraction

Solution

Increase buffer size:

decompressor.buffer_size = 16384

Acknowledgments

A special thank you to Stuart Caie (aka Kyzer) who created the original libmspack and cabextract projects, and their contributors for:

  • Comprehensive CAB format implementation

  • Excellent test coverage and test fixtures

  • Clear format documentation

Link to the libmspack/cabextract project: https://www.cabextract.org.uk/libmspack/

Cabriolet is inspired by and builds upon the foundation laid by these projects.

If performance is critical, Cabriolet is not the best choice. Consider using libmspack via FFI for optimized speed.

License

BSD 3-Clause License. See LICENSE file for details.

Some test fixtures are from third-party projects. Test fixtures are NOT distributed with the gem and are only used for development and testing purposes.

These fixtures are sourced from the respective projects and retain their original licenses:

  • Test fixtures in spec/fixtures/libmspack/ are from the libmspack project (LGPL 2.1).

  • Test fixtures in spec/fixtures/cabextract/ are from cabextract (GPL 2.0+).

See fixture directories for individual attribution files.