Class: Cabriolet::Huffman::Encoder
- Inherits:
-
Object
- Object
- Cabriolet::Huffman::Encoder
- Defined in:
- lib/cabriolet/huffman/encoder.rb
Overview
Encoder encodes symbols using Huffman codes for compression
Constant Summary collapse
- MAX_BITS =
Maximum code length supported
16
Class Method Summary collapse
-
.build_codes(lengths, num_symbols) ⇒ Hash
Build Huffman codes from code lengths (RFC 1951 algorithm).
-
.build_fixed_codes ⇒ Hash
Build fixed Huffman codes for DEFLATE (RFC 1951).
-
.encode_symbol(symbol, codes, bitstream) ⇒ void
Encode a symbol using Huffman codes and write to bitstream.
-
.reverse_bits(value, num_bits) ⇒ Integer
Reverse bits for writing (some formats need reversed bit order).
Class Method Details
.build_codes(lengths, num_symbols) ⇒ Hash
Build Huffman codes from code lengths (RFC 1951 algorithm)
This generates the actual Huffman code values from code lengths. The algorithm ensures canonical Huffman codes where codes of the same length are assigned sequentially.
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
# File 'lib/cabriolet/huffman/encoder.rb', line 19 def self.build_codes(lengths, num_symbols) # Count the number of codes for each length bl_count = Array.new(MAX_BITS + 1, 0) lengths[0, num_symbols].each do |len| bl_count[len] += 1 if len.positive? end # Find the numerical value of the smallest code for each length code = 0 bl_count[0] = 0 next_code = Array.new(MAX_BITS + 1, 0) (1..MAX_BITS).each do |bits| code = (code + bl_count[bits - 1]) << 1 next_code[bits] = code end # Assign codes to symbols codes = {} num_symbols.times do |symbol| len = lengths[symbol] next unless len.positive? codes[symbol] = { code: next_code[len], bits: len, } next_code[len] += 1 end codes end |
.build_fixed_codes ⇒ Hash
Build fixed Huffman codes for DEFLATE (RFC 1951)
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
# File 'lib/cabriolet/huffman/encoder.rb', line 55 def self.build_fixed_codes # Fixed literal/length code lengths literal_lengths = Array.new(288, 0) (0...144).each { |i| literal_lengths[i] = 8 } (144...256).each { |i| literal_lengths[i] = 9 } (256...280).each { |i| literal_lengths[i] = 7 } (280...288).each { |i| literal_lengths[i] = 8 } # Fixed distance code lengths (all 5 bits) distance_lengths = Array.new(32, 5) { literal: build_codes(literal_lengths, 288), distance: build_codes(distance_lengths, 32), } end |
.encode_symbol(symbol, codes, bitstream) ⇒ void
This method returns an undefined value.
Encode a symbol using Huffman codes and write to bitstream
Per RFC 1951 Section 3.1.1, Huffman codes are written LSB-first, so we must reverse the bits before writing to the bitstream.
81 82 83 84 85 86 87 88 89 90 91 |
# File 'lib/cabriolet/huffman/encoder.rb', line 81 def self.encode_symbol(symbol, codes, bitstream) entry = codes[symbol] unless entry raise Cabriolet::CompressionError, "No code for symbol #{symbol}" end # Reverse bits for LSB-first writing per RFC 1951 reversed_code = reverse_bits(entry[:code], entry[:bits]) bitstream.write_bits(reversed_code, entry[:bits]) end |
.reverse_bits(value, num_bits) ⇒ Integer
Reverse bits for writing (some formats need reversed bit order)
98 99 100 101 102 103 104 105 |
# File 'lib/cabriolet/huffman/encoder.rb', line 98 def self.reverse_bits(value, num_bits) result = 0 num_bits.times do result = (result << 1) | (value & 1) value >>= 1 end result end |