Class: XZ::Stream

Inherits: Object
Extended by: Forwardable
Defined in: lib/xz/stream.rb

Overview

The base class for XZ::StreamReader and XZ::StreamWriter. This is an abstract class that is not meant to be used directly. You can, however, test against this class in kind_of? tests.

XZ::StreamReader and XZ::StreamWriter are IO-like classes that let you access XZ-compressed data the same way you access an IO object, which makes it easy to pass them to other libraries that expect IO objects. The most notable example is reading and writing XZ-compressed tarballs using the minitar RubyGem; see the README.md file for an example.

Most of IO’s methods are implemented in this class or one of the subclasses. The most notable exception is that it is not possible to seek in XZ archives (#seek and #pos= are not defined). Many methods that are not expressly documented in the RDoc still exist; this class uses Ruby’s Forwardable module to forward them to the underlying IO object.
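The forwarding mentioned above can be illustrated with a minimal sketch. The class name TinyWrapper and the choice of forwarded methods are hypothetical and are not taken from lib/xz/stream.rb; only Ruby's Forwardable module and StringIO are real.

```ruby
require "forwardable"
require "stringio"

# Hypothetical wrapper, for illustration only.
class TinyWrapper
  extend Forwardable

  # Forward a few calls straight to the wrapped IO object.
  def_delegators :@delegate_io, :tty?, :sync, :sync=

  def initialize(delegate_io)
    @delegate_io = delegate_io
  end
end

wrapper = TinyWrapper.new(StringIO.new("data"))
wrapper.tty? # answered by the wrapped StringIO
wrapper.sync # likewise
```

Methods forwarded this way need no code of their own in the wrapper, which would explain why they may not show up with their own documentation.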

Stream and its subclasses honour Ruby’s external+internal encoding system just like Ruby’s own IO does. All of what the Ruby docs say about external and internal encodings applies to this class with one important difference. The “external encoding” does not refer to the encoding of the file on the hard disk (this file is always a binary file as it’s compressed data), but to the encoding of the decompressed data inside the compressed file.

As with Ruby’s IO class, instances of this class and its subclasses default their external encoding to Encoding.default_external and their internal encoding to Encoding.default_internal. You can use #set_encoding or pass appropriate arguments to the new method to change these encodings per-instance.
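The roles of the two encodings can be shown with plain strings, without any compression involved. This is only a sketch of the concept: the variable names are made up for the example, and the binary string stands in for bytes that a reader might produce after decompression.

```ruby
# Stand-in for decompressed bytes: "café" encoded in ISO-8859-1.
raw = "caf\xE9".b

# External encoding: how the decompressed bytes are to be interpreted.
external = Encoding::ISO_8859_1
# Internal encoding: what the caller wants to receive (nil means no transcoding).
internal = Encoding::UTF_8

# Label the bytes with the external encoding, then transcode if requested.
labelled = raw.dup.force_encoding(external)
result = internal ? labelled.encode(internal) : labelled

puts result
puts result.encoding
```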

Direct Known Subclasses

StreamReader, StreamWriter


Constructor Details

#initialize(delegate_io) ⇒ Stream

Private API only for use by subclasses.



# File 'lib/xz/stream.rb', line 95

def initialize(delegate_io) # :nodoc:
  @delegate_io = delegate_io
  @lzma_stream = XZ::LibLZMA::LZMAStream.malloc
  XZ::LibLZMA::LZMA_STREAM_INIT(@lzma_stream)

  @finished = false
  @lineno = 0
  @pos = 0
  @external_encoding = Encoding.default_external
  @internal_encoding = Encoding.default_internal
  @transcode_options = {}
  @input_buffer_p  = Fiddle::Pointer.malloc(XZ::CHUNK_SIZE)
  @output_buffer_p = Fiddle::Pointer.malloc(XZ::CHUNK_SIZE)
end

Instance Attribute Details

#external_encoding ⇒ Object (readonly)

Returns the encoding used inside the compressed data stream. Like IO#external_encoding.



# File 'lib/xz/stream.rb', line 87

def external_encoding
  @external_encoding
end

#internal_encoding ⇒ Object (readonly)

When compressed data is read, the decompressed data is transcoded from the external_encoding to this encoding. If this encoding is nil, no transcoding happens.



# File 'lib/xz/stream.rb', line 92

def internal_encoding
  @internal_encoding
end

#lineno ⇒ Object

Like IO#lineno and IO#lineno=.



# File 'lib/xz/stream.rb', line 83

def lineno
  @lineno
end

Instance Method Details

#<<(obj) ⇒ Object

Like IO#<<.



# File 'lib/xz/stream.rb', line 289

def <<(obj)
  write(obj.to_s)
end

#advise ⇒ Object

Like IO#advise. No-op, because not meaningful on compressed data.



# File 'lib/xz/stream.rb', line 294

def advise
  nil
end

#close ⇒ Object

Calls #finish if that has not been done yet, then closes the delegate IO. Closing the delegate IO causes it to flush its buffer, so after this method returns, all pending data is guaranteed to have been flushed to the OS kernel.



# File 'lib/xz/stream.rb', line 227

def close
  finish unless @finished
  @delegate_io.close unless @delegate_io.closed?
  nil
end

#close_read ⇒ Object

Always raises IOError, because XZ streams can never be duplex.

Raises:

  • (IOError)


# File 'lib/xz/stream.rb', line 234

def close_read
  raise(IOError, "Not a duplex I/O stream")
end

#close_write ⇒ Object

Always raises IOError, because XZ streams can never be duplex.

Raises:

  • (IOError)


# File 'lib/xz/stream.rb', line 239

def close_write
  raise(IOError, "Not a duplex I/O stream")
end

#closed? ⇒ Boolean

True if the delegate IO has been closed.

Returns:

  • (Boolean)


# File 'lib/xz/stream.rb', line 187

def closed?
  @delegate_io.closed?
end

#each(*args) ⇒ Object Also known as: each_line

Like IO#each.



# File 'lib/xz/stream.rb', line 365

def each(*args)
  return enum_for __method__ unless block_given?

  while line = gets(*args)
    yield(line)
  end
end

#each_byte ⇒ Object

Like IO#each_byte.



# File 'lib/xz/stream.rb', line 375

def each_byte
  return enum_for __method__ unless block_given?

  while byte = getbyte
    yield(byte)
  end
end

#each_char ⇒ Object

Like IO#each_char.



# File 'lib/xz/stream.rb', line 384

def each_char
  return enum_for __method__ unless block_given?

  while char = getc
    yield(char)
  end
end

#each_codepoint ⇒ Object

Like IO#each_codepoint.



# File 'lib/xz/stream.rb', line 393

def each_codepoint
  return enum_for __method__ unless block_given?

  each_char{|c| yield(c.ord)}
end

#eof ⇒ Object

Alias for #eof?



# File 'lib/xz/stream.rb', line 182

def eof
  eof?
end

#eof? ⇒ Boolean

Overridden in StreamReader to be like IO#eof?. This abstract implementation only raises IOError.

Returns:

  • (Boolean)

Raises:

  • (IOError)


# File 'lib/xz/stream.rb', line 177

def eof?
  raise(IOError, "Stream not opened for reading")
end

#finish ⇒ Object

Frees internal liblzma memory. This needs to be called before you leave this object to the GC. If you used a block-form initializer, this is done automatically for you.

Subsequent calls to #read or #write will cause an IOError.

Returns the underlying IO object. This allows you to retrieve the File instance that was automatically created when using the open method’s block form.



# File 'lib/xz/stream.rb', line 208

def finish
  return if @finished

  # Clean up the lzma_stream structure's internal memory.
  # This would belong into a destructor if Ruby had that.
  XZ::LibLZMA.lzma_end(@lzma_stream)
  Fiddle.free @lzma_stream.to_ptr
  Fiddle.free @input_buffer_p
  Fiddle.free @output_buffer_p
  @finished = true

  @delegate_io
end

#finished? ⇒ Boolean

True if liblzma’s internal memory has been freed. For writer instances, receiving true from this method also means that all of liblzma’s compressed data has been flushed to the underlying IO object.

Returns:

  • (Boolean)


# File 'lib/xz/stream.rb', line 195

def finished?
  @finished
end

#getbyte ⇒ Object

Like IO#getbyte. Note that this method is not particularly fast, because it reads the data as a string and then has to extract the byte from that string again.



# File 'lib/xz/stream.rb', line 301

def getbyte
  return nil if eof?
  read(1).bytes.first
end

#getc ⇒ Object

Like IO#getc.



# File 'lib/xz/stream.rb', line 312

def getc
  str = String.new

  # Read byte-by-byte until a valid character in the external
  # encoding was built.
  loop do
    str.force_encoding(Encoding::BINARY)
    str << read(1)
    str.force_encoding(@external_encoding)

    break if str.valid_encoding? || eof?
  end

  # Transcode to internal encoding if one was requested
  if @internal_encoding
    str.encode(@internal_encoding)
  else
    str
  end
end
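The byte-by-byte loop in #getc can be tried in isolation. The following is a sketch under stated assumptions: next_char is a hypothetical helper, and a StringIO holding raw bytes stands in for the decompressed stream, so no XZ classes are involved.

```ruby
require "stringio"

# Stand-in for the decompressed byte stream: "a" followed by "é" in UTF-8
# (three bytes in total).
source = StringIO.new("a\u00E9".b)

# Hypothetical helper mirroring the loop in #getc: keep appending single
# bytes until they form a valid character in the given encoding.
def next_char(io, encoding)
  str = String.new
  loop do
    str.force_encoding(Encoding::BINARY)
    str << io.read(1)
    str.force_encoding(encoding)
    break if str.valid_encoding? || io.eof?
  end
  str
end

first  = next_char(source, Encoding::UTF_8) # one byte is enough for "a"
second = next_char(source, Encoding::UTF_8) # "é" needs two reads
```

The eof? check in the loop condition keeps a truncated multi-byte sequence at the end of the data from looping forever.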

#gets(separator = $/, limit = nil) ⇒ Object

Like IO#gets.



# File 'lib/xz/stream.rb', line 339

def gets(separator = $/, limit = nil)
  return nil if eof?
  @lineno += 1

  # Mirror IO#gets' weird call-seq
  if separator.respond_to?(:to_int)
    limit = separator.to_int
    separator = $/
  end

  buf = String.new
  buf.force_encoding(target_encoding)
  until eof? || (limit && buf.length >= limit)
    buf << getc
    return buf if buf[-1] == separator
  end

  buf
end

#lzma_code(str, action) ⇒ Object

Passes the given str into liblzma’s lzma_code() function. action is either LibLZMA::LZMA_RUN (still working) or LibLZMA::LZMA_FINISH (this is the last piece).



# File 'lib/xz/stream.rb', line 113

def lzma_code(str, action) # :nodoc:
  previous_encoding = str.encoding
  str.force_encoding(Encoding::BINARY) # Need to operate on bytes now

  begin
    pos = 0
    until pos > str.bytesize # Do not use >=, that conflicts with #lzma_finish
      substr = str[pos, XZ::CHUNK_SIZE]
      @input_buffer_p[0, substr.bytesize] = substr
      pos += XZ::CHUNK_SIZE

      @lzma_stream.next_in  = @input_buffer_p
      @lzma_stream.avail_in = substr.bytesize

      loop do
        @lzma_stream.next_out  = @output_buffer_p
        @lzma_stream.avail_out = XZ::CHUNK_SIZE
        res = XZ::LibLZMA.lzma_code(@lzma_stream.to_ptr, action)
        XZ.send :check_lzma_code_retval, res # call package-private method

        data = @output_buffer_p[0, XZ::CHUNK_SIZE - @lzma_stream.avail_out]
        yield(data)

        break unless @lzma_stream.avail_out == 0
      end
    end
  ensure
    str.force_encoding(previous_encoding)
  end
end
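The slicing behaviour of the outer loop in #lzma_code can be examined on its own. This is a sketch only: chunks_for is a hypothetical helper, and a chunk size of 4 is used instead of XZ::CHUNK_SIZE to keep the example readable. It uses the same `until pos > bytesize` condition as the method above.

```ruby
# Hypothetical helper: returns the slices that a loop with the same
# condition as #lzma_code would hand over, one per iteration.
def chunks_for(str, chunk_size = 4)
  bytes = str.b # operate on bytes, as #lzma_code does
  slices = []
  pos = 0
  until pos > bytes.bytesize # same condition as in #lzma_code
    slices << bytes[pos, chunk_size]
    pos += chunk_size
  end
  slices
end

empty   = chunks_for("")         # a single empty slice
partial = chunks_for("abcdef")   # "abcd" and "ef"
exact   = chunks_for("abcdefgh") # "abcd", "efgh" and a trailing empty slice
```

Because the condition is `>` rather than `>=`, the loop body runs at least once even for an empty string. That appears to be what the comment about #lzma_finish refers to, although the source shown here does not spell it out.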

#pos ⇒ Object Also known as: tell

Returns the position in the decompressed data (regardless of whether this is a reader or a writer instance).



# File 'lib/xz/stream.rb', line 257

def pos
  @pos
end

#print(*objs) ⇒ Object

Like IO#print.



# File 'lib/xz/stream.rb', line 439

def print(*objs)
  if objs.empty?
    write($_)
  else
    objs.each do |obj|
      write(obj.to_s)
      write($,) if $,
    end
  end

  write($\) if $\
  nil
end

#printf(*args) ⇒ Object

Like IO#printf.



# File 'lib/xz/stream.rb', line 400

def printf(*args)
  write(sprintf(*args))
  nil
end

#putc(obj) ⇒ Object

Like IO#putc.



# File 'lib/xz/stream.rb', line 406

def putc(obj)
  if obj.respond_to? :chr
    write(obj.chr)
  elsif obj.respond_to? :to_str
    write(obj.to_str)
  else
    raise(TypeError, "Can only #putc strings and numbers")
  end
end

#puts(*objs) ⇒ Object

Like IO#puts.



# File 'lib/xz/stream.rb', line 416

def puts(*objs)
  if objs.empty?
    write("\n")
    return nil
  end

  objs.each do |obj|
    if obj.respond_to? :to_ary
      puts(*obj.to_ary)
    else
      # Don't squeeze multiple subsequent trailing newlines in `obj'
      obj = obj.to_s
      if obj.end_with?("\n".encode(obj.encoding))
        write(obj)
      else
        write(obj + "\n".encode(obj.encoding))
      end
    end
  end
  nil
end
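To see what the #puts logic above produces, it can be attached to a simple object that records its writes. RecordingSink is a hypothetical class made up for this sketch; its #puts body repeats the logic shown above, and its #write merely appends to a string instead of compressing anything.

```ruby
# Hypothetical sink that records everything passed to #write.
class RecordingSink
  attr_reader :written

  def initialize
    @written = +""
  end

  def write(str)
    @written << str
    str.bytesize
  end

  # Same logic as the #puts shown above.
  def puts(*objs)
    if objs.empty?
      write("\n")
      return nil
    end

    objs.each do |obj|
      if obj.respond_to? :to_ary
        puts(*obj.to_ary) # arrays are flattened recursively
      else
        obj = obj.to_s
        newline = "\n".encode(obj.encoding)
        write(obj.end_with?(newline) ? obj : obj + newline)
      end
    end
    nil
  end
end

sink = RecordingSink.new
sink.puts("a", ["b\n", ["c"]], 1)
```

Each element ends up on its own line, nested arrays are flattened, and an element that already ends in a newline does not get a second one.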

#read(*args) ⇒ Object

Overridden in StreamReader to be like IO#read. This abstract implementation only raises IOError.

Raises:

  • (IOError)


# File 'lib/xz/stream.rb', line 245

def read(*args)
  raise(IOError, "Stream not opened for reading")
end

#readbyte ⇒ Object

Like IO#readbyte.



# File 'lib/xz/stream.rb', line 307

def readbyte
  getbyte || raise(EOFError, "End of stream reached")
end

#readchar ⇒ Object

Like IO#readchar.



# File 'lib/xz/stream.rb', line 334

def readchar
  getc || raise(EOFError, "End of stream reached")
end

#readline(*args) ⇒ Object

Like IO#readline.



# File 'lib/xz/stream.rb', line 360

def readline(*args)
  gets(*args) || raise(EOFError, "End of stream reached")
end

#reopen(*args) ⇒ Object

It is not possible to reopen an lzma stream, hence this method always raises NotImplementedError.

Raises:

  • (NotImplementedError)


# File 'lib/xz/stream.rb', line 455

def reopen(*args)
  raise(NotImplementedError, "Can't reopen an lzma stream")
end

#rewind ⇒ Object

Partial implementation of rewind abstracting common operations. The subclasses implement the rest.



# File 'lib/xz/stream.rb', line 146

def rewind # :nodoc:
  # Free the current lzma stream and rewind the underlying IO.
  # It is required to call #rewind before allocating a new lzma
  # stream, because if #rewind raises an exception (because the
  # underlying IO is not rewindable), a memory leak would occur
  # with regard to an allocated-but-never-freed lzma stream.
  finish
  @delegate_io.rewind

  # Reset internal state
  @pos = @lineno = 0
  @finished = false
  @lzma_stream = XZ::LibLZMA::LZMAStream.malloc
  @input_buffer_p  = Fiddle::Pointer.malloc(XZ::CHUNK_SIZE)
  @output_buffer_p = Fiddle::Pointer.malloc(XZ::CHUNK_SIZE)
  XZ::LibLZMA::LZMA_STREAM_INIT(@lzma_stream)

  0 # Mimic IO#rewind's return value
end

#set_encoding(*args) ⇒ Object

Like IO#set_encoding.



# File 'lib/xz/stream.rb', line 263

def set_encoding(*args)
  if args.count < 1 || args.count > 3
    raise ArgumentError, "Wrong number of arguments: Expected 1-3, got #{args.count}"
  end

  # Clean `args' to [external_encoding, internal_encoding],
  # and @transcode_options.
  return set_encoding($`, $', *args[1..-1]) if args[0].respond_to?(:to_str) && args[0].to_str =~ /:/
  @transcode_options = args.delete_at(-1) if args[-1].kind_of?(Hash)

  # `args' is always [external, internal] or [external] at this point
  @external_encoding = args[0].kind_of?(Encoding) ? args[0] : Encoding.find(args[0])
  if args[1]
    @internal_encoding = args[1].kind_of?(Encoding) ? args[1] : Encoding.find(args[1])
  else
    @internal_encoding = Encoding.default_internal # Encoding.default_internal defaults to nil
  end

  self
end
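The argument handling of #set_encoding can be followed more easily in a standalone form. resolve_encodings below is a hypothetical helper written for this sketch; it mirrors the steps shown above but returns the result as an array instead of setting instance variables, and it omits the argument count check.

```ruby
# Hypothetical helper mirroring the argument handling of #set_encoding.
# Returns [external, internal, options].
def resolve_encodings(*args)
  args = args.dup

  # "ext:int" is split at the first colon: $` holds the text before the
  # match and $' the text after it.
  if args[0].respond_to?(:to_str) && args[0].to_str =~ /:/
    return resolve_encodings($`, $', *args[1..-1])
  end

  # A trailing Hash is taken as transcoding options.
  options = args[-1].is_a?(Hash) ? args.pop : {}

  external = args[0].is_a?(Encoding) ? args[0] : Encoding.find(args[0])
  internal =
    if args[1]
      args[1].is_a?(Encoding) ? args[1] : Encoding.find(args[1])
    else
      Encoding.default_internal # nil unless configured otherwise
    end

  [external, internal, options]
end

pair     = resolve_encodings("ISO-8859-1:UTF-8")
explicit = resolve_encodings(Encoding::UTF_16LE, "UTF-8", { invalid: :replace })
single   = resolve_encodings("UTF-8")
```

All three call styles end up in the same normalized form, which is why the string with a colon is handled by a recursive call rather than by separate parsing code.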

#to_io ⇒ Object

You can mostly treat this as if it were an IO object, at least for the subclasses. This class itself is abstract; you should not use it directly at all.

Returns the receiver.



# File 'lib/xz/stream.rb', line 171

def to_io
  self
end

#write(*args) ⇒ Object

Overridden in StreamWriter to be like IO#write. This abstract implementation only raises IOError.

Raises:

  • (IOError)


# File 'lib/xz/stream.rb', line 251

def write(*args)
  raise(IOError, "Stream not opened for writing")
end