Class: HexaPDF::Revisions

Inherits:

Object

Object
HexaPDF::Revisions

Includes:: Enumerable

Defined in:: lib/hexapdf/revisions.rb

Overview

Manages the revisions of a PDF document.

A PDF document has one revision when it is created. Later, new revisions are added when changes are made. This allows for adding information/content to a PDF file without changing the original content.

The order of the revisions is important. In HexaPDF the oldest revision always has index 0 and the newest revision the highest index. This is also the order in which the revisions get written.

Important: It is possible to manipulate the individual revisions and their objects oneself but this should only be done if one is familiar with the inner workings of HexaPDF. Otherwise it is best to use the convenience methods of this class to create, access or delete indirect objects.

See: PDF2.0 s7.5.6, HexaPDF::Revision

Instance Attribute Summary collapse

#parser ⇒ Object readonly

The Parser instance used for reading the initial revisions.

Class Method Summary collapse

.from_io(document, io) ⇒ Object

Loads all revisions for the document from the given IO and returns the created Revisions object.

Instance Method Summary collapse

#add ⇒ Object

Adds a new empty revision to the document and returns it.
#add_object(obj) ⇒ Object

:call-seq: revisions.add_object(object) -> object.
#all ⇒ Object

Returns a list of all revisions.
#current ⇒ Object

Returns the current revision.
#delete_object(ref) ⇒ Object

:call-seq: revisions.delete_object(ref) revisions.delete_object(oid).
#each(&block) ⇒ Object

:call-seq: revisions.each {|rev| block } -> revisions revisions.each -> Enumerator.
#each_object(only_current: true, only_loaded: false, &block) ⇒ Object

:call-seq: revisions.each_object(only_current: true, only_loaded: false) {|obj| block } -> revisions revisions.each_object(only_current: true, only_loaded: false) {|obj, rev| block } -> revisions revisions.each_object(only_current: true, only_loaded: false) -> Enumerator.
#initialize(document, initial_revisions: nil, parser: nil) ⇒ Revisions constructor

Creates a new revisions object for the given PDF document.
#merge(range = 0..-1)) ⇒ Object

:call-seq: revisions.merge(range = 0..-1) -> revisions.
#next_oid ⇒ Object

Returns the next object identifier that should be used when adding a new object.
#object(ref) ⇒ Object

:call-seq: revisions.object(ref) -> obj or nil revisions.object(oid) -> obj or nil.
#object?(ref) ⇒ Boolean

:call-seq: revisions.object?(ref) -> true or false revisions.object?(oid) -> true or false.

Constructor Details

#initialize(document, initial_revisions: nil, parser: nil) ⇒ `Revisions`

Creates a new revisions object for the given PDF document.

Options:

initial_revisions: An array of revisions that should initially be used. If this option is not specified, a single empty revision is added.
parser: The parser with which the initial revisions were read. If this option is not specified even though the document was read from an IO stream, some parts may not work, like incremental writing.

# File 'lib/hexapdf/revisions.rb', line 143

def initialize(document, initial_revisions: nil, parser: nil)
  @document = document
  @parser = parser

  @revisions = []
  if initial_revisions
    @revisions += initial_revisions
  else
    add
  end
end

Instance Attribute Details

#parser ⇒ `Object` (readonly)

The Parser instance used for reading the initial revisions.



129
130
131

# File 'lib/hexapdf/revisions.rb', line 129

def parser
  @parser
end

Class Method Details

.from_io(document, io) ⇒ `Object`

Loads all revisions for the document from the given IO and returns the created Revisions object.

If the io object is nil, an empty Revisions object is returned.

# File 'lib/hexapdf/revisions.rb', line 67

def from_io(document, io)
  return new(document) if io.nil?

  parser = Parser.new(io, document)
  object_loader = lambda {|xref_entry| parser.load_object(xref_entry) }

  revisions = []
  begin
    offset = parser.startxref_offset
    seen_xref_offsets = {}

    while offset && !seen_xref_offsets.key?(offset)
      # PDF2.0 s7.5.5 states that :Prev needs to be indirect, Adobe's reference 3.4.4 says it
      # should be direct. Adobe's POV is followed here. Same with :XRefStm.
      xref_section, trailer = parser.load_revision(offset)
      seen_xref_offsets[offset] = true

      stm = trailer[:XRefStm]
      if stm && !seen_xref_offsets.key?(stm)
        if xref_section.max_oid == 0 && trailer[:Prev] > stm
          # Revision is completely empty, with xref stream in previous revision
          merge_revision = trailer[:Prev]
        end
        stm_xref_section, = parser.load_revision(stm)
        stm_xref_section.merge!(xref_section)
        xref_section = stm_xref_section
        seen_xref_offsets[stm] = true
      end

      if parser.linearized? && !trailer.key?(:Prev)
        merge_revision = offset
      end

      if merge_revision == offset
        xref_section.merge!(revisions.first.xref_section)
        offset = trailer[:Prev] # Get possible next offset before overwriting trailer
        trailer = revisions.first.trailer
        revisions.shift
      else
        offset = trailer[:Prev]
      end

      revisions.unshift(Revision.new(document.wrap(trailer, type: :XXTrailer),
                                     xref_section: xref_section, loader: object_loader))
    end
  rescue HexaPDF::MalformedPDFError
    raise unless (reconstructed_revision = parser.reconstructed_revision)
    unless revisions.empty?
      reconstructed_revision.trailer.data.value = revisions.last.trailer.data.value
    end
    revisions << reconstructed_revision
  end

  document.version = parser.file_header_version rescue '1.0'
  new(document, initial_revisions: revisions, parser: parser)
end

Instance Method Details

#add ⇒ `Object`

Adds a new empty revision to the document and returns it.

Note: This method should only be used if one is familiar with the inner workings of HexaPDF *and the PDF specification.

# File 'lib/hexapdf/revisions.rb', line 306

def add
  if @revisions.empty?
    trailer = {}
  else
    trailer = current.trailer.value.dup
    trailer.delete(:Prev)
    trailer.delete(:XRefStm)
  end

  rev = Revision.new(@document.wrap(trailer, type: :XXTrailer))
  @revisions.push(rev)
  rev
end

#add_object(obj) ⇒ `Object`

:call-seq:

revisions.add_object(object)     -> object

Adds the given HexaPDF::Object to the current revision and returns it.

If object is a direct object, an object number is automatically assigned.

# File 'lib/hexapdf/revisions.rb', line 201

def add_object(obj)
  if obj.indirect? && (rev_obj = current.object(obj.oid))
    if rev_obj.data == obj.data
      return obj
    else
      raise HexaPDF::Error, "Can't add object because there is already " \
        "an object with object number #{obj.oid}"
    end
  end

  obj.oid = next_oid unless obj.indirect?
  current.add(obj)
end

#all ⇒ `Object`

Returns a list of all revisions.

Note: This method should only be used if one is familiar with the inner workings of HexaPDF *and the PDF specification.



284
285
286

# File 'lib/hexapdf/revisions.rb', line 284

def all
  @revisions
end

#current ⇒ `Object`

Returns the current revision.

Note: This method should only be used if one is familiar with the inner workings of HexaPDF *and the PDF specification.



276
277
278

# File 'lib/hexapdf/revisions.rb', line 276

def current
  @revisions.last
end

#delete_object(ref) ⇒ `Object`

:call-seq:

revisions.delete_object(ref)
revisions.delete_object(oid)

Deletes the indirect object specified by an exact reference or by an object number.

# File 'lib/hexapdf/revisions.rb', line 220

def delete_object(ref)
  @revisions.reverse_each do |rev|
    if rev.object?(ref)
      rev.delete(ref)
      break
    end
  end
end

#each(&block) ⇒ `Object`

:call-seq:

revisions.each {|rev| block }   -> revisions
revisions.each                  -> Enumerator

Iterates over all revisions from oldest to current one.

Note: This method should only be used if one is familiar with the inner workings of HexaPDF *and the PDF specification.

# File 'lib/hexapdf/revisions.rb', line 296

def each(&block)
  return to_enum(__method__) unless block_given?
  @revisions.each(&block)
  self
end

#each_object(only_current: true, only_loaded: false, &block) ⇒ `Object`

:call-seq:

revisions.each_object(only_current: true, only_loaded: false) {|obj| block }      -> revisions
revisions.each_object(only_current: true, only_loaded: false) {|obj, rev| block } -> revisions
revisions.each_object(only_current: true, only_loaded: false)                     -> Enumerator

Yields every object and optionally the revision it is in.

If only_loaded is true, only the already loaded objects of the PDF document are yielded. This does only matter when the document instance was created from an existing PDF document.

By default, only the current version of each object is returned which implies that each object number is yielded exactly once. If the only_current option is false, all stored objects from newest to oldest are returned, not only the current version of each object.

The only_current option can make a difference because the document can contain multiple revisions:

Multiple revisions may contain objects with the same object and generation numbers, e.g. two (different) objects with oid/gen [3,0].
Additionally, there may also be objects with the same object number but different generation numbers in different revisions, e.g. one object with oid/gen [3,0] and one with oid/gen [3,1].

Note that setting only_current to false is normally not necessary and should not be done. If it is still done, one has to take care to avoid an invalid document state.

# File 'lib/hexapdf/revisions.rb', line 255

def each_object(only_current: true, only_loaded: false, &block)
  unless block_given?
    return to_enum(__method__, only_current: only_current, only_loaded: only_loaded)
  end

  yield_rev = (block.arity == 2)
  oids = {}
  @revisions.reverse_each do |rev|
    rev.each(only_loaded: only_loaded) do |obj|
      next if only_current && oids.include?(obj.oid)
      yield_rev ? yield(obj, rev) : yield(obj)
      oids[obj.oid] = true
    end
  end
  self
end

#merge(range = 0..-1)) ⇒ `Object`

:call-seq:

revisions.merge(range = 0..-1)    -> revisions

Merges the revisions specified by the given range into one. Objects from newer revisions overwrite those from older ones.

# File 'lib/hexapdf/revisions.rb', line 325

def merge(range = 0..-1)
  @revisions[range].reverse.each_cons(2) do |rev, prev_rev|
    prev_rev.trailer.value.replace(rev.trailer.value)
    rev.each do |obj|
      if obj.data != prev_rev.object(obj)&.data
        prev_rev.delete(obj.oid, mark_as_free: false)
        prev_rev.add(obj)
      end
    end
  end
  _first, *other = *@revisions[range]
  other.each {|rev| @revisions.delete(rev) }
  self
end

#next_oid ⇒ `Object`

Returns the next object identifier that should be used when adding a new object.



156
157
158

# File 'lib/hexapdf/revisions.rb', line 156

def next_oid
  @revisions.map(&:next_free_oid).max
end

#object(ref) ⇒ `Object`

:call-seq:

revisions.object(ref)    -> obj or nil
revisions.object(oid)    -> obj or nil

Returns the current version of the indirect object for the given exact reference or for the given object number.

For references to unknown objects, nil is returned but free objects are represented by a PDF Null object, not by nil!

See: PDF2.0 s7.3.9

# File 'lib/hexapdf/revisions.rb', line 171

def object(ref)
  i = @revisions.size - 1
  while i >= 0
    if (result = @revisions[i].object(ref))
      return result
    end
    i -= 1
  end
  nil
end

#object?(ref) ⇒ `Boolean`

:call-seq:

revisions.object?(ref)    -> true or false
revisions.object?(oid)    -> true or false

Returns true if one of the revisions contains an indirect object for the given exact reference or for the given object number.

Even though this method might return true for some references, #object may return nil because this method takes all revisions into account.

Returns:

(Boolean)



191
192
193

# File 'lib/hexapdf/revisions.rb', line 191

def object?(ref)
  @revisions.any? {|rev| rev.object?(ref) }
end

Class: HexaPDF::Revisions

Overview

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(document, initial_revisions: nil, parser: nil) ⇒ Revisions

Instance Attribute Details

#parser ⇒ Object (readonly)

Class Method Details

.from_io(document, io) ⇒ Object

Instance Method Details

#add ⇒ Object

#add_object(obj) ⇒ Object

#all ⇒ Object

#current ⇒ Object

#delete_object(ref) ⇒ Object

#each(&block) ⇒ Object

#each_object(only_current: true, only_loaded: false, &block) ⇒ Object

#merge(range = 0..-1)) ⇒ Object

#next_oid ⇒ Object

#object(ref) ⇒ Object

#object?(ref) ⇒ Boolean

#initialize(document, initial_revisions: nil, parser: nil) ⇒ `Revisions`

#parser ⇒ `Object` (readonly)

.from_io(document, io) ⇒ `Object`

#add ⇒ `Object`

#add_object(obj) ⇒ `Object`

#all ⇒ `Object`

#current ⇒ `Object`

#delete_object(ref) ⇒ `Object`

#each(&block) ⇒ `Object`

#each_object(only_current: true, only_loaded: false, &block) ⇒ `Object`

#merge(range = 0..-1)) ⇒ `Object`

#next_oid ⇒ `Object`

#object(ref) ⇒ `Object`

#object?(ref) ⇒ `Boolean`