Class: HexaPDF::Object

Inherits:
Object
  • Object
Includes:
Comparable
Defined in:
lib/hexapdf/object.rb

Overview

Objects of the PDF object system.

A PDF object is like a normal object but with an additional *object identifier* consisting of an object number and a generation number. If the object number is zero, then the PDF object represents a direct object. Otherwise the object identifier uniquely identifies this object as an indirect object and can be used for referencing it (from possibly multiple places).

Furthermore a PDF object may have an associated stream. However, this stream is only accessible if the subclass Stream is used.

A PDF object should be connected to a PDF document, otherwise some methods may not work.

Most PDF objects in a PDF document are represented by subclasses of this class that provide additional functionality.

The methods #hash and #eql? are implemented so that objects of this class can be used as hash keys. Furthermore, the implementation is compatible with that of Reference, i.e. the hash of a PDF Object is the same as the hash of its corresponding Reference object.
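The hash-key compatibility described above can be illustrated without loading HexaPDF. The following is a minimal sketch using two hypothetical stand-in classes (SketchObject and SketchReference are not part of HexaPDF) that share the #hash and #eql? definitions shown in the listings for those methods on this page. It assumes, as the paragraph states, that Reference computes a compatible hash.

```ruby
# Hypothetical stand-ins; none of these names are part of HexaPDF.
module OidGenKey
  # Same test as the #eql? listing: equal object and generation numbers.
  def eql?(other)
    other.respond_to?(:oid) && oid == other.oid &&
      other.respond_to?(:gen) && gen == other.gen
  end

  # Same computation as the #hash listing.
  def hash
    [oid, gen].hash
  end
end

class SketchObject
  include OidGenKey
  attr_reader :value, :oid, :gen

  def initialize(value, oid:, gen: 0)
    @value = value
    @oid = oid
    @gen = gen
  end
end

class SketchReference
  include OidGenKey
  attr_reader :oid, :gen

  def initialize(oid, gen)
    @oid = oid
    @gen = gen
  end
end

labels = {SketchObject.new(:Catalog, oid: 5) => "root"}
lookup = labels[SketchReference.new(5, 0)] # same oid/gen as the key
miss = labels[SketchReference.new(5, 1)]   # different generation number
```

Because both classes agree on #hash and #eql?, a reference with the same object and generation numbers finds the entry stored under the object, while a different generation number does not.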

Allowed PDF Object Values

The PDF specification knows of the following object types:

  • Boolean (mapped to true and false)

  • Integer (mapped to Integer objects)

  • Real (mapped to Float objects)

  • String (mapped to String objects with UTF-8 or binary encoding)

  • Name (mapped to Symbol objects)

  • Array (mapped to Array objects)

  • Dictionary (mapped to Hash objects)

  • Stream (mapped to the Stream class which is a Dictionary with the associated stream data)

  • Null (mapped to nil)

  • Indirect Object (mapped to this class)

So working with PDF objects in HexaPDF is rather straightforward since the common Ruby objects can be used for most things. In particular, wrapping a plain Ruby object in an object of this class is not necessary unless it should become an indirect object.

There are also some additional data structures built from these primitive ones. For example, Time objects are represented as specially formatted string objects and conversion from and to the string representation is handled automatically.

Important: Users of HexaPDF may use other plain Ruby objects but then there is no guarantee that everything will work correctly, especially when using collection types other than arrays and hashes.
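As a rough illustration of the type mapping listed above, the following sketch writes a dictionary using only plain Ruby values. The key names and values are made up for illustration and do not describe a real PDF structure; no HexaPDF class is involved.

```ruby
# Illustrative only: the keys and values are hypothetical.
info = {
  Type: :Example,            # Name       -> Symbol
  Count: 3,                  # Integer    -> Integer
  Scale: 1.5,                # Real       -> Float
  Title: "Quarterly report", # String     -> String
  Visible: true,             # Boolean    -> true/false
  Box: [0, 0, 595, 842],     # Array      -> Array
  Parent: nil,               # Null       -> nil
  Extra: {Lang: "en"},       # Dictionary -> Hash
}

# Collect the Ruby class used for each entry.
ruby_classes = info.transform_values(&:class)
```

Only arrays and hashes are used as collections here, in line with the note above.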

See: HexaPDF::Dictionary, HexaPDF::Stream, HexaPDF::Reference, HexaPDF::Document

See: PDF2.0 s7.3.10, s7.3.8

Direct Known Subclasses

Dictionary, PDFArray

Constructor Details

#initialize(value, document: nil, oid: nil, gen: nil, stream: nil) ⇒ Object

Creates a new PDF object wrapping the value.

The value can be a PDFData object, in which case it is used directly. If it is a PDF Object, its data is used. Otherwise the value is used as is and wrapped in a new PDFData object. In all cases, the oid, gen and stream values may be overridden by the corresponding keyword arguments.



# File 'lib/hexapdf/object.rb', line 192

def initialize(value, document: nil, oid: nil, gen: nil, stream: nil)
  @data = case value
          when PDFData then value
          when Object then value.data
          else PDFData.new(value)
          end
  @data.oid = oid if oid
  @data.gen = gen if gen
  @data.stream = stream if stream
  self.document = document
  self.must_be_indirect = false
  after_data_change
end
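One consequence of the case statement above is that wrapping an existing PDF object reuses its data container instead of copying it. The following sketch mirrors only that dispatch with hypothetical stand-ins (SketchData and SketchWrapper are not HexaPDF classes, and the document and stream handling is left out).

```ruby
# Hypothetical stand-ins mirroring the dispatch in the constructor listing.
SketchData = Struct.new(:value, :oid, :gen)

class SketchWrapper
  attr_reader :data

  def initialize(value, oid: nil, gen: nil)
    @data = case value
            when SketchData then value          # used directly
            when SketchWrapper then value.data  # the other wrapper's data is reused
            else SketchData.new(value, 0, 0)    # a plain Ruby value gets a new container
            end
    @data.oid = oid if oid
    @data.gen = gen if gen
  end
end

plain = SketchWrapper.new([1, 2, 3])
shared = SketchWrapper.new(plain, oid: 7)

same_container = plain.data.equal?(shared.data)
oid_seen_by_plain = plain.data.oid
```

In this sketch the oid override passed to the second wrapper is also visible through the first one, because both wrap the same container.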

Instance Attribute Details

#data ⇒ Object (readonly)

The wrapped HexaPDF::PDFData value.

This attribute is not part of the public API!



# File 'lib/hexapdf/object.rb', line 179

def data
  @data
end

#document ⇒ Object

Returns the associated PDF document.

If no document is associated, an error is raised.



# File 'lib/hexapdf/object.rb', line 240

def document
  @document || raise(HexaPDF::Error, "No document associated with this object (#{inspect})")
end

#must_be_indirect=(value) ⇒ Object (writeonly)

Sets whether the object has to be an indirect object once it is written.



# File 'lib/hexapdf/object.rb', line 185

def must_be_indirect=(value)
  @must_be_indirect = value
end

Class Method Details

.deep_copy(object) ⇒ Object

HexaPDF::Object.deep_copy(object)    -> copy

Creates a deep copy of the given object which retains the references to indirect objects.



# File 'lib/hexapdf/object.rb', line 129

def self.deep_copy(object)
  case object
  when Hash
    object.transform_values {|value| deep_copy(value) }
  when Array
    object.map {|o| deep_copy(o) }
  when HexaPDF::Object
    (object.indirect? || object.must_be_indirect? ? object : deep_copy(object.value))
  when HexaPDF::Reference
    object
  else
    object.dup
  end
end
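The effect of the Hash, Array and fallback branches above can be tried on plain Ruby values. The helper below is a hypothetical sketch (sketch_deep_copy is not a HexaPDF method) that leaves out the HexaPDF::Object and HexaPDF::Reference branches.

```ruby
# Hypothetical helper mirroring only the Hash, Array and fallback branches.
def sketch_deep_copy(object)
  case object
  when Hash then object.transform_values {|value| sketch_deep_copy(value) }
  when Array then object.map {|item| sketch_deep_copy(item) }
  else object.dup
  end
end

original = {Kids: [{Name: String.new("first")}], Count: 1}
copy = sketch_deep_copy(original)

# Mutating a nested string in the copy leaves the original untouched.
copy[:Kids][0][:Name] << " (edited)"
```

Every nested container and string is duplicated, so changes made through the copy do not leak back into the original structure.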

.field(_name) ⇒ Object

Returns nil to end the recursion for field searching in Dictionary.field.



# File 'lib/hexapdf/object.rb', line 172

def self.field(_name)
  nil
end

.make_direct(object, document) ⇒ Object

Makes sure that the object itself as well as all nested values are direct objects.

The document argument needs to contain the Document instance to which object belongs so that references can be correctly resolved.

If an indirect object is found, it is turned into a direct object and the indirect object is deleted from the document.



# File 'lib/hexapdf/object.rb', line 151

def self.make_direct(object, document)
  if object.kind_of?(HexaPDF::Object) && object.indirect?
    raise HexaPDF::Error, "Can't make a stream object a direct object" if object.data.stream
    object_to_delete = object
    object = object.value
    object_to_delete.document.delete(object_to_delete)
  end
  case object
  when HexaPDF::Object
    object.data.value = make_direct(object.data.value, document)
  when Hash
    object.transform_values! {|val| make_direct(val, document) }
  when Array
    object.map! {|val| make_direct(val, document) }
  when Reference
    object = make_direct(document.object(object), document)
  end
  object
end
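The recursion above can be approximated with a much simpler model. In the sketch below, which is an assumption-laden simplification rather than the HexaPDF implementation, a plain Hash named store plays the role of the document, SketchRef stands in for a reference, and resolving a reference removes the target from the store. All names are hypothetical.

```ruby
# Hypothetical stand-in for a reference to an indirect object.
SketchRef = Struct.new(:oid)

# store maps object numbers to values and plays the role of the document.
def sketch_make_direct(object, store)
  case object
  when SketchRef
    # Resolve the reference, drop the indirect object, then continue inside it.
    sketch_make_direct(store.delete(object.oid), store)
  when Hash
    object.transform_values! {|val| sketch_make_direct(val, store) }
  when Array
    object.map! {|val| sketch_make_direct(val, store) }
  else
    object
  end
end

store = {1 => [SketchRef.new(2), 10], 2 => {Size: 4}}
page = {Contents: SketchRef.new(1)}
sketch_make_direct(page, store)
```

After the call, the nested values are inlined into page and the store no longer holds the objects that were referenced.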

Instance Method Details

#<=>(other) ⇒ Object

Compares this object to another object.

If the other object does not respond to oid or gen, nil is returned. Otherwise objects are ordered first by object number and then by generation number.



# File 'lib/hexapdf/object.rb', line 352

def <=>(other)
  return nil unless other.respond_to?(:oid) && other.respond_to?(:gen)
  (oid == other.oid ? gen <=> other.gen : oid <=> other.oid)
end
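The ordering rule can be checked with a small stand-in class. SketchId below is hypothetical and reuses the comparison from the listing above together with Comparable.

```ruby
# Hypothetical stand-in that reuses the comparison from the listing above.
class SketchId
  include Comparable
  attr_reader :oid, :gen

  def initialize(oid, gen)
    @oid = oid
    @gen = gen
  end

  def <=>(other)
    return nil unless other.respond_to?(:oid) && other.respond_to?(:gen)
    (oid == other.oid ? gen <=> other.gen : oid <=> other.oid)
  end
end

ids = [SketchId.new(3, 0), SketchId.new(1, 2), SketchId.new(1, 0)]
sorted = ids.sort.map {|id| [id.oid, id.gen] }

# A String has neither oid nor gen, so the comparison yields nil.
incomparable = SketchId.new(1, 0) <=> "text"
```

Sorting orders by object number first and uses the generation number only to break ties.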

#==(other) ⇒ Object

Returns true in the following cases:

  • The other object is an Object and wraps the same #data structure.

  • The other object is a Reference with the same oid/gen.

  • This object is not indirect, and the other object is not an Object but is equal to the value of this object.



# File 'lib/hexapdf/object.rb', line 363

def ==(other)
  (other.kind_of?(Object) && data == other.data) || (other.kind_of?(Reference) && other == self) ||
    (!indirect? && !other.kind_of?(Object) && other == data.value)
end

#cache(key, value = Document::UNSET, update: false, &block) ⇒ Object

Caches and returns the given value or the value of the block under the given cache key. If there is already a cached value for the key and update is false, it is just returned.

Set update to true to force an update of the cached value.

This uses Document#cache internally.



# File 'lib/hexapdf/object.rb', line 332

def cache(key, value = Document::UNSET, update: false, &block)
  document.cache(@data, key, value, update: update, &block)
end
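The listing delegates to Document#cache, whose implementation is not shown here. The following sketch models only the behaviour described above (return an existing value unless update is true, otherwise store the given value or the block result). SketchCache and the UNSET sentinel are hypothetical stand-ins, and the real method may differ in detail.

```ruby
# Hypothetical sentinel standing in for Document::UNSET.
UNSET = Object.new

# Hypothetical stand-in modelling the described caching behaviour.
class SketchCache
  def initialize
    @store = {}
  end

  def cache(key, value = UNSET, update: false)
    return @store[key] if @store.key?(key) && !update
    @store[key] = (value.equal?(UNSET) ? yield : value)
  end

  def cached?(key)
    @store.key?(key)
  end
end

cache = SketchCache.new
calls = 0
first = cache.cache(:width) { calls += 1; 100 }   # block runs, result is stored
second = cache.cache(:width) { calls += 1; 200 }  # cached value is returned, block is skipped
forced = cache.cache(:width, 300, update: true)   # update: true replaces the cached value
```

The block form is useful for expensive computations, since the block is only evaluated when no cached value exists or an update is forced.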

#cached?(key) ⇒ Boolean

Returns true if there is a cached value for the given key.

This uses Document#cached? internally.

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 339

def cached?(key)
  document.cached?(@data, key)
end

#clear_cache ⇒ Object

Clears the cache for this object.



# File 'lib/hexapdf/object.rb', line 344

def clear_cache
  document.clear_cache(@data)
end

#deep_copy ⇒ Object

Makes a deep copy of the source PDF object and resets the object identifier.

Note that indirect references are not copied! If that is also needed, use Importer::copy.



# File 'lib/hexapdf/object.rb', line 316

def deep_copy
  obj = dup
  obj.instance_variable_set(:@data, @data.dup)
  obj.data.oid = 0
  obj.data.gen = 0
  obj.data.stream = @data.stream.dup if @data.stream.kind_of?(String)
  obj.data.value = self.class.deep_copy(@data.value)
  obj
end

#document? ⇒ Boolean

Returns true if a PDF document is associated.

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 245

def document?
  !@document.nil?
end

#eql?(other) ⇒ Boolean

Returns true if the other object references the same PDF object as this object.

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 369

def eql?(other)
  other.respond_to?(:oid) && oid == other.oid && other.respond_to?(:gen) && gen == other.gen
end

#gen ⇒ Object

Returns the generation number of the PDF object.



# File 'lib/hexapdf/object.rb', line 217

def gen
  data.gen
end

#gen=(gen) ⇒ Object

Sets the generation number of the PDF object.



# File 'lib/hexapdf/object.rb', line 222

def gen=(gen)
  data.gen = gen
end

#hash ⇒ Object

Computes the hash value based on the object and generation numbers.



# File 'lib/hexapdf/object.rb', line 374

def hash
  [oid, gen].hash
end

#indirect? ⇒ Boolean

Returns true if the object is an indirect object (i.e. has an object number unequal to zero).

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 251

def indirect?
  oid != 0
end

#inspect ⇒ Object

Returns a string containing the class name, the object and generation numbers, and the inspected value.



# File 'lib/hexapdf/object.rb', line 378

def inspect #:nodoc:
  "#<#{self.class.name} [#{oid}, #{gen}] value=#{value.inspect}>"
end

#must_be_indirect? ⇒ Boolean

Returns true if the object must be an indirect object once it is written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 256

def must_be_indirect?
  @must_be_indirect
end

#null? ⇒ Boolean

Returns true if the object represents the PDF null object.

Returns:

  • (Boolean)


# File 'lib/hexapdf/object.rb', line 276

def null?
  value.nil?
end

#oid ⇒ Object

Returns the object number of the PDF object.



# File 'lib/hexapdf/object.rb', line 207

def oid
  data.oid
end

#oid=(oid) ⇒ Object

Sets the object number of the PDF object.



# File 'lib/hexapdf/object.rb', line 212

def oid=(oid)
  data.oid = oid
end

#type ⇒ Object

Returns the type (symbol) of the object.

Since the type system is implemented in such a way as to allow exchanging implementations of specific types, the class of an object can’t be reliably used for determining the actual type.

However, the Type and Subtype fields can easily be used for this. Subclasses for PDF objects that don’t have such fields may use a unique name that has to begin with XX (see PDF2.0 sE.2) and therefore doesn’t clash with names defined by the PDF specification.

For basic objects this always returns :Unknown.



# File 'lib/hexapdf/object.rb', line 271

def type
  :Unknown
end

#validate(auto_correct: true) ⇒ Object

obj.validate(auto_correct: true)                                    -> true or false
obj.validate(auto_correct: true) {|msg, correctable, obj| block }   -> true or false

Validates the object, optionally corrects problems when the option auto_correct is set and returns true if the object is deemed valid and false otherwise.

If a block is given, it is called on validation problems with a problem description and whether the problem is automatically correctable. The third argument to the block is usually this object but may be another object if during auto-correction a new object was created and validated.

The validation routine itself has to be implemented in the #perform_validation method - see its documentation for more information.

Note: Even if the return value is true there may be problems since HexaPDF doesn’t currently implement the full PDF spec. However, if the return value is false, there is certainly a problem!



# File 'lib/hexapdf/object.rb', line 298

def validate(auto_correct: true)
  result = true
  perform_validation do |msg, correctable, object|
    yield(msg, correctable, object || self) if block_given?
    result = false unless correctable
    return false unless auto_correct
  end
  result
rescue HexaPDF::Error
  raise
rescue StandardError
  yield("Error: Unexpected value encountered", false, self) if block_given?
  false
end
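The interplay between the return value, the block and auto_correct can be followed with a stand-in class. SketchValidatable below is hypothetical: it copies the control flow of the listing above, omits the rescue clauses, and uses a made-up perform_validation that simply reports a fixed list of problems.

```ruby
# Hypothetical stand-in; perform_validation reports a fixed list of problems.
class SketchValidatable
  def initialize(problems)
    @problems = problems # array of [message, correctable] pairs
  end

  # Same control flow as the listing above, without the rescue clauses.
  def validate(auto_correct: true)
    result = true
    perform_validation do |msg, correctable, object|
      yield(msg, correctable, object || self) if block_given?
      result = false unless correctable
      return false unless auto_correct
    end
    result
  end

  private

  def perform_validation
    @problems.each {|msg, correctable| yield(msg, correctable, nil) }
  end
end

seen = []
fixable = SketchValidatable.new([["missing entry", true]])
fixable_ok = fixable.validate {|msg, correctable, _obj| seen << [msg, correctable] }
broken_ok = SketchValidatable.new([["invalid value", false]]).validate
strict_ok = fixable.validate(auto_correct: false)
```

In this sketch a correctable problem still yields true when auto_correct is enabled, an uncorrectable problem yields false, and with auto_correct: false the first reported problem ends validation with false.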

#value ⇒ Object

Returns the object value.



# File 'lib/hexapdf/object.rb', line 227

def value
  data.value
end

#value=(val) ⇒ Object

Sets the object value. Unlike in #initialize the value is used as is!



# File 'lib/hexapdf/object.rb', line 232

def value=(val)
  data.value = val
  after_data_change
end