Class: PDF::Reader::ObjectHash

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/pdf/reader/object_hash.rb

Overview

Provides low-level access to the objects in a PDF file via a hash-like object.

A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
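The offset-table idea can be sketched without the library. This is a toy model only: the "file" is a StringIO, the object bodies are plain words rather than PDF syntax, and `load_object` and `xref` are hypothetical names, not part of PDF::Reader.

```ruby
require "stringio"

# Toy "file": three objects written back to back (not real PDF syntax).
bodies = { 1 => "catalog", 2 => "pages", 3 => "font" }

io   = StringIO.new
xref = {} # object ID => byte offset; stands in for the PDF xref table

bodies.each do |id, body|
  xref[id] = io.pos # remember where this object starts
  io.write("#{body}\n")
end

# Looking up an object is one table access, one seek and one read,
# regardless of how many objects the file holds.
def load_object(io, xref, id)
  offset = xref[id]
  return nil if offset.nil?

  io.seek(offset)
  io.gets.chomp
end

load_object(io, xref, 2) # => "pages"
```

The lookup never scans the file: the table gives the offset directly, which is where the O(1) claim comes from.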

Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method retrieves a Ruby representation of that object.

The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.

Basic Usage

h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469

h[PDF::Reader::Reference.new(1,0)]
=> 3469

Constructor Details

#initialize(input, opts = {}) ⇒ ObjectHash

Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.

Valid options:

:password - the user password to decrypt the source PDF

:cache - an object cache to use in place of a new PDF::Reader::ObjectCache

: ((IO | Tempfile | StringIO | String), ?Hash[Symbol, untyped]) -> void



# File 'lib/pdf/reader/object_hash.rb', line 63

def initialize(input, opts = {})
  @io          = extract_io_from(input) #: IO | Tempfile | StringIO
  @xref        = PDF::Reader::XRef.new(@io) #: PDF::Reader::XRef[PDF::Reader::Reference]
  @pdf_version = read_version #: Float
  @trailer     = @xref.trailer #: Hash[Symbol, untyped]
  @cache       = opts[:cache] || PDF::Reader::ObjectCache.new #: PDF::Reader::ObjectCache
  @sec_handler = NullSecurityHandler.new #: securityHandler
  @sec_handler = SecurityHandlerFactory.build(
    deref(trailer[:Encrypt]),
    deref(trailer[:ID]),
    opts[:password]
  )
  @page_references = nil #: Array[PDF::Reader::Reference | Hash[Symbol, untyped]]?
  @object_streams = nil #: Hash[PDF::Reader::Reference, PDF::Reader::ObjectStream]?
end

Instance Attribute Details

#default ⇒ Object

: untyped



# File 'lib/pdf/reader/object_hash.rb', line 44

def default
  @default
end

#pdf_version ⇒ Object (readonly)

: Float



# File 'lib/pdf/reader/object_hash.rb', line 50

def pdf_version
  @pdf_version
end

#sec_handler ⇒ Object (readonly)

: securityHandler



# File 'lib/pdf/reader/object_hash.rb', line 53

def sec_handler
  @sec_handler
end

#trailer ⇒ Object (readonly)

: Hash[Symbol, untyped]



# File 'lib/pdf/reader/object_hash.rb', line 47

def trailer
  @trailer
end

Instance Method Details

#[](key) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.

: ((Integer | PDF::Reader::Reference)) -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 103

def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  @cache[key] ||= fetch_object(key) || fetch_object_stream(key)
rescue InvalidObjectError
  return default
end
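The key handling in the listing above can be tried in isolation. The sketch below is an assumption-laden stand-in: `Ref` imitates PDF::Reader::Reference and `TinyObjectHash` imitates only the key normalisation of #[]; neither the cache nor the InvalidObjectError rescue is modelled.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference: an object ID plus a
# generation number. to_i returns the ID, which the lookup relies on.
Ref = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Hypothetical miniature of ObjectHash, backed by a plain Hash.
class TinyObjectHash
  attr_accessor :default

  def initialize(objects)
    @objects = objects # Ref => Ruby object
  end

  def [](key)
    # Non-positive IDs never name an object.
    return default if key.to_i <= 0

    # A bare Integer means "that ID, generation 0".
    key = Ref.new(key.to_i, 0) unless key.is_a?(Ref)
    @objects.fetch(key, default)
  end
end

h = TinyObjectHash.new({ Ref.new(1, 0) => 3469, Ref.new(2, 1) => "second generation" })

h[1]             # => 3469
h[Ref.new(1, 0)] # => 3469
h[2]             # => nil, because an Integer key implies generation 0
h[Ref.new(2, 1)] # => "second generation"
h[0]             # => nil
```

An object stored under a non-zero generation is only reachable through an explicit reference, which is the practical reason to pass a Reference rather than an Integer.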

#deref!(key) ⇒ Object

Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.

: (untyped) -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 350

def deref!(key)
  deref_internal!(key, {})
end
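The body of deref_internal! is not shown on this page, so the following is only a sketch of what a recursive dereference could look like, not the library's implementation. `ToyRef` and `deep_deref` are hypothetical names, and the choice to return nil for a reference that is met again while it is still being resolved is an assumption made here to keep the example small.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
ToyRef = Struct.new(:id, :gen)

# Resolve every reference reachable from obj. `seen` maps references that
# have been visited to their resolved value, so a reference cycle cannot
# recurse forever.
def deep_deref(objects, obj, seen = {})
  case obj
  when ToyRef
    return seen[obj] if seen.key?(obj)

    seen[obj] = nil # placeholder while this reference is being resolved
    seen[obj] = deep_deref(objects, objects[obj], seen)
  when Hash
    obj.each_with_object({}) { |(k, v), out| out[k] = deep_deref(objects, v, seen) }
  when Array
    obj.map { |item| deep_deref(objects, item, seen) }
  else
    obj
  end
end

objects = {
  ToyRef.new(1, 0) => { Type: :Pages, Kids: [ToyRef.new(2, 0)] },
  ToyRef.new(2, 0) => { Type: :Page, Parent: ToyRef.new(1, 0), MediaBox: ToyRef.new(3, 0) },
  ToyRef.new(3, 0) => [0, 0, 612, 792]
}

resolved = deep_deref(objects, ToyRef.new(1, 0))
# => { Type: :Pages, Kids: [{ Type: :Page, Parent: nil, MediaBox: [0, 0, 612, 792] }] }
```

The Parent entry points back at the object being resolved, which is the kind of cycle a page tree normally contains and the reason a plain recursive walk needs some form of bookkeeping.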

#deref_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array and no other type will do.

: (untyped) -> Array?



# File 'lib/pdf/reader/object_hash.rb', line 131

def deref_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be an Array or nil" if !obj.is_a?(Array)
  }
end

#deref_array!(key) ⇒ Object

: (untyped) -> Array?



# File 'lib/pdf/reader/object_hash.rb', line 355

def deref_array!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be an Array or nil"
    end
  }
end

#deref_array_of_numbers(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Array of Numerics or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array of numbers and no other type will do.

Some effort is made to cast non-numeric array elements to a number.

: (untyped) -> Array?

Raises:

  • (MalformedPDFError)



# File 'lib/pdf/reader/object_hash.rb', line 150

def deref_array_of_numbers(key)
  arr = deref(key)

  return arr if arr.nil?

  raise MalformedPDFError, "expected object to be an Array" unless arr.is_a?(Array)

  arr.map { |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  }
end
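The element-casting branch above is lenient in ways that are easy to miss. The sketch below copies that branch into a standalone method so it can be run without a PDF; `cast_to_numbers` is a hypothetical name, and ArgumentError stands in for MalformedPDFError.

```ruby
# Standalone copy of the casting logic shown above.
# ArgumentError is a stand-in for PDF::Reader's MalformedPDFError.
def cast_to_numbers(arr)
  arr.map do |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise ArgumentError, "expected object to be a number"
    end
  end
end

cast_to_numbers([0, "612", 79.5]) # => [0, 612.0, 79.5]
cast_to_numbers(["abc", nil])     # => [0.0, 0.0]

begin
  cast_to_numbers([:name])
rescue ArgumentError => e
  e.message # => "expected object to be a number"
end
```

Strings and nil both respond to to_f, so unparseable input becomes 0.0 rather than an error. Only objects with neither to_f nor to_i, such as a Symbol, reach the raise.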

#deref_hash(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a Hash or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Hash and no other type will do.

: (untyped) -> Hash[Symbol, untyped]?



# File 'lib/pdf/reader/object_hash.rb', line 177

def deref_hash(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be a Hash or nil" if !obj.is_a?(Hash)
  }
end

#deref_hash!(key) ⇒ Object

: (untyped) -> Hash[Symbol, untyped]?



# File 'lib/pdf/reader/object_hash.rb', line 364

def deref_hash!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Hash)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be a Hash or nil"
    end
  }
end

#deref_integer(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Integer or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Integer and no other type will do.

Some effort is made to cast to an Integer when the reference points to a non-integer.

: (untyped) -> Integer?



# File 'lib/pdf/reader/object_hash.rb', line 221

def deref_integer(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Integer)
    if obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be an Integer"
    end
  end

  obj
end

#deref_name(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF Name (Symbol) or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name and no other type will do.

Some effort is made to cast to a Symbol when the reference points to a non-symbol.

: (untyped) -> Symbol?



# File 'lib/pdf/reader/object_hash.rb', line 196

def deref_name(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Symbol)
    if obj.respond_to?(:to_sym)
      obj = obj.to_sym
    else
      raise MalformedPDFError, "expected object to be a Name"
    end
  end

  obj
end

#deref_name_or_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF Name (Symbol), Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name or Array and no other type will do.

: (untyped) -> (Symbol | Array | nil)



# File 'lib/pdf/reader/object_hash.rb', line 315

def deref_name_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(Symbol) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Name"
    end
  }
end

#deref_number(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a Numeric or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a number and no other type will do.

Some effort is made to cast to a number when the reference points to a non-number.

: (untyped) -> Numeric?



# File 'lib/pdf/reader/object_hash.rb', line 246

def deref_number(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Numeric)
    if obj.respond_to?(:to_f)
      obj = obj.to_f
    elsif obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  end

  obj
end

#deref_stream(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF::Reader::Stream or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream and no other type will do.

: (untyped) -> PDF::Reader::Stream?



# File 'lib/pdf/reader/object_hash.rb', line 271

def deref_stream(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream)
      raise MalformedPDFError, "expected object to be a Stream or nil"
    end
  }
end

#deref_stream_or_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF::Reader::Stream, Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream or Array and no other type will do.

: (untyped) -> (PDF::Reader::Stream | Array | nil)



# File 'lib/pdf/reader/object_hash.rb', line 334

def deref_stream_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Stream"
    end
  }
end

#deref_string(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a String or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a string and no other type will do.

Some effort is made to cast to a String when the reference points to a non-string.

: (untyped) -> String?



# File 'lib/pdf/reader/object_hash.rb', line 292

def deref_string(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(String)
    if obj.respond_to?(:to_s)
      obj = obj.to_s
    else
      raise MalformedPDFError, "expected object to be a string"
    end
  end

  obj
end

#each(&block) ⇒ Object Also known as: each_pair

Iterates over each key/value pair, just like a Ruby hash.

@override(allow_incompatible: true) : () { (PDF::Reader::Reference, untyped) -> untyped } -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 400

def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end
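Because the class includes Enumerable, defining each is enough to get select, map, count and the rest. A minimal sketch of that mechanism, using a hypothetical `TinyStore` class backed by a plain Hash instead of a PDF:

```ruby
# Hypothetical read-only container; not part of PDF::Reader.
class TinyStore
  include Enumerable

  def initialize(pairs)
    @pairs = pairs
  end

  # Yield two values per entry, the way ObjectHash#each yields
  # a reference and the object it points to.
  def each
    @pairs.each { |key, value| yield key, value }
  end
end

store = TinyStore.new({ 1 => { Type: :Catalog }, 2 => [0, 0, 612, 792], 3 => "text" })

hashes = store.select { |_id, obj| obj.is_a?(Hash) }
# => [[1, { Type: :Catalog }]]

ids = store.map { |id, _obj| id }
# => [1, 2, 3]

store.count # => 3
```

With the real class the same calls would walk every object in the file, so a filter such as the select above may load and parse each object once.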

#each_key(&block) ⇒ Object

Iterates over each key, just like a Ruby hash.

: { (PDF::Reader::Reference) -> untyped } -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 410

def each_key(&block)
  each do |id, obj|
    yield id
  end
end

#each_value(&block) ⇒ Object

Iterates over each value, just like a Ruby hash.

: { (untyped) -> untyped } -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 419

def each_value(&block)
  each do |id, obj|
    yield obj
  end
end

#empty? ⇒ Boolean

Returns true if there are no objects in this file.

: () -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 436

def empty?
  size == 0
end

#encrypted? ⇒ Boolean

: () -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 528

def encrypted?
  trailer.has_key?(:Encrypt)
end

#fetch(key, local_default = nil) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.

local_default is the object that will be returned if the requested key doesn’t exist.

: (untyped, ?untyped) -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 385

def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
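The branches in the listing above have a consequence worth seeing in action: an IndexError is only raised for non-positive keys, so a positive ID that is simply absent yields nil when no default is given. The sketch below reproduces that branch structure on a hypothetical `TinyFetcher` class backed by a plain Hash; it is not the library class.

```ruby
# Hypothetical stand-in. #[] returns nil for unknown keys, like
# ObjectHash#[] when no default has been set.
class TinyFetcher
  def initialize(objects)
    @objects = objects
  end

  def [](key)
    @objects[key.to_i]
  end

  # Same branch structure as the fetch source above.
  def fetch(key, local_default = nil)
    obj = self[key]
    if obj
      obj
    elsif local_default
      local_default
    elsif key.to_i <= 0
      raise IndexError, "#{key} is invalid"
    end
  end
end

f = TinyFetcher.new({ 1 => "catalog" })

f.fetch(1)           # => "catalog"
f.fetch(9, :missing) # => :missing
f.fetch(9)           # => nil (positive ID, no default, no error)

begin
  f.fetch(0)
rescue IndexError => e
  e.message # => "0 is invalid"
end
```

This differs from Hash#fetch in the standard library, which raises KeyError for any missing key when no default is supplied, so callers may want to check for nil explicitly.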

#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?

Returns true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference.

: (untyped) -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 444

def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
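The TODO in the listing mentions moving from O(n) to O(1). One way that could be done, shown purely as a sketch and not as the library's plan, is to build a Set of references and a Set of IDs once and answer membership from those. `KeyRef`, `scan_has_key?` and `indexed_has_key?` are hypothetical names.

```ruby
require "set"

# Hypothetical stand-in for PDF::Reader::Reference.
KeyRef = Struct.new(:id, :gen)

refs = [KeyRef.new(1, 0), KeyRef.new(2, 0), KeyRef.new(2, 1)]

# O(n): the comparison used in the source above, applied to a plain list.
def scan_has_key?(refs, check_key)
  refs.any? do |ref|
    check_key.is_a?(KeyRef) ? check_key == ref : check_key.to_i == ref.id
  end
end

# O(1) per query after a one-time index build.
ref_index = refs.to_set
id_index  = refs.map(&:id).to_set

def indexed_has_key?(ref_index, id_index, check_key)
  if check_key.is_a?(KeyRef)
    ref_index.include?(check_key)
  else
    id_index.include?(check_key.to_i)
  end
end

scan_has_key?(refs, 2)                                    # => true
indexed_has_key?(ref_index, id_index, 2)                  # => true
indexed_has_key?(ref_index, id_index, KeyRef.new(3, 0))   # => false
```

Both versions keep the behaviour visible in the source: an Integer key matches an object ID of any generation, whereas #[] with an Integer looks up generation 0 only.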

#has_value?(value) ⇒ Boolean

Returns true if the specified value exists in the file.

: (untyped) -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 462

def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end

#keys ⇒ Object

Returns an array of all keys in the file.

: () -> Array



# File 'lib/pdf/reader/object_hash.rb', line 479

def keys
  ret = []
  each_key { |k| ret << k }
  ret
end

#obj_type(ref) ⇒ Object

Returns the type of object a ref points to.

: ((Integer | PDF::Reader::Reference)) -> Symbol?



# File 'lib/pdf/reader/object_hash.rb', line 81

def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end
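The returned symbol is just the Ruby class name of the loaded object. A small sketch of the same expression, using a hypothetical `type_symbol` helper and a hypothetical `Toy::Stream` class in place of PDF::Reader::Stream:

```ruby
# Same expression as in obj_type, applied to an already-loaded object.
def type_symbol(obj)
  obj.class.to_s.to_sym
end

# Hypothetical namespaced class, standing in for PDF::Reader::Stream.
module Toy
  class Stream; end
end

type_symbol({})              # => :Hash
type_symbol([0, 0, 612])     # => :Array
type_symbol(:Catalog)        # => :Symbol
type_symbol(Toy::Stream.new) # => :"Toy::Stream"
type_symbol(nil)             # => :NilClass
```

The last line matters for missing objects: when #[] returns a nil default, the expression produces :NilClass rather than nil, so the nil branch in the signature is reached only when the lookup itself raises.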

#object(key) ⇒ Object Also known as: deref

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

: (untyped) -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 119

def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end

#page_references ⇒ Object

Returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.

Useful for apps that want to extract data from specific pages.

: () -> Array[PDF::Reader::Reference | Hash[Symbol, untyped]]



# File 'lib/pdf/reader/object_hash.rb', line 519

def page_references
  root  = fetch(trailer[:Root])
  @page_references ||= begin
                         pages_root = deref_hash(root[:Pages]) || {}
                         get_page_objects(pages_root)
                       end
end
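get_page_objects is not shown on this page. The sketch below illustrates the general shape of a page-tree walk under the usual PDF layout, where Pages nodes list their children in Kids and Page nodes are leaves. `PageRef` and `collect_page_refs` are hypothetical, the objects are hand-written hashes, and no protection against malformed cyclic trees is included.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
PageRef = Struct.new(:id, :gen)

# Walk the page tree depth-first and collect references to leaf Page
# objects in document order.
def collect_page_refs(objects, node_ref, out = [])
  node = objects[node_ref]
  return out unless node.is_a?(Hash)

  case node[:Type]
  when :Pages
    (node[:Kids] || []).each { |kid| collect_page_refs(objects, kid, out) }
  when :Page
    out << node_ref
  end
  out
end

objects = {
  PageRef.new(1, 0) => { Type: :Catalog, Pages: PageRef.new(2, 0) },
  PageRef.new(2, 0) => { Type: :Pages, Kids: [PageRef.new(3, 0), PageRef.new(4, 0)], Count: 3 },
  PageRef.new(3, 0) => { Type: :Page },
  PageRef.new(4, 0) => { Type: :Pages, Kids: [PageRef.new(5, 0), PageRef.new(6, 0)], Count: 2 },
  PageRef.new(5, 0) => { Type: :Page },
  PageRef.new(6, 0) => { Type: :Page }
}

root  = objects[PageRef.new(1, 0)]
pages = collect_page_refs(objects, root[:Pages])
pages.map(&:id) # => [3, 5, 6]
```

Index 0 of the result corresponds to page 1, matching the ordering described above.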

#sec_handler? ⇒ Boolean

: () -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 533

def sec_handler?
  !!sec_handler
end

#size ⇒ Object Also known as: length

Returns the number of objects in the file. An object with multiple generations is counted once.

: () -> Integer



# File 'lib/pdf/reader/object_hash.rb', line 428

def size
  xref.size
end

#stream?(ref) ⇒ Boolean

Returns true if the supplied reference points to an object with a stream.

: ((Integer | PDF::Reader::Reference)) -> bool

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 89

def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end

#to_a ⇒ Object

Returns an array of arrays. Each sub-array contains a key/value pair.

: () -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 504

def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end

#to_s ⇒ Object

: () -> String



# File 'lib/pdf/reader/object_hash.rb', line 472

def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end

#values ⇒ Object

Returns an array of all values in the file.

: () -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 488

def values
  ret = []
  each_value { |v| ret << v }
  ret
end

#values_at(*ids) ⇒ Object

Returns an array of the values for the specified keys.

: (*untyped) -> untyped



# File 'lib/pdf/reader/object_hash.rb', line 497

def values_at(*ids)
  ids.map { |id| self[id] }
end