Class: PDF::Reader::ObjectHash
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/pdf/reader/object_hash.rb
Overview
Provides low level access to the objects in a PDF file via a hash-like object.
A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method returns a Ruby representation of that object.
The class behaves much like a standard Ruby hash and includes the Enumerable mixin. The key difference is that there is no []= method: the hash is read only.
Basic Usage
h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469
h[PDF::Reader::Reference.new(1,0)]
=> 3469
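The hash-map view described in the overview can be pictured with a toy model. The sketch below does not use pdf-reader at all: `ToyReference`, `XREF` and `lookup_offset` are hypothetical names, and the byte offsets are made up for illustration.

```ruby
# Toy model of a cross-reference table: (object ID, generation) -> byte offset.
# Nothing here is part of pdf-reader; names and offsets are illustrative only.
ToyReference = Struct.new(:id, :gen)

XREF = {
  ToyReference.new(1, 0) => 15,
  ToyReference.new(2, 0) => 74,
  ToyReference.new(3, 0) => 182
}.freeze # frozen, to mirror the read-only nature of the object hash

# Accept either a bare integer ID (generation 0 assumed) or a ToyReference.
def lookup_offset(key)
  ref = key.is_a?(ToyReference) ? key : ToyReference.new(Integer(key), 0)
  XREF[ref] # one hash lookup, independent of how many objects exist
end

puts lookup_offset(2)
puts lookup_offset(ToyReference.new(3, 0))
```

Struct instances compare and hash by value, which is why a freshly built `ToyReference` finds the entry stored under an equal one.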
Instance Attribute Summary
- #default ⇒ Object
  : untyped.
- #pdf_version ⇒ Object (readonly)
  : Float.
- #sec_handler ⇒ Object (readonly)
  : securityHandler.
- #trailer ⇒ Object (readonly)
  : Hash[Symbol, untyped].
Instance Method Summary
- #[](key) ⇒ Object
  Access an object from the PDF.
- #deref!(key) ⇒ Object
  Recursively dereferences the object referred to by key.
- #deref_array(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_array!(key) ⇒ Object
  : (untyped) -> Array?.
- #deref_array_of_numbers(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_hash(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_hash!(key) ⇒ Object
  : (untyped) -> Hash[Symbol, untyped]?.
- #deref_integer(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_name(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_name_or_array(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_number(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_stream(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_stream_or_array(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #deref_string(key) ⇒ Object
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #each(&block) ⇒ Object (also: #each_pair)
  iterate over each key, value.
- #each_key(&block) ⇒ Object
  iterate over each key.
- #each_value(&block) ⇒ Object
  iterate over each value.
- #empty? ⇒ Boolean
  return true if there are no objects in this file.
- #encrypted? ⇒ Boolean
  : () -> bool.
- #fetch(key, local_default = nil) ⇒ Object
  Access an object from the PDF.
- #has_key?(check_key) ⇒ Boolean (also: #include?, #key?, #member?, #value?)
  return true if the specified key exists in the file.
- #has_value?(value) ⇒ Boolean
  return true if the specified value exists in the file.
- #initialize(input, opts = {}) ⇒ ObjectHash (constructor)
  Creates a new ObjectHash object.
- #keys ⇒ Object
  return an array of all keys in the file.
- #obj_type(ref) ⇒ Object
  returns the type of object a ref points to : ((Integer | PDF::Reader::Reference)) -> Symbol?.
- #object(key) ⇒ Object (also: #deref)
  If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
- #page_references ⇒ Object
  returns an array of PDF::Reader::References.
- #sec_handler? ⇒ Boolean
  : () -> bool.
- #size ⇒ Object (also: #length)
  return the number of objects in the file.
- #stream?(ref) ⇒ Boolean
  returns true if the supplied reference points to an object with a stream : ((Integer | PDF::Reader::Reference)) -> bool.
- #to_a ⇒ Object
  return an array of arrays.
- #to_s ⇒ Object
  : () -> String.
- #values ⇒ Object
  return an array of all values in the file.
- #values_at(*ids) ⇒ Object
  return an array of all values from the specified keys.
Constructor Details
#initialize(input, opts = {}) ⇒ ObjectHash
Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.
Valid options:
:password - the user password to decrypt the source PDF
: ((IO | Tempfile | StringIO | String), ?Hash[Symbol, untyped]) -> void
# File 'lib/pdf/reader/object_hash.rb', line 63
def initialize(input, opts = {})
  @io          = extract_io_from(input) #: IO | Tempfile | StringIO
  @xref        = PDF::Reader::XRef.new(@io) #: PDF::Reader::XRef[PDF::Reader::Reference]
  @pdf_version = read_version #: Float
  @trailer     = @xref.trailer #: Hash[Symbol, untyped]
  @cache       = opts[:cache] || PDF::Reader::ObjectCache.new #: PDF::Reader::ObjectCache
  @sec_handler = NullSecurityHandler.new #: securityHandler
  @sec_handler = SecurityHandlerFactory.build(
    deref(trailer[:Encrypt]),
    deref(trailer[:ID]),
    opts[:password]
  )
  @page_references = nil #: Array[PDF::Reader::Reference | Hash[Symbol, untyped]]?
  @object_streams = nil #: Hash[PDF::Reader::Reference, PDF::Reader::ObjectStream]?
end
Instance Attribute Details
#default ⇒ Object
: untyped
# File 'lib/pdf/reader/object_hash.rb', line 44
def default
  @default
end
#pdf_version ⇒ Object (readonly)
: Float
# File 'lib/pdf/reader/object_hash.rb', line 50
def pdf_version
  @pdf_version
end
#sec_handler ⇒ Object (readonly)
: securityHandler
# File 'lib/pdf/reader/object_hash.rb', line 53
def sec_handler
  @sec_handler
end
#trailer ⇒ Object (readonly)
: Hash[Symbol, untyped]
# File 'lib/pdf/reader/object_hash.rb', line 47
def trailer
  @trailer
end
Instance Method Details
#[](key) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
: ((Integer | PDF::Reader::Reference)) -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 103
def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  @cache[key] ||= fetch_object(key) || fetch_object_stream(key)
rescue InvalidObjectError
  return default
end
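The lookup rules described for #[] (integers imply generation 0, non-positive IDs return the default, results are cached) can be imitated in a few lines. This is a toy stand-in, not pdf-reader itself: `ToyRef`, `ToyObjectHash`, `load_object` and the `loads` counter are hypothetical and exist only to make the caching visible.

```ruby
# Toy stand-ins; none of these names belong to pdf-reader.
ToyRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

class ToyObjectHash
  attr_accessor :default
  attr_reader :loads

  def initialize(objects)
    @objects = objects # ToyRef => Ruby value; stands in for the parsed file
    @cache = {}
    @loads = 0         # counts how often the backing store is consulted
  end

  def [](key)
    return default if key.to_i <= 0                        # object IDs start at 1
    key = ToyRef.new(key.to_i, 0) unless key.is_a?(ToyRef) # integer => generation 0
    @cache[key] ||= load_object(key)                       # memoise after first load
  end

  private

  def load_object(ref)
    @loads += 1
    @objects[ref]
  end
end

h = ToyObjectHash.new(ToyRef.new(1, 0) => 3469)
puts h[1]
puts h[ToyRef.new(1, 0)]
puts h.loads
```

Both lookups resolve to the same cache entry, so the backing store is consulted only once in this sketch.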
#deref!(key) ⇒ Object
Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.
: (untyped) -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 350
def deref!(key)
  deref_internal!(key, {})
end
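Recursive dereferencing means following references nested inside hashes and arrays until only plain values remain. The sketch below illustrates the idea on a made-up object table. `DeepRef`, `DEEP_OBJECTS` and `deep_deref` are hypothetical names, and the cycle guard (returning nil when a reference is revisited) is an assumption: the listing does not show how `deref_internal!` uses its second argument.

```ruby
# Toy reference type and object table; not part of pdf-reader.
DeepRef = Struct.new(:id)

DEEP_OBJECTS = {
  1 => { Type: :Page, Contents: DeepRef.new(2), Annots: [DeepRef.new(3)] },
  2 => "stream data",
  3 => { Type: :Annot },
  4 => { Self: DeepRef.new(4) } # deliberately points back at itself
}.freeze

# Replace every DeepRef, at any depth, with the object it points to.
def deep_deref(value, seen = {})
  case value
  when DeepRef
    return nil if seen[value.id] # assumption: break reference cycles with nil
    deep_deref(DEEP_OBJECTS[value.id], seen.merge(value.id => true))
  when Hash
    value.transform_values { |v| deep_deref(v, seen) }
  when Array
    value.map { |v| deep_deref(v, seen) }
  else
    value # plain values are returned unchanged
  end
end

p deep_deref(DeepRef.new(1))
```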
#deref_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array and no other type will do.
: (untyped) -> Array?
# File 'lib/pdf/reader/object_hash.rb', line 131
def deref_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be an Array or nil" if !obj.is_a?(Array)
  }
end
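The strict deref_* methods share one pattern: pass nil through, accept the expected type, and raise for anything else. A minimal sketch of that pattern follows, with hypothetical names: `expect_type` is not a pdf-reader method, and `ToyMalformedError` merely stands in for MalformedPDFError.

```ruby
# ToyMalformedError stands in for MalformedPDFError; it is not a pdf-reader class.
class ToyMalformedError < StandardError; end

# Return obj if it is nil or an instance of one of the allowed classes, else raise.
def expect_type(obj, *allowed)
  return obj if obj.nil? || allowed.any? { |klass| obj.is_a?(klass) }

  raise ToyMalformedError, "expected #{allowed.join(' or ')} or nil, got #{obj.class}"
end

p expect_type([0, 0, 612, 792], Array)
p expect_type(nil, Array)
begin
  expect_type("not an array", Array)
rescue ToyMalformedError => e
  puts e.message
end
```

Passing several classes covers the combined variants such as a Name or an Array.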
#deref_array!(key) ⇒ Object
: (untyped) -> Array?
# File 'lib/pdf/reader/object_hash.rb', line 355
def deref_array!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be an Array or nil"
    end
  }
end
#deref_array_of_numbers(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Array of Numerics or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array of numbers and no other type will do.
Some effort is made to cast any non-numeric array elements to a number.
: (untyped) -> Array?
# File 'lib/pdf/reader/object_hash.rb', line 150
def deref_array_of_numbers(key)
  arr = deref(key)

  return arr if arr.nil?

  raise MalformedPDFError, "expected object to be an Array" unless arr.is_a?(Array)

  arr.map { |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  }
end
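The element-by-element casting can be tried outside the library. The sketch below copies the cascade from the listing into a standalone method. `coerce_numbers` is a hypothetical name, and ArgumentError is used in place of MalformedPDFError so that the example runs without pdf-reader.

```ruby
# Standalone imitation of the casting cascade; not a pdf-reader method.
def coerce_numbers(arr)
  raise ArgumentError, "expected an Array" unless arr.is_a?(Array)

  arr.map do |item|
    if item.is_a?(Numeric)
      item        # already a number, keep as-is
    elsif item.respond_to?(:to_f)
      item.to_f   # strings (and nil) take this branch
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise ArgumentError, "expected a number, got #{item.inspect}"
    end
  end
end

p coerce_numbers([0, "612", 792.0])
p coerce_numbers(["abc"])
```

One consequence worth knowing: String#to_f returns 0.0 for text that does not start with a number, so a cascade like this turns such strings into 0.0 rather than raising.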
#deref_hash(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a Hash or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Hash and no other type will do.
: (untyped) -> Hash[Symbol, untyped]?
# File 'lib/pdf/reader/object_hash.rb', line 177
def deref_hash(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be a Hash or nil" if !obj.is_a?(Hash)
  }
end
#deref_hash!(key) ⇒ Object
: (untyped) -> Hash[Symbol, untyped]?
# File 'lib/pdf/reader/object_hash.rb', line 364
def deref_hash!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Hash)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be a Hash or nil"
    end
  }
end
#deref_integer(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Integer or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Integer and no other type will do.
Some effort to cast to an int is made when the reference points to a non-integer.
: (untyped) -> Integer?
# File 'lib/pdf/reader/object_hash.rb', line 221
def deref_integer(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Integer)
    if obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be an Integer"
    end
  end

  obj
end
#deref_name(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF name (Symbol) or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name and no other type will do.
Some effort to cast to a symbol is made when the reference points to a non-symbol.
: (untyped) -> Symbol?
# File 'lib/pdf/reader/object_hash.rb', line 196
def deref_name(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Symbol)
    if obj.respond_to?(:to_sym)
      obj = obj.to_sym
    else
      raise MalformedPDFError, "expected object to be a Name"
    end
  end

  obj
end
#deref_name_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF Name (Symbol), Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name or Array and no other type will do.
: (untyped) -> (Symbol | Array | nil)
# File 'lib/pdf/reader/object_hash.rb', line 315
def deref_name_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(Symbol) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Name"
    end
  }
end
#deref_number(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a Numeric or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a number and no other type will do.
Some effort to cast to a number is made when the reference points to a non-number.
: (untyped) -> Numeric?
# File 'lib/pdf/reader/object_hash.rb', line 246
def deref_number(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Numeric)
    if obj.respond_to?(:to_f)
      obj = obj.to_f
    elsif obj.respond_to?(:to_i)
      obj.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  end

  obj
end
#deref_stream(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF::Reader::Stream or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream and no other type will do.
: (untyped) -> PDF::Reader::Stream?
# File 'lib/pdf/reader/object_hash.rb', line 271
def deref_stream(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream)
      raise MalformedPDFError, "expected object to be a Stream or nil"
    end
  }
end
#deref_stream_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF::Reader::Stream, Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream or Array and no other type will do.
: (untyped) -> (PDF::Reader::Stream | Array | nil)
# File 'lib/pdf/reader/object_hash.rb', line 334
def deref_stream_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Stream"
    end
  }
end
#deref_string(key) ⇒ Object
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a String or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a string and no other type will do.
Some effort to cast to a string is made when the reference points to a non-string.
: (untyped) -> String?
# File 'lib/pdf/reader/object_hash.rb', line 292
def deref_string(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(String)
    if obj.respond_to?(:to_s)
      obj = obj.to_s
    else
      raise MalformedPDFError, "expected object to be a string"
    end
  end

  obj
end
#each(&block) ⇒ Object Also known as: each_pair
Iterate over each key/value pair, just like a Ruby hash.
@override(allow_incompatible: true)
: () { (PDF::Reader::Reference, untyped) -> untyped } -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 400
def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end
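Because the class includes Enumerable, defining each is enough to get methods such as map, select and count. The sketch below shows that mechanism on a toy class. `ToyStore` and its contents are hypothetical and unrelated to pdf-reader's internals.

```ruby
# Toy container: only #each is written by hand, Enumerable supplies the rest.
class ToyStore
  include Enumerable

  def initialize(pairs)
    @pairs = pairs # id => object
  end

  # Yield two values per entry, as ObjectHash#each does with ref and object.
  def each
    return enum_for(:each) unless block_given?

    @pairs.each { |id, obj| yield id, obj }
  end
end

store = ToyStore.new(1 => { Type: :Catalog }, 2 => { Type: :Page }, 3 => "text")

ids   = store.map { |id, _obj| id }
pages = store.select { |_id, obj| obj.is_a?(Hash) && obj[:Type] == :Page }

p ids
p pages
p store.count
```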
#each_key(&block) ⇒ Object
Iterate over each key, just like a Ruby hash.
: { (PDF::Reader::Reference) -> untyped } -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 410
def each_key(&block)
  each do |id, obj|
    yield id
  end
end
#each_value(&block) ⇒ Object
Iterate over each value, just like a Ruby hash.
: { (untyped) -> untyped } -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 419
def each_value(&block)
  each do |id, obj|
    yield obj
  end
end
#empty? ⇒ Boolean
return true if there are no objects in this file
: () -> bool
# File 'lib/pdf/reader/object_hash.rb', line 436
def empty?
  size == 0 ? true : false
end
#encrypted? ⇒ Boolean
: () -> bool
# File 'lib/pdf/reader/object_hash.rb', line 528
def encrypted?
  trailer.has_key?(:Encrypt)
end
#fetch(key, local_default = nil) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
local_default is the object that will be returned if the requested key doesn’t exist.
: (untyped, ?untyped) -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 385
def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
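Reading the listing, fetch appears to differ from Hash#fetch in one respect: a missing key with a positive ID and no local_default falls through and yields nil, while only non-positive keys raise IndexError. The sketch below reproduces that control flow against a plain Ruby Hash so the branches can be tried in isolation. `toy_fetch` is a hypothetical name and the plain Hash stands in for the object hash.

```ruby
# Same branching as the listing, applied to an ordinary Hash for illustration.
def toy_fetch(store, key, local_default = nil)
  obj = store[key]
  if obj
    obj
  elsif local_default
    local_default
  else
    # Only non-positive keys raise; a positive missing key yields nil.
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end

store = { 1 => "catalog" }
p toy_fetch(store, 1)
p toy_fetch(store, 5, :fallback)
p toy_fetch(store, 5)
```

Callers that need to tell a missing object from a stored nil may therefore want to check has_key? first.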
#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?
return true if the specified key exists in the file. check_key can be an int or a PDF::Reader::Reference
: (untyped) -> bool
# File 'lib/pdf/reader/object_hash.rb', line 444
def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
#has_value?(value) ⇒ Boolean
return true if the specified value exists in the file
: (untyped) -> bool
# File 'lib/pdf/reader/object_hash.rb', line 462
def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end
#keys ⇒ Object
return an array of all keys in the file
: () -> Array
# File 'lib/pdf/reader/object_hash.rb', line 479
def keys
  ret = []
  each_key { |k| ret << k }
  ret
end
#obj_type(ref) ⇒ Object
returns the type of object a ref points to
: ((Integer | PDF::Reader::Reference)) -> Symbol?
# File 'lib/pdf/reader/object_hash.rb', line 81
def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end
#object(key) ⇒ Object Also known as: deref
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
: (untyped) -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 119
def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end
#page_references ⇒ Object
returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.
Useful for apps that want to extract data from specific pages.
: () -> Array[PDF::Reader::Reference | Hash[Symbol, untyped]]
# File 'lib/pdf/reader/object_hash.rb', line 519
def page_references
  root  = fetch(trailer[:Root])
  @page_references ||= begin
    pages_root = deref_hash(root[:Pages]) || {}
    get_page_objects(pages_root)
  end
end
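The body of `get_page_objects` is not shown in this listing. As a rough illustration of how page references can be collected in document order, the sketch below walks a made-up page tree in which :Pages nodes list their children under :Kids and :Page nodes are leaves. `PageRef`, `PAGE_TREE_OBJECTS` and `collect_page_refs` are hypothetical, and the traversal is an assumption rather than the library's implementation.

```ruby
# Toy page tree; object numbers and layout are invented for this example.
PageRef = Struct.new(:id)

PAGE_TREE_OBJECTS = {
  1 => { Type: :Pages, Kids: [PageRef.new(2), PageRef.new(3)] },
  2 => { Type: :Page },
  3 => { Type: :Pages, Kids: [PageRef.new(4)] },
  4 => { Type: :Page }
}.freeze

# Depth-first walk: leaves contribute their own reference, branches recurse.
def collect_page_refs(ref)
  node = PAGE_TREE_OBJECTS.fetch(ref.id)
  case node[:Type]
  when :Page  then [ref]
  when :Pages then node.fetch(:Kids, []).flat_map { |kid| collect_page_refs(kid) }
  else []
  end
end

p collect_page_refs(PageRef.new(1)).map(&:id)
```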
#sec_handler? ⇒ Boolean
: () -> bool
# File 'lib/pdf/reader/object_hash.rb', line 533
def sec_handler?
  !!sec_handler
end
#size ⇒ Object Also known as: length
return the number of objects in the file. An object with multiple generations is counted once.
: () -> Integer
# File 'lib/pdf/reader/object_hash.rb', line 428
def size
  xref.size
end
#stream?(ref) ⇒ Boolean
returns true if the supplied reference points to an object with a stream
: ((Integer | PDF::Reader::Reference)) -> bool
# File 'lib/pdf/reader/object_hash.rb', line 89
def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end
#to_a ⇒ Object
return an array of arrays. Each sub array contains a key/value pair.
: () -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 504
def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end
#to_s ⇒ Object
: () -> String
# File 'lib/pdf/reader/object_hash.rb', line 472
def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end
#values ⇒ Object
return an array of all values in the file
: () -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 488
def values
  ret = []
  each_value { |v| ret << v }
  ret
end
#values_at(*ids) ⇒ Object
return an array of all values from the specified keys
: (*untyped) -> untyped
# File 'lib/pdf/reader/object_hash.rb', line 497
def values_at(*ids)
  ids.map { |id| self[id] }
end