Class: PDF::Reader::ObjectHash
- Inherits: Object (Object → PDF::Reader::ObjectHash)
- Includes: Enumerable
- Defined in: lib/pdf/reader/object_hash.rb
Overview
Provides low-level access to the objects in a PDF file via a hash-like object.
A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method retrieves a Ruby representation of that object.
The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
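The "large hash map" view can be illustrated without any PDF at all. The following is a minimal sketch, not pdf-reader's implementation: the in-memory "file", the `xref` hash, and the `lookup` lambda are all hypothetical stand-ins for a table that maps object IDs to byte offsets.

```ruby
require "stringio"

# Hypothetical object bodies; a real PDF stores these at byte offsets in the file.
objects = { 1 => "<< /Type /Catalog >>", 2 => "<< /Type /Pages >>", 3 => "(hello)" }

io = StringIO.new
xref = {} # object ID => [byte offset, byte length] (assumed layout for this sketch)
objects.each do |id, body|
  xref[id] = [io.pos, body.bytesize]
  io.write(body)
end

# One hash access plus one seek, however many objects the "file" holds.
lookup = lambda do |id|
  offset, length = xref.fetch(id)
  io.seek(offset)
  io.read(length)
end

puts lookup.call(3) # prints the body stored for object 3
```

Because the table is keyed by ID, the cost of a lookup does not grow with the number of objects, which is the property the overview describes.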
Basic Usage
h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469
h[PDF::Reader::Reference.new(1,0)]
=> 3469
Direct Known Subclasses
Instance Attribute Summary
-
#default ⇒ Object
Returns the value of attribute default.
-
#pdf_version ⇒ Object
readonly
Returns the value of attribute pdf_version.
-
#sec_handler ⇒ Object
readonly
Returns the value of attribute sec_handler.
-
#trailer ⇒ Object
readonly
Returns the value of attribute trailer.
Instance Method Summary
-
#[](key) ⇒ Object
Access an object from the PDF.
-
#deref!(key) ⇒ Object
Recursively dereferences the object referred to by key.
-
#each(&block) ⇒ Object
(also: #each_pair)
Iterates over each key/value pair.
-
#each_key(&block) ⇒ Object
Iterates over each key.
-
#each_value(&block) ⇒ Object
Iterates over each value.
-
#empty? ⇒ Boolean
Returns true if there are no objects in this file.
- #encrypted? ⇒ Boolean
-
#fetch(key, local_default = nil) ⇒ Object
Access an object from the PDF.
-
#has_key?(check_key) ⇒ Boolean
(also: #include?, #key?, #member?, #value?)
Returns true if the specified key exists in the file.
-
#has_value?(value) ⇒ Boolean
Returns true if the specified value exists in the file.
-
#initialize(input, opts = {}) ⇒ ObjectHash
constructor
Creates a new ObjectHash object.
-
#keys ⇒ Object
Returns an array of all keys in the file.
-
#obj_type(ref) ⇒ Object
Returns the type of object a ref points to.
-
#object(key) ⇒ Object
(also: #deref)
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it.
-
#page_references ⇒ Object
Returns an array of PDF::Reader::References.
- #sec_handler? ⇒ Boolean
-
#size ⇒ Object
(also: #length)
Returns the number of objects in the file.
-
#stream?(ref) ⇒ Boolean
Returns true if the supplied reference points to an object with a stream.
-
#to_a ⇒ Object
Returns an array of arrays.
- #to_s ⇒ Object
-
#values ⇒ Object
Returns an array of all values in the file.
-
#values_at(*ids) ⇒ Object
Returns an array of all values from the specified keys.
Constructor Details
#initialize(input, opts = {}) ⇒ ObjectHash
Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.
Valid options:
:password - the user password to decrypt the source PDF
# File 'lib/pdf/reader/object_hash.rb', line 42
def initialize(input, opts = {})
  @io = extract_io_from(input)
  @xref = PDF::Reader::XRef.new(@io)
  @pdf_version = read_version
  @trailer = @xref.trailer
  @cache = opts[:cache] || PDF::Reader::ObjectCache.new
  @sec_handler = build_security_handler(opts)
end
Instance Attribute Details
#default ⇒ Object
Returns the value of attribute default.
# File 'lib/pdf/reader/object_hash.rb', line 31
def default
  @default
end
#pdf_version ⇒ Object (readonly)
Returns the value of attribute pdf_version.
# File 'lib/pdf/reader/object_hash.rb', line 32
def pdf_version
  @pdf_version
end
#sec_handler ⇒ Object (readonly)
Returns the value of attribute sec_handler.
# File 'lib/pdf/reader/object_hash.rb', line 33
def sec_handler
  @sec_handler
end
#trailer ⇒ Object (readonly)
Returns the value of attribute trailer.
# File 'lib/pdf/reader/object_hash.rb', line 32
def trailer
  @trailer
end
Instance Method Details
#[](key) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
# File 'lib/pdf/reader/object_hash.rb', line 72
def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  if @cache.has_key?(key)
    @cache[key]
  elsif xref[key].is_a?(Fixnum)
    buf = new_buffer(xref[key])
    @cache[key] = decrypt(key, Parser.new(buf, self).object(key.id, key.gen))
  elsif xref[key].is_a?(PDF::Reader::Reference)
    container_key = xref[key]
    object_streams[container_key] ||= PDF::Reader::ObjectStream.new(object(container_key))
    @cache[key] = object_streams[container_key][key.id]
  end
rescue InvalidObjectError
  return default
end
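The key handling described above (an int means "that ID, generation 0"; a reference is used as given) can be sketched in isolation. This is an illustration only: `Ref`, `normalize_key`, `lookup`, and the `store` hash are hypothetical stand-ins, not pdf-reader classes or methods.

```ruby
# Hypothetical stand-in for a reference: an object ID plus a generation number.
Ref = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Turn an int or a Ref into a Ref; non-positive IDs are treated as invalid.
def normalize_key(key)
  return nil if key.to_i <= 0
  key.is_a?(Ref) ? key : Ref.new(key.to_i, 0)
end

# Assumed data: a single object with ID 1, generation 0.
store = { Ref.new(1, 0) => 3469 }

lookup = lambda do |key|
  ref = normalize_key(key)
  ref && store[ref]
end

p lookup.call(1)              # int key, generation 0 assumed
p lookup.call(Ref.new(1, 0))  # explicit reference, same object
p lookup.call(Ref.new(1, 1))  # different generation, so nothing is found
```

Struct instances compare and hash by value, which is why two separately built references with the same ID and generation find the same entry.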
#deref!(key) ⇒ Object
Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.
# File 'lib/pdf/reader/object_hash.rb', line 104
def deref!(key)
  case object = deref(key)
  when Hash
    {}.tap { |hash|
      object.each do |k, value|
        hash[k] = deref!(value)
      end
    }
  when PDF::Reader::Stream
    object.hash = deref!(object.hash)
    object
  when Array
    object.map { |value| deref!(value) }
  else
    object
  end
end
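The recursion in the listing above can be tried on plain Ruby data. The sketch below is a simplified illustration, assuming a hypothetical `ObjRef` struct and an `OBJECTS` hash in place of a real PDF; it omits the stream case.

```ruby
# Hypothetical stand-in for a reference to another object.
ObjRef = Struct.new(:id)

# Assumed object table: object 1 points at objects 2 and 3.
OBJECTS = {
  1 => { :Type => :Page, :Contents => ObjRef.new(2), :Kids => [ObjRef.new(3)] },
  2 => "stream data",
  3 => 42
}

# One level: replace a reference with the object it points to.
def resolve(value, objects)
  value.is_a?(ObjRef) ? objects[value.id] : value
end

# All levels: resolve, then walk into hashes and arrays.
def deep_resolve(value, objects)
  case object = resolve(value, objects)
  when Hash
    object.each_with_object({}) { |(k, v), out| out[k] = deep_resolve(v, objects) }
  when Array
    object.map { |v| deep_resolve(v, objects) }
  else
    object
  end
end

result = deep_resolve(ObjRef.new(1), OBJECTS)
p result # every ObjRef has been replaced by the object it pointed to
```

Note that this sketch has no cycle detection: a structure whose references loop back on themselves would recurse without end.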
#each(&block) ⇒ Object Also known as: each_pair
Iterates over each key/value pair, just like a Ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 147
def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end
#each_key(&block) ⇒ Object
Iterates over each key, just like a Ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 156
def each_key(&block)
  each do |id, obj|
    yield id
  end
end
#each_value(&block) ⇒ Object
Iterates over each value, just like a Ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 164
def each_value(&block)
  each do |id, obj|
    yield obj
  end
end
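Because the class includes Enumerable and defines each, the usual collection methods come along for free. The sketch below shows the mechanism on a hypothetical `TinyStore` class; it is not ObjectHash itself, and the data is made up.

```ruby
# Hypothetical read-only store that yields key/value pairs from #each.
class TinyStore
  include Enumerable

  def initialize(pairs)
    @pairs = pairs
  end

  def each
    return enum_for(:each) unless block_given?
    @pairs.each { |key, value| yield key, value }
  end
end

store = TinyStore.new({ 1 => "catalog", 2 => "pages", 3 => "page" })

labels  = store.map { |id, name| "#{id}:#{name}" }
matches = store.select { |_id, name| name.start_with?("page") }

p labels      # => ["1:catalog", "2:pages", "3:page"]
p matches     # => [[2, "pages"], [3, "page"]]
p store.count # => 3
```

Only each is written by hand; map, select, and count are supplied by the Enumerable mixin.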
#empty? ⇒ Boolean
Returns true if there are no objects in this file.
# File 'lib/pdf/reader/object_hash.rb', line 179
def empty?
  size == 0 ? true : false
end
#encrypted? ⇒ Boolean
# File 'lib/pdf/reader/object_hash.rb', line 259
def encrypted?
  trailer.has_key?(:Encrypt)
end
#fetch(key, local_default = nil) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
local_default is the object that will be returned if the requested key doesn’t exist.
# File 'lib/pdf/reader/object_hash.rb', line 134
def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
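The branches in the listing above lead to three distinct outcomes, which a small stand-in makes easier to see. This is a sketch only: `fetch_like` and the plain `store` hash are hypothetical and simply mirror the control flow shown, with no PDF involved.

```ruby
# Hypothetical mirror of the control flow above, backed by a plain Hash.
def fetch_like(store, key, local_default = nil)
  obj = key.to_i > 0 ? store[key.to_i] : nil
  if obj
    obj
  elsif local_default
    local_default
  elsif key.to_i <= 0
    raise IndexError, "#{key} is invalid"
  end
end

store = { 1 => 3469 } # assumed data

p fetch_like(store, 1)            # found: the stored object
p fetch_like(store, 99, :missing) # not found: the supplied default
p fetch_like(store, 99)           # not found, no default, positive key: nil

begin
  fetch_like(store, 0)            # not found, no default, non-positive key
rescue IndexError => e
  puts "IndexError: #{e.message}"
end
```

As written in the listing, a missing positive key with no default falls through every branch and yields nil rather than raising, which differs from Hash#fetch.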
#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?
Returns true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference.
# File 'lib/pdf/reader/object_hash.rb', line 186
def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
#has_value?(value) ⇒ Boolean
Returns true if the specified value exists in the file.
# File 'lib/pdf/reader/object_hash.rb', line 203
def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end
#keys ⇒ Object
Returns an array of all keys in the file.
# File 'lib/pdf/reader/object_hash.rb', line 218
def keys
  ret = []
  each_key { |k| ret << k }
  ret
end
#obj_type(ref) ⇒ Object
Returns the type of object a ref points to.
# File 'lib/pdf/reader/object_hash.rb', line 52
def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end
#object(key) ⇒ Object Also known as: deref
If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.
# File 'lib/pdf/reader/object_hash.rb', line 96
def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end
#page_references ⇒ Object
Returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, and so on.
Useful for apps that want to extract data from specific pages.
# File 'lib/pdf/reader/object_hash.rb', line 254
def page_references
  root = fetch(trailer[:Root])
  @page_references ||= get_page_objects(root[:Pages]).flatten
end
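The helper get_page_objects is not shown on this page, so the following is only a guess at the general idea: walk the page tree from the catalog's Pages entry and collect references to leaf Page objects in document order. `PageRef`, `PAGE_OBJECTS`, and `collect_page_refs` are hypothetical names, and the tree is made-up data.

```ruby
# Hypothetical stand-in for a reference to another object.
PageRef = Struct.new(:id)

# Assumed object table: a catalog, two Pages nodes, three Page leaves.
PAGE_OBJECTS = {
  1 => { :Type => :Catalog, :Pages => PageRef.new(2) },
  2 => { :Type => :Pages, :Kids => [PageRef.new(3), PageRef.new(4)] },
  3 => { :Type => :Page },
  4 => { :Type => :Pages, :Kids => [PageRef.new(5), PageRef.new(6)] },
  5 => { :Type => :Page },
  6 => { :Type => :Page }
}

# Depth-first walk: a Page yields its own reference, a Pages node yields its kids'.
def collect_page_refs(ref, objects)
  node = objects[ref.id]
  case node[:Type]
  when :Page  then [ref]
  when :Pages then node[:Kids].flat_map { |kid| collect_page_refs(kid, objects) }
  else []
  end
end

root = PAGE_OBJECTS[1]
refs = collect_page_refs(root[:Pages], PAGE_OBJECTS)
p refs.map(&:id) # => [3, 5, 6]
```

The first element corresponds to page 1, the second to page 2, and so on, matching the ordering described above.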
#sec_handler? ⇒ Boolean
# File 'lib/pdf/reader/object_hash.rb', line 263
def sec_handler?
  !!sec_handler
end
#size ⇒ Object Also known as: length
Returns the number of objects in the file. An object with multiple generations is counted once.
# File 'lib/pdf/reader/object_hash.rb', line 172
def size
  xref.size
end
#stream?(ref) ⇒ Boolean
Returns true if the supplied reference points to an object with a stream.
# File 'lib/pdf/reader/object_hash.rb', line 59
def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end
#to_a ⇒ Object
Returns an array of arrays. Each sub-array contains a key/value pair.
# File 'lib/pdf/reader/object_hash.rb', line 240
def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end
#to_s ⇒ Object
# File 'lib/pdf/reader/object_hash.rb', line 212
def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end
#values ⇒ Object
Returns an array of all values in the file.
# File 'lib/pdf/reader/object_hash.rb', line 226
def values
  ret = []
  each_value { |v| ret << v }
  ret
end
#values_at(*ids) ⇒ Object
Returns an array of all values from the specified keys.
# File 'lib/pdf/reader/object_hash.rb', line 234
def values_at(*ids)
  ids.map { |id| self[id] }
end