Module: HexaPDF::Task::Optimize
Defined in: lib/hexapdf/task/optimize.rb
Overview
Task for optimizing the PDF document.
For the list of optimization methods this task can perform, see the ::call method.
Defined Under Namespace
Classes: SerializationProcessor
Class Method Summary
-
.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false, prune_page_resources: false) ⇒ Object
Optimizes the PDF document.
-
.compact(doc, object_streams, xref_streams) ⇒ Object
Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
-
.compress_pages(doc) ⇒ Object
Compresses the contents of all pages by parsing and then serializing again.
-
.delete_fields_with_defaults(obj) ⇒ Object
Deletes field entries (except for /Type) of the object that are optional and currently set to their default value.
-
.process_object_streams(doc, method, xref_streams) ⇒ Object
Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
-
.process_xref_streams(doc, method) ⇒ Object
Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
-
.prune_page_resources(doc, used_refs) ⇒ Object
Deletes all XObject entries from the resources dictionaries of all pages whose names do not match the keys in used_refs.
Class Method Details
.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false, prune_page_resources: false) ⇒ Object
Optimizes the PDF document.
The field entries that are optional and set to their default value are always deleted. Additional optimization methods are performed depending on the values of the following arguments:
- compact: Compacts the object space by merging the revisions and then deleting null and unused values if set to true.
- object_streams: Specifies if and how object streams should be used: for :preserve, existing object streams are preserved; for :generate, objects are packed into object streams as much as possible; and for :delete, existing object streams are deleted.
- xref_streams: Specifies if cross-reference streams should be used. Can be :preserve (no modifications), :generate (use cross-reference streams) or :delete (remove cross-reference streams). If object_streams is set to :generate, this option is implicitly changed to :generate.
- compress_pages: Compresses the content streams of all pages if set to true. Note that this can take a very long time because each content stream has to be unfiltered, parsed, serialized and then filtered again.
- prune_page_resources: Removes all unused XObjects from the resources dictionaries of all pages. It is recommended to also set the compact argument because otherwise the unused XObjects won’t be deleted from the document. This is sometimes necessary after importing pages from other PDF files that use a single resources dictionary for all pages.
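As a usage sketch, the task can be reached through the document's task method, the same mechanism the source uses for doc.task(:dereference). The helper names size_options and optimize_for_size below are hypothetical and not part of HexaPDF; the only assumption about doc is that it responds to task(:optimize, **options).

```ruby
# Hypothetical option set aimed at a small output file (not a HexaPDF default).
def size_options
  {
    compact: true,              # merge revisions, drop null and unused objects
    object_streams: :generate,  # implicitly switches xref_streams to :generate
    compress_pages: true,       # can be slow for documents with many pages
    prune_page_resources: true, # pair with compact so pruned XObjects are removed
  }
end

# Hypothetical helper: runs the optimize task on +doc+, letting the caller
# override individual options.
def optimize_for_size(doc, **overrides)
  doc.task(:optimize, **size_options.merge(overrides))
end
```

With the hexapdf gem installed, doc would typically be a HexaPDF::Document, for example one obtained from HexaPDF::Document.open with a placeholder file name, and written back out with doc.write afterwards.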
# File 'lib/hexapdf/task/optimize.rb', line 85
def self.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve,
              compress_pages: false, prune_page_resources: false)
  used_refs = compress_pages(doc) if compress_pages
  prune_page_resources(doc, used_refs) if prune_page_resources

  if compact
    compact(doc, object_streams, xref_streams)
  elsif object_streams != :preserve
    process_object_streams(doc, object_streams, xref_streams)
  elsif xref_streams != :preserve
    process_xref_streams(doc, xref_streams)
  else
    doc.each(&method(:delete_fields_with_defaults))
  end
end
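The order of precedence in the listing above can be easy to miss: only one of the four final branches runs. The following is a minimal sketch that mirrors the if/elsif chain without touching a PDF; the method name optimization_steps is hypothetical and the returned symbols are simply the names of the class methods that would be invoked.

```ruby
# Returns the ordered list of steps .call would run for the given options.
# Purely illustrative: it reproduces the branching, not the work itself.
def optimization_steps(compact: false, object_streams: :preserve, xref_streams: :preserve,
                       compress_pages: false, prune_page_resources: false)
  steps = []
  steps << :compress_pages if compress_pages
  steps << :prune_page_resources if prune_page_resources
  final_step =
    if compact
      :compact
    elsif object_streams != :preserve
      :process_object_streams
    elsif xref_streams != :preserve
      :process_xref_streams
    else
      :delete_fields_with_defaults
    end
  steps << final_step
end
```

Each of the first three branches calls delete_fields_with_defaults itself in the listings on this page, which is why default-valued fields are removed regardless of the options chosen.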
.compact(doc, object_streams, xref_streams) ⇒ Object
Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
For the meaning of the other arguments see ::call.
# File 'lib/hexapdf/task/optimize.rb', line 105
def self.compact(doc, object_streams, xref_streams)
  doc.revisions.merge
  unused = Set.new(doc.task(:dereference))

  rev = doc.revisions.add

  oid = 1
  doc.revisions.all[0].each do |obj|
    if obj.null? || unused.include?(obj) || (obj.type == :ObjStm) ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.data.value = nil
      next
    end

    delete_fields_with_defaults(obj)
    obj.oid = oid
    obj.gen = 0
    rev.add(obj)
    oid += 1
  end
  doc.revisions.all.delete_at(0)

  if object_streams == :generate
    process_object_streams(doc, :generate, xref_streams)
  elsif xref_streams == :generate
    doc.add({}, type: Type::XRefStream)
  end
end
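To illustrate the filtering and renumbering performed in the loop above, here is a toy model that works on plain hashes instead of HexaPDF objects. The hash shape (:oid, :gen, :type, :value) and the method name compact_objects are assumptions made for this sketch only.

```ruby
# Toy model of compaction. Each object is a hash with :oid, :gen, :type and
# :value keys (hypothetical shape); +unused+ lists objects nothing refers to.
def compact_objects(objects, unused: [], xref_streams: :preserve)
  kept = objects.reject do |obj|
    obj[:value].nil? ||                                   # null object
      unused.include?(obj) ||                             # unreferenced object
      obj[:type] == :ObjStm ||                            # old object streams
      (obj[:type] == :XRef && xref_streams != :preserve)  # old xref streams
  end
  # Renumber the survivors consecutively from 1 and reset generation numbers.
  kept.each_with_index.map { |obj, index| obj.merge(oid: index + 1, gen: 0) }
end
```

Consecutive numbering with generation 0 everywhere is what allows the cross-reference data of the compacted document to stay small.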
.compress_pages(doc) ⇒ Object
Compresses the contents of all pages by parsing and then serializing again. The HexaPDF serializer is already optimized for small output size so nothing else needs to be done.
Returns a hash of the form key=>true where the keys are the used XObjects (for use with ::prune_page_resources).
# File 'lib/hexapdf/task/optimize.rb', line 234
def self.compress_pages(doc)
  used_refs = {}
  doc.pages.each do |page|
    # On a correctable parse error, ask the configured handler; raise if it returns true.
    processor = SerializationProcessor.new do |error_message|
      doc.config['parser.on_correctable_error'].call(doc, error_message, 0) &&
        raise(HexaPDF::Error, error_message)
    end
    HexaPDF::Content::Parser.parse(page.contents, processor)
    page.contents = processor.result
    page[:Contents].set_filter(:FlateDecode)
    xobjects = page.resources[:XObject]
    processor.used_references.each {|ref| used_refs[xobjects[ref]] = true } if xobjects
  end
  used_refs
end
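The two effects at work here, a more compact serialization followed by Flate compression, can be sketched with the standard library alone. The reserialize method below is a deliberately naive stand-in for the parse-and-serialize step (it only collapses whitespace and would mishandle string literals, comments and inline images); both method names are hypothetical.

```ruby
require 'zlib'

# Naive stand-in for "parse, then serialize again": collapses every run of
# whitespace between tokens into a single space.
def reserialize(content)
  content.split.join(' ')
end

# Applies zlib/deflate compression, the algorithm behind the /FlateDecode filter.
def flate(content)
  Zlib::Deflate.deflate(content)
end

raw = "q\n  1 0 0 1 10   10 cm\n/Im1   Do\nQ\n" * 50
normalized = reserialize(raw)
puts "raw: #{raw.bytesize} bytes, normalized: #{normalized.bytesize} bytes, " \
     "deflated: #{flate(normalized).bytesize} bytes"
```

The real task performs both steps per page, which is the reason the argument description warns that compress_pages can take a long time.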
.delete_fields_with_defaults(obj) ⇒ Object
Deletes field entries (except for /Type) of the object that are optional and currently set to their default value.
# File 'lib/hexapdf/task/optimize.rb', line 219
def self.delete_fields_with_defaults(obj)
  return unless obj.kind_of?(HexaPDF::Dictionary) && !obj.null?
  obj.each do |name, value|
    if name != :Type && (field = obj.class.field(name)) && !field.required? &&
        field.default? && value == field.default
      obj.delete(name)
    end
  end
end
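The condition in the listing above can be illustrated with plain hashes. The field table example_fields is hypothetical and only loosely modelled on page attributes; in HexaPDF the field definitions live on the dictionary class and are looked up with field(name).

```ruby
# Hypothetical field definitions: whether a field is required and, if one is
# defined, its default value.
def example_fields
  {
    Type: { required: true, default: :Page },
    Rotate: { required: false, default: 0 },
    UserUnit: { required: false, default: 1.0 },
    Contents: { required: false }, # no default defined
  }
end

# Returns a copy of +dict+ without entries that are optional and equal to their
# default. /Type is always kept, as are keys without a field definition.
def delete_defaults(dict, fields)
  dict.reject do |name, value|
    field = fields[name]
    name != :Type && field && !field[:required] &&
      field.key?(:default) && value == field[:default]
  end
end

p delete_defaults({ Type: :Page, Rotate: 0, UserUnit: 2.0 }, example_fields)
```

Dropping such entries is safe because a conforming reader treats a missing optional entry as if it carried its default value.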
.process_object_streams(doc, method, xref_streams) ⇒ Object
Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
# File 'lib/hexapdf/task/optimize.rb', line 136
def self.process_object_streams(doc, method, xref_streams)
  case method
  when :delete
    doc.revisions.each do |rev|
      xref_stream = false
      objects_to_delete = []
      rev.each do |obj|
        case obj.type
        when :ObjStm
          objects_to_delete << obj
        when :XRef
          xref_stream = true
          objects_to_delete << obj if xref_streams == :delete
        else
          delete_fields_with_defaults(obj)
        end
      end
      objects_to_delete.each {|obj| rev.delete(obj) }
      if xref_streams == :generate && !xref_stream
        rev.add(doc.wrap({}, type: Type::XRefStream, oid: doc.revisions.next_oid))
      end
    end
  when :generate
    doc.revisions.each do |rev|
      xref_stream = false
      count = 0
      objstms = [doc.wrap({}, type: Type::ObjectStream)]
      old_objstms = []
      rev.each do |obj|
        case obj.type
        when :XRef
          xref_stream = true
        when :ObjStm
          old_objstms << obj
        end
        delete_fields_with_defaults(obj)

        next if obj.respond_to?(:stream)

        objstms[-1].add_object(obj)
        count += 1
        if count == 200
          objstms << doc.wrap({}, type: Type::ObjectStream)
          count = 0
        end
      end
      old_objstms.each {|objstm| rev.delete(objstm) }
      objstms.each do |objstm|
        objstm.data.oid = doc.revisions.next_oid
        rev.add(objstm)
      end
      rev.add(doc.wrap({}, type: Type::XRefStream, oid: doc.revisions.next_oid)) unless xref_stream
    end
  end
end
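The :generate branch above packs objects into object streams in batches of 200 and skips anything that has a stream of its own. A minimal sketch of that grouping, using hashes with a :stream flag (an assumed shape) and the hypothetical method name pack_into_object_streams:

```ruby
# Groups all non-stream objects into batches of at most +limit+ entries, one
# batch per object stream. Stream objects are left out because they cannot be
# stored inside an object stream.
def pack_into_object_streams(objects, limit: 200)
  objects.reject { |obj| obj[:stream] }.each_slice(limit).to_a
end

objects = (1..450).map { |i| { oid: i, stream: (i % 10).zero? } }
p pack_into_object_streams(objects).map(&:size)
```

This is a simplification: the listing starts a fresh object stream as soon as the counter reaches 200, whereas each_slice only creates a batch when there is something to put in it.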
.process_xref_streams(doc, method) ⇒ Object
Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
# File 'lib/hexapdf/task/optimize.rb', line 195
def self.process_xref_streams(doc, method)
  case method
  when :delete
    doc.each do |obj, rev|
      if obj.type == :XRef
        rev.delete(obj)
      else
        delete_fields_with_defaults(obj)
      end
    end
  when :generate
    doc.revisions.each do |rev|
      xref_stream = false
      rev.each do |obj|
        xref_stream = true if obj.type == :XRef
        delete_fields_with_defaults(obj)
      end
      rev.add(doc.wrap({}, type: Type::XRefStream, oid: doc.revisions.next_oid)) unless xref_stream
    end
  end
end
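The per-revision behaviour of the three method values can be sketched on plain data. In this toy model each revision is an array of hashes with a :type key; that shape and the method name apply_xref_method are assumptions for illustration.

```ruby
# Returns new revisions with cross-reference streams handled according to
# +method+; the input arrays are left untouched.
def apply_xref_method(revisions, method)
  case method
  when :delete
    revisions.map { |rev| rev.reject { |obj| obj[:type] == :XRef } }
  when :generate
    # Add one cross-reference stream to every revision that lacks one.
    revisions.map do |rev|
      rev.any? { |obj| obj[:type] == :XRef } ? rev.dup : rev + [{ type: :XRef }]
    end
  else # :preserve
    revisions.map(&:dup)
  end
end
```

Note that :generate never adds a second cross-reference stream to a revision that already contains one, matching the `unless xref_stream` guard in the listing.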
.prune_page_resources(doc, used_refs) ⇒ Object
Deletes all XObject entries from the resources dictionaries of all pages whose names do not match the keys in used_refs.
# File 'lib/hexapdf/task/optimize.rb', line 252
def self.prune_page_resources(doc, used_refs)
  unless used_refs
    used_refs = {}
    doc.pages.each do |page|
      next unless (xobjects = page.resources[:XObject])
      HexaPDF::Content::Parser.parse(page.contents) do |op, operands|
        used_refs[xobjects[operands[0]]] = true if op == :Do
      end
    end
  end

  doc.pages.each do |page|
    next unless (xobjects = page.resources[:XObject])
    xobjects.each do |key, obj|
      next if used_refs[obj]
      xobjects.delete(key)
    end
  end
end
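The two passes in the listing above, collecting the XObjects referenced by Do operators and then dropping everything else, can be sketched with plain hashes and arrays. The data shapes and the method names used_xobjects and prune_xobjects are assumptions for this sketch: xobjects maps resource names to objects, and operations is a list of [operator, operands] pairs as a content parser might yield them.

```ruby
# Collects the XObjects that are painted by a Do operator.
def used_xobjects(xobjects, operations)
  used = {}
  operations.each do |op, operands|
    used[xobjects[operands[0]]] = true if op == :Do
  end
  used
end

# Returns only those resource entries whose object was marked as used.
def prune_xobjects(xobjects, used)
  xobjects.select { |_name, obj| used[obj] }
end

xobjects = { Im1: 'image-1', Im2: 'image-2', Fm1: 'form-1' }
operations = [[:q, []], [:Do, [:Im1]], [:Q, []], [:Do, [:Fm1]]]
p prune_xobjects(xobjects, used_xobjects(xobjects, operations))
```

As in the listing, the lookup is keyed by the XObject itself and not by its resource name, so an object used on any page survives in every resources dictionary that lists it.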