Class: HexaPDF::Content::Parser

Inherits:
Object
Defined in:
lib/hexapdf/content/parser.rb

Overview

This class knows how to correctly parse a content stream.

A content stream is mostly just a stream of PDF objects. However, there is one exception: inline images.

Inline images don’t follow the normal PDF object parsing rules, so they need special handling, which is the reason this class exists. The parser consumes the ID and EI operators itself; for an inline image, only the BI operator is ever passed to the processor.

To parse some contents, call the #parse method with the contents and a Processor object, which processes the parsed operators.

Class Method Details

.parse(contents, processor = nil, &block) ⇒ Object

Creates a new Parser object and calls #parse.



# File 'lib/hexapdf/content/parser.rb', line 164

def self.parse(contents, processor = nil, &block)
  new.parse(contents, processor, &block)
end

Instance Method Details

#parse(contents, processor = nil, &block) ⇒ Object

Parses the contents and calls the processor object or the given block for each parsed operator.

If a full-blown Processor is not needed (e.g. because the graphics state doesn’t need to be maintained), use the block form to handle each parsed operator and its parameters.

Note: The parameters array is reused for each processed operator, so duplicate it if necessary.
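The note above can be illustrated without HexaPDF itself. The following is a minimal sketch, assuming a hypothetical stand-in method `each_operator` (not part of HexaPDF) that reuses one parameters array in the same way; symbols stand in for operators and everything else for operands.

```ruby
# Hypothetical stand-in for a parser that, like the one described above,
# reuses a single params array for every operator it reports.
def each_operator(tokens)
  params = []
  tokens.each do |token|
    if token.is_a?(Symbol)   # symbols play the role of operators here
      yield token, params
      params.clear           # the same array object is emptied and reused
    else
      params << token        # everything else is an operand
    end
  end
end

tokens = [10, 20, :m, 30, 40, :l, :S]

# Storing the array itself: every entry ends up pointing at the same,
# by now empty, array.
aliased = []
each_operator(tokens) { |op, params| aliased << [op, params] }

# Duplicating the array: each entry keeps its own snapshot of the operands.
copied = []
each_operator(tokens) { |op, params| copied << [op, params.dup] }

p aliased # => [[:m, []], [:l, []], [:S, []]]
p copied  # => [[:m, [10, 20]], [:l, [30, 40]], [:S, []]]
```

A block that only inspects the parameters while it runs does not need the copy; `dup` matters only when the parameters are kept beyond the block call.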

Raises:

  • (ArgumentError): if neither a processor nor a block is given


# File 'lib/hexapdf/content/parser.rb', line 176

def parse(contents, processor = nil, &block) #:yields: object, params
  raise ArgumentError, "Argument processor or block is needed" if processor.nil? && block.nil?
  if processor.nil?
    block.singleton_class.send(:alias_method, :process, :call)
    processor = block
  end

  tokenizer = Tokenizer.new(contents, raise_on_eos: true)
  params = []
  loop do
    obj = tokenizer.next_object(allow_keyword: true)
    if obj.kind_of?(Tokenizer::Token)
      if obj == 'BI'
        params = parse_inline_image(tokenizer)
      end
      processor.process(obj.to_sym, params)
      params.clear
    else
      params << obj
    end
  end
end
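
Two Ruby idioms in the method above may be easier to follow in isolation: the block is given a `process` method by aliasing `call` on its singleton class, and the bare `loop` presumably ends when the tokenizer signals the end of the stream. The following is a minimal sketch, assuming a plain Enumerator as a stand-in token source (not HexaPDF's Tokenizer) and assuming that `raise_on_eos: true` results in a `StopIteration`, which `Kernel#loop` rescues.

```ruby
# Toy token source (an assumption, not HexaPDF's Tokenizer): Enumerator#next
# raises StopIteration once the tokens are exhausted.
tokens = [1, 0, 0, 1, 50, 50, :cm, :Q].each

# Any callable can act as a processor once it responds to #process.
seen = []
handler = proc { |operator, params| seen << [operator, params.dup] }
handler.singleton_class.send(:alias_method, :process, :call)

params = []
loop do                      # Kernel#loop rescues StopIteration and ends quietly
  obj = tokens.next
  if obj.is_a?(Symbol)       # symbols play the role of operator tokens
    handler.process(obj, params)
    params.clear
  else
    params << obj            # operands accumulate until an operator arrives
  end
end

p seen # => [[:cm, [1, 0, 0, 1, 50, 50]], [:Q, []]]
```

Aliasing on the singleton class affects only that one block object, so other procs in the program do not gain a `process` method.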