Class: PDF::Reader::Parser

Inherits: Object

Defined in: lib/pdf/reader/parser.rb

Overview

An internal PDF::Reader class that reads objects from the PDF file and converts them into usable Ruby objects (hashes, arrays, true, false, etc.).

Constant Summary

TOKEN_STRATEGY =

proc { |parser, token| Token.new(token) }

STRATEGIES =

{
  "/"  => proc { |parser, token| parser.send(:pdf_name) },
  "<<" => proc { |parser, token| parser.send(:dictionary) },
  "["  => proc { |parser, token| parser.send(:array) },
  "("  => proc { |parser, token| parser.send(:string) },
  "<"  => proc { |parser, token| parser.send(:hex_string) },

  nil     => proc { nil },
  "true"  => proc { true },
  "false" => proc { false },
  "null"  => proc { nil },

  "obj"       => TOKEN_STRATEGY,
  "endobj"    => TOKEN_STRATEGY,
  "stream"    => TOKEN_STRATEGY,
  "endstream" => TOKEN_STRATEGY,
  ">>"        => TOKEN_STRATEGY,
  "]"         => TOKEN_STRATEGY,
  ">"         => TOKEN_STRATEGY,
  ")"         => TOKEN_STRATEGY
}
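STRATEGIES is a dispatch table: the raw token is the hash key, and the proc stored under it builds the Ruby value, either by calling back into the parser (for composite objects such as arrays) or by returning a literal. The following is a minimal, self-contained sketch of that pattern under stated assumptions: `MiniParser` and `MiniToken` are hypothetical stand-ins (not pdf-reader classes), tokens come from a plain Array instead of a PDF::Reader::Buffer, and every unmatched token is simply converted with `to_i`.

```ruby
# Hypothetical sketch of the STRATEGIES dispatch table: raw tokens map to
# procs that build the Ruby value. MiniParser and MiniToken are
# illustrative stand-ins, not part of pdf-reader.
MiniToken = Struct.new(:value)

class MiniParser
  # Marker tokens such as "]" are wrapped so callers can recognise them.
  TOKEN_STRATEGY = proc { |_parser, token| MiniToken.new(token) }

  STRATEGIES = {
    "["     => proc { |parser, _token| parser.send(:array) },
    nil     => proc { nil },
    "true"  => proc { true },
    "false" => proc { false },
    "null"  => proc { nil },
    "]"     => TOKEN_STRATEGY
  }.freeze

  # tokens is an Array of raw token strings, standing in for a Buffer.
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def parse_token
    token = @tokens.shift # nil once the tokens are exhausted
    if STRATEGIES.key?(token)
      STRATEGIES[token].call(self, token)
    else
      token.to_i # in this sketch every other token is an integer
    end
  end

  private

  # Collects values until the "]" marker token is returned.
  def array
    result = []
    loop do
      raise ArgumentError, "unterminated array" if @tokens.empty?
      item = parse_token
      break if item.is_a?(MiniToken) && item.value == "]"
      result << item
    end
    result
  end
end
```

For example, `MiniParser.new(["[", "1", "[", "true", "]", "null", "]"]).parse_token` returns `[1, [true], nil]`: the nested `[` recurses through the same table, and the closing `]` comes back as a marker token that ends the current array. Because the procs ignore arguments they do not declare, literal strategies such as `proc { true }` can share the same `call(self, token)` interface as the recursive ones.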

Instance Method Summary

#initialize(buffer, objects = nil) ⇒ Parser constructor

Create a new parser around a PDF::Reader::Buffer object.
#object(id, gen) ⇒ Object

Reads an entire PDF object from the buffer and returns it as the corresponding Ruby object.
#parse_token(operators = {}) ⇒ Object

Reads the next token from the underlying buffer and converts it to an appropriate object.

Constructor Details

#initialize(buffer, objects = nil) ⇒ `Parser`

Create a new parser around a PDF::Reader::Buffer object.

buffer  - a PDF::Reader::Buffer object that contains PDF data
objects - a PDF::Reader::ObjectHash object that can return objects from the PDF file

# File 'lib/pdf/reader/parser.rb', line 65

def initialize(buffer, objects=nil)
  @buffer = buffer
  @objects  = objects
end

Instance Method Details

#object(id, gen) ⇒ `Object`

Reads an entire PDF object from the buffer and returns it as the corresponding Ruby object. If the object is a stream (a dictionary followed by the `stream` keyword), the stream is returned together with the dictionary that describes it.

id  - the object ID to return
gen - the object revision (generation) number to return

# File 'lib/pdf/reader/parser.rb', line 98

def object(id, gen)
  idCheck = parse_token

  # Sometimes the xref table is corrupt and points to an offset slightly too early in the file.
  # check the next token, maybe we can find the start of the object we're looking for
  if idCheck != id
    Error.assert_equal(parse_token, id)
  end
  Error.assert_equal(parse_token, gen)
  Error.str_assert(parse_token, "obj")

  obj = parse_token
  post_obj = parse_token

  if obj.is_a?(Hash) && post_obj == "stream"
    stream(obj)
  else
    obj
  end
end
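The header check in `#object` expects the sequence `id gen obj`, but tolerates an xref offset that lands exactly one token too early by reading one extra token before asserting the ID. The sketch below reproduces that control flow on a plain Array of already-converted tokens. `read_object`, `check_token` and `HeaderError` are hypothetical names standing in for the parser, `Error.assert_equal` and `Error.str_assert`, and the `[:stream, obj]` return value is only a placeholder for the real `stream(obj)` call.

```ruby
# Hypothetical sketch of the header check in #object, run against an Array
# of already-converted tokens instead of a PDF::Reader::Buffer.
# read_object, check_token and HeaderError are illustrative names only.
class HeaderError < StandardError; end

def check_token(actual, expected)
  return if actual == expected
  raise HeaderError, "expected #{expected.inspect}, got #{actual.inspect}"
end

def read_object(tokens, id, gen)
  tokens = tokens.dup
  first = tokens.shift
  # The xref offset may land one token early: if so, the next token must be the id.
  check_token(tokens.shift, id) if first != id
  check_token(tokens.shift, gen)
  check_token(tokens.shift, "obj")

  obj = tokens.shift
  post_obj = tokens.shift
  if obj.is_a?(Hash) && post_obj == "stream"
    [:stream, obj] # placeholder for the real stream(obj) call
  else
    obj
  end
end
```

With a stray `endobj` in front, `read_object(["endobj", 12, 0, "obj", 42, "endobj"], 12, 0)` still returns `42`; a second stray token, or a wrong generation number, raises instead. Only one token of slack is allowed, matching the single extra `parse_token` call in the original method.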

#parse_token(operators = {}) ⇒ `Object`

Reads the next token from the underlying buffer and converts it to an appropriate object.

operators - a hash of supported operators to read from the underlying buffer.

# File 'lib/pdf/reader/parser.rb', line 74

def parse_token(operators={})
  token = @buffer.token

  if STRATEGIES.has_key? token
    STRATEGIES[token].call(self, token)
  elsif token.is_a? PDF::Reader::Reference
    token
  elsif operators.has_key? token
    Token.new(token)
  elsif token.frozen?
    token
  elsif token =~ /\d*\.\d/
    token.to_f
  else
    token.to_i
  end
end
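After the STRATEGIES lookup and the Reference check, `#parse_token` decides between an operator token and a number, in that order, so the `operators` hash is what keeps operator names from being turned into numbers. The sketch below isolates those final branches (the `frozen?` check is omitted). `convert_fallthrough`, `Operator` (a stand-in for the Token class the method wraps operators in) and the `OPERATORS` hash are hypothetical names chosen for illustration, not pdf-reader's own.

```ruby
# Hypothetical sketch of the last branches of #parse_token, applied to a
# raw token string. Operator stands in for the Token wrapper, and
# OPERATORS is an illustrative hash, not pdf-reader's own table.
Operator = Struct.new(:name)

OPERATORS = { "BT" => :begin_text, "Tj" => :show_text }.freeze

def convert_fallthrough(token, operators = {})
  if operators.key?(token)
    Operator.new(token)      # known operator: keep it as a token object
  elsif token =~ /\d*\.\d/
    token.to_f               # has a decimal point followed by a digit
  else
    token.to_i               # anything else; non-numeric text becomes 0
  end
end
```

`convert_fallthrough("Tj", OPERATORS)` yields an `Operator`, while `convert_fallthrough("Tj")` falls through to `to_i` and yields `0`. Note also that a PDF real written as `4.` has no digit after the point, so this regex sends it to `to_i` and it comes back as the Integer `4` (numerically the same value).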