Class: IMW::Formats::Pdf::Snippetizer

Inherits:
Object
Defined in:
lib/imw/formats/pdf.rb

Overview

A receiver class for PDF::Reader that accumulates the document's text until the snippet reaches about 1024 characters, then stops the walk by raising SnippetEndError.
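A "receiver" is a callback object: as PDF::Reader walks a content stream it calls methods named after the operators it meets, and (as this sketch assumes) silently skips callbacks the receiver does not define. The toy below imitates that dispatch with a hypothetical `walk` helper, a hypothetical `ToyReceiver`, and an invented operator list; it is not PDF::Reader's code, only an illustration of why a receiver needs to implement just the text callbacks it cares about, and how `alias_method` routes several operator names to one method.

```ruby
# Hypothetical receiver: it only handles text-showing callbacks.
class ToyReceiver
  attr_reader :text

  def initialize
    @text = +''   # unary plus yields a mutable string
  end

  def show_text(string)
    @text << string
  end
  # Route a second operator's callback to the same method.
  alias_method :move_to_next_line_and_show_text, :show_text
end

# Hypothetical walker: calls a receiver method for each operator,
# skipping any callback the receiver does not implement.
def walk(operations, receiver)
  operations.each do |name, *args|
    receiver.public_send(name, *args) if receiver.respond_to?(name)
  end
  receiver
end

ops = [
  [:begin_text_object],
  [:show_text, 'Hello'],
  [:set_text_font_and_size, :F1, 12],
  [:move_to_next_line_and_show_text, ', world'],
  [:end_text_object]
]

puts walk(ops, ToyReceiver.new).text  # prints "Hello, world"
```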

Constant Summary

SnippetEndError = Class.new(IMW::Error)

A custom error class raised while receiving text from PDF::Reader, to cut short the walk through a large PDF document.
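`Class.new(IMW::Error)` builds an anonymous subclass of `IMW::Error`; assigning it to the constant `SnippetEndError` gives it that name, and a `rescue` for any ancestor class also catches it. A small sketch of the idiom, using a hypothetical stand-in `BaseError` for `IMW::Error` (whose superclass is not shown on this page, so `StandardError` is assumed):

```ruby
# Stand-in for IMW::Error; assumed here to descend from StandardError.
BaseError = Class.new(StandardError)

# Same idiom as SnippetEndError = Class.new(IMW::Error).
SnippetEndError = Class.new(BaseError)

puts SnippetEndError.name        # prints "SnippetEndError"
puts SnippetEndError.superclass  # prints "BaseError"

# A rescue clause for the parent class also catches the subclass.
begin
  raise SnippetEndError
rescue BaseError => e
  puts "caught #{e.class}"       # prints "caught SnippetEndError"
end
```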

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ Snippetizer

Returns a new instance of Snippetizer.

# File 'lib/imw/formats/pdf.rb', line 36

def initialize
  @snippet = ''
end

Instance Attribute Details

#snippet ⇒ Object

The snippet being built by this snippetizer.

# File 'lib/imw/formats/pdf.rb', line 34

def snippet
  @snippet
end

Instance Method Details

#show_text(*params) ⇒ Object
Also known as: show_text_with_positioning, move_to_next_line_and_show_text, set_spacing_next_line_show_text

Appends each string that PDF::Reader passes in to the snippet, until the snippet reaches 1024 characters.

A string consisting of a single space is converted into a newline character.

FIXME: How can the receiver ask PDF::Reader to stop walking the document once enough text has been collected? Until a more graceful way is found, this method simply raises an error, creating a GOTO…
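One possible direction for the FIXME, offered as an alternative and not what this class does: Ruby's `catch`/`throw` performs a non-local exit to a matching `catch` block and hands back a value, without defining an error class. In the sketch, a hypothetical `fake_walk` method stands in for PDF::Reader walking a long document, and `LIMIT` mirrors the 1024-character cap; whether the real walker tolerates being unwound this way is an assumption.

```ruby
LIMIT = 1024

# Hypothetical stand-in for PDF::Reader walking a long document:
# it keeps yielding text until something stops it.
def fake_walk
  loop { yield 'lorem ipsum ' }
end

snippet = catch(:snippet_done) do
  text = +''
  fake_walk do |string|
    text << string
    # Non-local exit: text[0, LIMIT] becomes the value of catch.
    throw :snippet_done, text[0, LIMIT] if text.size >= LIMIT
  end
end

puts snippet.size  # prints 1024
```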

# File 'lib/imw/formats/pdf.rb', line 50

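def show_text *params
  # Some aliased callbacks pass numeric operands (spacing values, or a
  # TJ array mixing strings and kerning numbers); keep only the strings.
  params.flatten.each do |string|
    next unless string.is_a?(String)
    # The snippet is full: abort the walk (see the FIXME above).
    raise SnippetEndError if @snippet.size >= 1024
    if string == ' '
      # A lone space is treated as a line break.
      @snippet += "\n"
    else
      # Append only what still fits, so the snippet never exceeds 1024 characters.
      @snippet += string[0, 1024 - @snippet.size]
    end
  end
end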
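The callbacks can be exercised without a PDF by calling them directly. The sketch below is self-contained rather than requiring `imw`: it re-creates the class under a hypothetical top-level `Snippetizer` name with a stand-in `SnippetEndError`, caps each appended piece at the remaining room in a 1024-character budget, skips non-string operands, and feeds it invented operands, including a TJ-style array of strings and a kerning number.

```ruby
# Self-contained stand-in for IMW::Formats::Pdf::Snippetizer.
class Snippetizer
  SnippetEndError = Class.new(StandardError)  # stand-in parent class
  LIMIT = 1024

  attr_reader :snippet

  def initialize
    @snippet = ''
  end

  def show_text(*params)
    params.flatten.each do |string|
      next unless string.is_a?(String)  # skip numeric operands
      raise SnippetEndError if @snippet.size >= LIMIT
      @snippet += (string == ' ' ? "\n" : string[0, LIMIT - @snippet.size])
    end
  end
  alias_method :show_text_with_positioning, :show_text
end

receiver = Snippetizer.new
receiver.show_text('Title')
receiver.show_text(' ')                                  # becomes "\n"
receiver.show_text_with_positioning(['Body', -120, ' text'])
begin
  receiver.show_text('x' * 2000)  # fills the snippet up to the limit
  receiver.show_text('more')      # snippet is full: raises
rescue Snippetizer::SnippetEndError
  # Expected: the walk is cut short here.
end

puts receiver.snippet.size  # prints 1024
```

Capping each piece with `string[0, LIMIT - @snippet.size]` keeps the snippet at exactly the limit, whereas an inclusive range such as `string[0..1024]` would take 1025 characters and let the snippet grow past the cap.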