Class: FormatParser::PDFParser

Inherits:
Object
Includes:
IOUtils
Defined in:
lib/parsers/pdf_parser.rb

Constant Summary

PDF_MARKER =

The first 9 bytes of a PDF should match this pattern, according to:

https://stackoverflow.com/questions/3108201/detect-if-pdf-file-is-correct-header-pdf

There are, however, exceptions, which are not handled for now.

/%PDF-1\.[0-8]{1}/
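
To illustrate how a header check against this pattern behaves, here is a minimal sketch. The constant `PDF_MARKER_SKETCH` and the helper `pdf_header?` are hypothetical names introduced for this example; the helper uses a plain `IO#read` rather than the parser's `safe_read`, so it only approximates what `#call` does.

```ruby
require 'stringio'

# Same pattern as PDF_MARKER above, copied so the sketch runs on its own.
PDF_MARKER_SKETCH = /%PDF-1\.[0-8]{1}/

# Hypothetical helper (not part of FormatParser): reads the first 9 bytes
# and reports whether they contain the PDF version marker.
def pdf_header?(io)
  header = io.read(9)
  return false if header.nil? # empty input: nothing to match

  !!(header =~ PDF_MARKER_SKETCH)
end

pdf_header?(StringIO.new("%PDF-1.4\n1 0 obj")) # => true
pdf_header?(StringIO.new("GIF89a"))            # => false
pdf_header?(StringIO.new("%PDF-2.0\n"))        # => false (version outside 1.0-1.8)
```

Note that the pattern is not anchored to the start of the string, so a marker preceded by one stray byte would still match within the 9 bytes read.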
COUNT_MARKERS =

Page counts use different markers depending on the type of PDF, and there is no single common way of finding them. The only way to get this right is to add different types of PDFs to the specs.

['Count ']
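
As a rough sketch of how a marker list like this could be used, the example below looks for each marker followed by digits in a chunk of PDF data. The names `COUNT_MARKERS_SKETCH` and `page_counts_in` are hypothetical, and this is not the parser's actual `scan_for_attributes` logic, which is not shown on this page.

```ruby
# Copy of the marker list above, so the sketch runs on its own.
COUNT_MARKERS_SKETCH = ['Count '].freeze

# Hypothetical helper (an assumption, not FormatParser's implementation):
# returns every integer that directly follows one of the markers.
def page_counts_in(data)
  COUNT_MARKERS_SKETCH.flat_map do |marker|
    data.scan(/#{Regexp.escape(marker)}(\d+)/).flatten.map(&:to_i)
  end
end

sample = "<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>"
page_counts_in(sample) # => [2]
```

A document can contain several such entries, so a caller would still need a rule for choosing among the values found, which is one reason varied spec fixtures matter.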
EOF_MARKER =
'%EOF'

Instance Method Summary

Methods included from IOUtils

#safe_read, #safe_skip

Instance Method Details

#call(io) ⇒ Object



# File 'lib/parsers/pdf_parser.rb', line 20

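def call(io)
  io = FormatParser::IOConstraint.new(io)

  # Return nil unless the first 9 bytes match the PDF header pattern.
  return unless safe_read(io, 9) =~ PDF_MARKER

  # Gather document attributes (such as the page count) from the rest of the file.
  attributes = scan_for_attributes(io)

  FormatParser::Document.new(
    format: :pdf,
    page_count: attributes[:page_count]
  )
end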