Class: FormatParser::PDFParser
- Inherits:
- Object
- Object
- FormatParser::PDFParser
- Includes:
- IOUtils
- Defined in:
- lib/parsers/pdf_parser.rb
Constant Summary
- PDF_MARKER =
The first 9 bytes of a PDF should match this format, according to:
https://stackoverflow.com/questions/3108201/detect-if-pdf-file-is-correct-header-pdf
There are, however, exceptions, which are left out for now.
/%PDF-1\.[0-8]{1}/
- COUNT_MARKERS =
Page counts have different markers depending on the PDF type, and there is no single common way to detect them. The only way to get this right is to add different types of PDFs to the specs.
['Count ']
- EOF_MARKER =
'%EOF'
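As an illustration of how constants like these might be applied, here is a minimal sketch. The helpers `pdf_header?` and `naive_page_count` are hypothetical and not part of FormatParser. The sketch also assumes that the largest integer following a count marker is the total page count, which will not hold for every PDF.

```ruby
# Hypothetical helper, not part of FormatParser: checks the leading bytes
# against the same pattern as the PDF_MARKER constant.
def pdf_header?(bytes)
  marker = /%PDF-1\.[0-8]{1}/
  !bytes.nil? && bytes.match?(marker)
end

# Hypothetical helper: collects the integer that follows each marker and
# keeps the largest one. Assumption: the largest value is the total page
# count; PDFs with compressed object streams or unusual page trees can
# break this.
def naive_page_count(body, markers = ['Count '])
  counts = markers.flat_map do |marker|
    body.scan(/#{Regexp.escape(marker)}(\d+)/).flatten.map(&:to_i)
  end
  counts.max
end

puts pdf_header?("%PDF-1.4\n")  # a 1.x header matches the pattern
puts pdf_header?("%PDF-2.0\n")  # a 2.0 header falls outside the pattern
puts naive_page_count("<< /Type /Pages /Kids [3 0 R] /Count 2 >>").inspect
```

A header such as `%PDF-2.0` is one of the exceptions the pattern leaves out, so a check like this one would reject it.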
Instance Method Summary
Methods included from IOUtils
Instance Method Details
#call(io) ⇒ Object
    # File 'lib/parsers/pdf_parser.rb', line 20

    def call(io)
      io = FormatParser::IOConstraint.new(io)

      return unless safe_read(io, 9) =~ PDF_MARKER

      attributes = scan_for_attributes(io)

      FormatParser::Document.new(
        format: :pdf,
        page_count: attributes[:page_count]
      )
    end
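The method follows a guard-then-build pattern: it returns nil when the header does not match and builds a result otherwise. The sketch below mirrors that control flow so it can run without the gem. It assumes a hypothetical `sketch_call` method, a plain Hash in place of `FormatParser::Document`, and a naive regex scan in place of `scan_for_attributes`, whose real implementation is not shown here.

```ruby
require 'stringio'

# Hypothetical stand-in for PDFParser#call. Returns a Hash rather than a
# FormatParser::Document so the sketch has no external dependencies.
def sketch_call(io)
  header = io.read(9)
  # Guard clause: anything that does not look like a PDF yields nil.
  return unless header && header =~ /%PDF-1\.[0-8]{1}/

  # Assumption: a naive scan of the remaining bytes for "Count <n>".
  # This is not the library's scan_for_attributes.
  counts = io.read.to_s.scan(/Count (\d+)/).flatten.map(&:to_i)
  { format: :pdf, page_count: counts.max }
end

pdf = StringIO.new("%PDF-1.7\n<< /Type /Pages /Count 3 >>\n%%EOF")
result = sketch_call(pdf)

# Callers need to handle nil, which signals "not a PDF".
puts result ? "pages: #{result[:page_count]}" : "not a PDF"
```

Returning nil rather than raising lets a caller try the next parser when the input is not a PDF.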