Module: Docsplit::TransparentPDFs

Included in:
Docsplit
Defined in:
lib/docsplit/transparent_pdfs.rb

Overview

Provides a method that transparently converts non-PDF arguments to temporary PDFs. This lets Docsplit behave as if it natively supported doc, rtf, ppt, and other formats.

Instance Method Summary

Instance Method Details

#ensure_pdfs(docs) ⇒ Object

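Converts any non-PDF documents to temporary PDFs before running them through further extraction. Accepts a single path or a (possibly nested) array of paths and returns a flat array of PDF paths: PDFs pass through unchanged, and converted files are written to a docsplit directory under Dir.tmpdir.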



# File 'lib/docsplit/transparent_pdfs.rb', line 9

def ensure_pdfs(docs)
  [docs].flatten.map do |doc|
    if is_pdf?(doc)
      doc
    else
      tempdir = File.join(Dir.tmpdir, 'docsplit')
      extract_pdf([doc], {:output => tempdir})
      File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
    end
  end
end
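
The path handling in ensure_pdfs can be explored without running a real conversion. The following is a minimal sketch, assuming two hypothetical helpers (temp_pdf_path and normalize_docs, neither of which is part of Docsplit) that copy the expressions from the listing above. It shows that the argument is always flattened into a list, and that two inputs with the same base name map to the same temporary PDF path.

```ruby
require 'tmpdir'

# Hypothetical helper, not part of Docsplit: reproduces the expression
# ensure_pdfs uses to locate the converted PDF for a non-PDF input.
def temp_pdf_path(doc)
  tempdir = File.join(Dir.tmpdir, 'docsplit')
  File.join(tempdir, File.basename(doc, File.extname(doc)) + '.pdf')
end

# Hypothetical helper, not part of Docsplit: reproduces the
# [docs].flatten normalization, so a single path and nested arrays
# are both treated as a flat list of paths.
def normalize_docs(docs)
  [docs].flatten
end

puts temp_pdf_path('/reports/summary.doc')
puts normalize_docs(['a.pdf', ['b.doc']]).inspect

# Inputs from different directories with the same base name share one
# temporary path, so a later conversion would overwrite an earlier one.
puts temp_pdf_path('a/report.doc') == temp_pdf_path('b/report.rtf')
```

Because the temporary path depends only on the base name, callers passing same-named documents from different directories in one call may want to rename them first.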

#is_pdf?(doc) ⇒ Boolean

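Checks whether a document is a PDF, either by its .pdf extension (case-insensitive) or by a %PDF- version header at the start of its first line.

Returns:

  • (Boolean) true when the extension matches; otherwise the result of the header match, which is 0 (truthy) on a match and nil on a miss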


# File 'lib/docsplit/transparent_pdfs.rb', line 21

def is_pdf?(doc)
  File.extname(doc).downcase == '.pdf' || File.open(doc, 'rb', &:readline) =~ /\A\%PDF-\d+(\.\d+)?/
end
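
As written, is_pdf? returns 0 or nil rather than a strict Boolean when it falls through to the header check, and readline raises EOFError on an empty file. The following is a sketch of a stricter variant, assuming a hypothetical method named pdf_file? that is not part of Docsplit. It reads only the first eight bytes instead of the first line, which is a deliberate change from the original, and always returns true or false.

```ruby
require 'tempfile'

# Hypothetical variant, not Docsplit's implementation.
# Returns true or false, and treats an empty file as "not a PDF".
def pdf_file?(path)
  # Same extension shortcut as is_pdf?: the file is not opened at all.
  return true if File.extname(path).downcase == '.pdf'

  # Read at most 8 bytes ("%PDF-1.4" is 8 bytes); read returns nil
  # when the file is empty.
  header = File.open(path, 'rb') { |f| f.read(8) }
  !header.nil? && header.match?(/\A%PDF-\d+(\.\d+)?/)
end

# Usage: a file with a PDF header but no .pdf extension.
Tempfile.create(['sample', '.bin']) do |f|
  f.write("%PDF-1.4\nrest of file")
  f.flush
  puts pdf_file?(f.path)
end
```

Reading a fixed number of bytes avoids pulling a very long first line into memory when the input is a large binary file with no newline near the start.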