Class: Docsplit::InfoExtractor

Inherits:
Object
Defined in:
lib/docsplit/info_extractor.rb

Overview

Delegates to the pdfinfo command-line utility to extract metadata (title, author, page count, and so on) from a PDF file.

Constant Summary

MATCHERS =

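Regular expressions that pick each supported field out of pdfinfo's output. Most keys mirror pdfinfo's labels, except that :date reads the CreationDate line and :length reads the Pages line.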

{
  :author   => /^Author:\s+([^\n]+)/,
  :date     => /^CreationDate:\s+([^\n]+)/,
  :creator  => /^Creator:\s+([^\n]+)/,
  :keywords => /^Keywords:\s+([^\n]+)/,
  :producer => /^Producer:\s+([^\n]+)/,
  :subject  => /^Subject:\s+([^\n]+)/,
  :title    => /^Title:\s+([^\n]+)/,
  :length   => /^Pages:\s+([^\n]+)/,
}
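A quick way to see what these patterns capture is to run them over sample text. The sketch below copies MATCHERS and applies it to a hypothetical, hand-written excerpt of pdfinfo output (SAMPLE_OUTPUT is an assumption for illustration; real output has more fields and its spacing may differ). In Ruby, ^ matches at the start of any line, so each pattern finds its field anywhere in the multi-line string, and ([^\n]+) stops the capture at the end of that line. Fields missing from the output, like Subject and Keywords here, come back as nil.

```ruby
# Copy of the MATCHERS hash documented above.
MATCHERS = {
  :author   => /^Author:\s+([^\n]+)/,
  :date     => /^CreationDate:\s+([^\n]+)/,
  :creator  => /^Creator:\s+([^\n]+)/,
  :keywords => /^Keywords:\s+([^\n]+)/,
  :producer => /^Producer:\s+([^\n]+)/,
  :subject  => /^Subject:\s+([^\n]+)/,
  :title    => /^Title:\s+([^\n]+)/,
  :length   => /^Pages:\s+([^\n]+)/,
}

# Hypothetical pdfinfo output, trimmed to a few fields for illustration.
SAMPLE_OUTPUT = <<~TEXT
  Title:          Quarterly Report
  Author:         Jane Doe
  Creator:        Writer
  Producer:       LibreOffice 7.0
  CreationDate:   Mon Jan  4 10:00:00 2021
  Pages:          12
TEXT

# Map every key to its captured text, or nil when the field is absent.
fields = MATCHERS.transform_values do |regex|
  match = SAMPLE_OUTPUT.match(regex)
  match && match[1]
end

fields.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Note that every capture is a String at this stage, including the page count; #extract converts only :length to an Integer.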

Instance Method Summary

Instance Method Details

#extract(key, pdfs, opts) ⇒ Object

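Pull out a single datum from a PDF. pdfs may be a single path or an array of paths, but only the first one is examined. Returns the captured value as a String (an Integer for :length), or nil if pdfinfo does not report that field.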

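Raises:

  • (ExtractionFailed) if pdfinfo exits with a non-zero status; the exception message is pdfinfo's combined output.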

# File 'lib/docsplit/info_extractor.rb', line 19

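def extract(key, pdfs, opts)
  # Only the first PDF is inspected, even if several are passed.
  # `opts` is accepted but not used by this method.
  pdf = [pdfs].flatten.first
  # Run pdfinfo through the shell, folding stderr into stdout.
  cmd = "pdfinfo \"#{pdf}\" 2>&1"
  result = `#{cmd}`.chomp
  # A non-zero exit status means pdfinfo failed; its output becomes the error message.
  raise ExtractionFailed, result if $? != 0
  # Find the requested field; answer is nil if pdfinfo did not print it.
  match = result.match(MATCHERS[key])
  answer = match && match[1]
  # Page counts are returned as integers, everything else as strings.
  answer = answer.to_i if answer && key == :length
  answer
end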
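The listing above builds a shell command string, so a filename containing a double quote, $, or backtick is interpreted by the shell. A minimal sketch of one alternative, assuming only Ruby's standard Open3 library: pass pdfinfo and the path as separate arguments to Open3.capture2e, which skips the shell and returns the combined stdout/stderr plus a Process::Status. The names extract_info, INFO_MATCHERS, the local ExtractionFailed class, and the runner: keyword are hypothetical and not part of Docsplit; the injectable runner only exists so the logic can be exercised without pdfinfo installed.

```ruby
require 'open3'

# Hypothetical stand-in for Docsplit's error class.
class ExtractionFailed < StandardError; end

# Subset of the documented MATCHERS hash, enough for this sketch.
INFO_MATCHERS = {
  :title  => /^Title:\s+([^\n]+)/,
  :length => /^Pages:\s+([^\n]+)/,
}

# Hypothetical helper mirroring #extract. `runner` must behave like
# Open3.capture2e: accept the command as separate arguments and return
# [combined_output, status], where status responds to #success?.
def extract_info(key, pdfs, runner: Open3.method(:capture2e))
  pdf = [pdfs].flatten.first
  # Separate arguments mean no shell is involved, so the path needs no quoting.
  output, status = runner.call('pdfinfo', pdf)
  output = output.chomp
  raise ExtractionFailed, output unless status.success?
  # fetch raises KeyError for an unsupported key instead of passing nil to #match.
  match = output.match(INFO_MATCHERS.fetch(key))
  answer = match && match[1]
  # Page counts come back as text; convert them as the original does.
  key == :length && answer ? answer.to_i : answer
end
```

One behavioral difference: because no shell is involved, a missing pdfinfo binary makes Open3 raise Errno::ENOENT instead of producing a non-zero exit status and an ExtractionFailed.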