Class: Docsplit::InfoExtractor
- Inherits: Object
  - Object
    - Docsplit::InfoExtractor
- Defined in: lib/docsplit/info_extractor.rb
Overview
Delegates to pdfinfo in order to extract information about a PDF file.
Constant Summary
- MATCHERS =
  Regex matchers for different bits of information.

    {
      :author   => /^Author:\s+([^\n]+)/,
      :date     => /^CreationDate:\s+([^\n]+)/,
      :creator  => /^Creator:\s+([^\n]+)/,
      :keywords => /^Keywords:\s+([^\n]+)/,
      :producer => /^Producer:\s+([^\n]+)/,
      :subject  => /^Subject:\s+([^\n]+)/,
      :title    => /^Title:\s+([^\n]+)/,
      :length   => /^Pages:\s+([^\n]+)/,
    }
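
As a rough illustration of how these matchers pick values out of pdfinfo-style text, the sketch below applies a local copy of three of them to a made-up sample. The `SAMPLE_OUTPUT` text and the `lookup` helper are assumptions for demonstration only; they are not part of Docsplit, and real pdfinfo output contains more fields.

```ruby
# Local copy of three of the matchers listed above, so the sketch runs
# without loading Docsplit.
MATCHERS = {
  :title    => /^Title:\s+([^\n]+)/,
  :keywords => /^Keywords:\s+([^\n]+)/,
  :length   => /^Pages:\s+([^\n]+)/,
}

# Made-up text shaped like pdfinfo output (not taken from a real file).
SAMPLE_OUTPUT = <<~TEXT
  Title:          Annual Report
  Author:         Jane Doe
  Pages:          12
TEXT

# Hypothetical helper: returns the first capture group for `key`,
# or nil when the text has no matching line.
def lookup(key, text)
  match = text.match(MATCHERS[key])
  match && match[1]
end

p lookup(:title, SAMPLE_OUTPUT)    # => "Annual Report"
p lookup(:length, SAMPLE_OUTPUT)   # => "12"
p lookup(:keywords, SAMPLE_OUTPUT) # => nil
```

Each capture is a String, so a numeric field such as the page count still needs an explicit conversion.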
Instance Method Summary
- #extract(key, pdfs, opts) ⇒ Object
Pull out a single datum from a pdf.
Instance Method Details
#extract(key, pdfs, opts) ⇒ Object
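Pull out a single datum from a PDF. Only the first entry of pdfs is inspected, and opts is accepted but not used by this method. Returns the matched value as a String (converted to an Integer for :length), or nil when the pdfinfo output has no matching line. Raises ExtractionFailed with the pdfinfo output when the command exits with a non-zero status.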
    # File 'lib/docsplit/info_extractor.rb', line 19

    def extract(key, pdfs, opts)
      pdf = [pdfs].flatten.first
      cmd = "pdfinfo #{ESCAPE[pdf]} 2>&1"
      result = `#{cmd}`.chomp
      raise ExtractionFailed, result if $? != 0
      match = result.match(MATCHERS[key])
      answer = match && match[1]
      answer = answer.to_i if answer && key == :length
      answer
    end
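
A minimal sketch of how a caller might wrap #extract, assuming the docsplit gem and the pdfinfo binary are installed. The `page_count` helper and the `report.pdf` path are hypothetical and not part of Docsplit. The helper rescues StandardError broadly so it does not depend on where ExtractionFailed sits in the exception hierarchy.

```ruby
# Hypothetical helper (not part of Docsplit): returns the page count, or nil
# when extraction fails. `extractor` is any object that responds to
# #extract(key, pdfs, opts), for example Docsplit::InfoExtractor.new.
def page_count(extractor, path)
  extractor.extract(:length, path, {})
rescue StandardError => e
  warn "could not read #{path}: #{e.message}"
  nil
end

# Usage, assuming the gem is available and report.pdf exists:
#
#   require 'docsplit'
#   page_count(Docsplit::InfoExtractor.new, 'report.pdf')
```

Passing the extractor in as an argument keeps the helper easy to exercise with a stand-in object when pdfinfo is not available.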