Class: PDFToHTMLR::PdfFile

Inherits: Object

Defined in: lib/pdftohtmlr.rb

Overview

Provides facilities for converting PDFs to HTML from Ruby code.

Direct Known Subclasses

PdfFilePath, PdfFileUrl

Instance Attribute Summary

#format ⇒ Object readonly

The pdftohtml output-format flag: nil (HTML) until set to "-xml".
#owner_pwd ⇒ Object readonly

The owner password, passed to pdftohtml with -opw.
#path ⇒ Object readonly

The path of the input PDF.
#target ⇒ Object readonly

The target_path given to the constructor.
#user_pwd ⇒ Object readonly

The user password, passed to pdftohtml with -upw.

Instance Method Summary

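#convert ⇒ Object

Convert the PDF document to HTML. Returns a String.
#convert_to_document ⇒ Object

Convert the PDF document to HTML and parse it into a Nokogiri::HTML::Document.
#convert_to_xml ⇒ Object

Convert the PDF document to pdftohtml's XML format. Returns a String.
#convert_to_xml_document ⇒ Object

Convert the PDF document to XML and parse it into a Nokogiri::XML::Document.
#initialize(input_path, target_path = nil, user_pwd = nil, owner_pwd = nil) ⇒ PdfFile constructor

Returns a new instance of PdfFile.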

Constructor Details

#initialize(input_path, target_path = nil, user_pwd = nil, owner_pwd = nil) ⇒ `PdfFile`

Returns a new instance of PdfFile.

# File 'lib/pdftohtmlr.rb', line 33

def initialize(input_path, target_path=nil, user_pwd=nil, owner_pwd=nil)
  @path = input_path
  @target = target_path
  @user_pwd = user_pwd
  @owner_pwd = owner_pwd      
end

Instance Attribute Details

#format ⇒ `Object` (readonly)

The output-format flag passed to pdftohtml. It is nil (HTML output) until #convert_to_xml or #convert_to_xml_document sets it to "-xml".

# File 'lib/pdftohtmlr.rb', line 31

def format
  @format
end

#owner_pwd ⇒ `Object` (readonly)

The owner password given to the constructor. #convert passes it to pdftohtml with -opw, but only when no user password is set.

# File 'lib/pdftohtmlr.rb', line 30

def owner_pwd
  @owner_pwd
end

#path ⇒ `Object` (readonly)

The path of the input PDF (the constructor's input_path).

# File 'lib/pdftohtmlr.rb', line 27

def path
  @path
end

#target ⇒ `Object` (readonly)

The target_path given to the constructor. #convert itself does not use it.

# File 'lib/pdftohtmlr.rb', line 28

def target
  @target
end

#user_pwd ⇒ `Object` (readonly)

The user password given to the constructor. #convert passes it to pdftohtml with -upw.

# File 'lib/pdftohtmlr.rb', line 29

def user_pwd
  @user_pwd
end

Instance Method Details

#convert ⇒ `Object`

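Convert the PDF document to HTML. Returns the output of pdftohtml (stdout and stderr combined) as a String. If both passwords are set, only user_pwd is passed. Raises PDFToHTMLRError when the output contains "Error:", including the "May not be a PDF file" case, despite the "(continuing anyway)" wording of that message.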

# File 'lib/pdftohtmlr.rb', line 41

def convert()
  errors = ""
  output = ""
  
  if @user_pwd 
    cmd = "pdftohtml -stdout #{@format} -upw #{@user_pwd}" + ' "' + @path + '"'    
  elsif @owner_pwd 
    cmd = "pdftohtml -stdout #{@format} -opw #{@owner_pwd}" + ' "' + @path + '"'
  else
    cmd = "pdftohtml -stdout #{@format}" + ' "' + @path + '"'
  end
  
  output = `#{cmd} 2>&1`

  if (output.include?("Error: May not be a PDF file"))
    raise PDFToHTMLRError, "Error: May not be a PDF file (continuing anyway)"
  elsif (output.include?("Error:"))
    raise PDFToHTMLRError, output.split("\n").first.to_s.chomp
  else
    return output
  end
end
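#convert builds its command line by string interpolation and runs it with backticks, so it goes through the shell: the passwords are inserted unquoted, and the path is only wrapped in double quotes, inside which `$` and backticks are still expanded. The following is a minimal sketch of an alternative, not part of pdftohtmlr: the helper names pdftohtml_args, check_pdftohtml_output and run_pdftohtml are hypothetical, and RuntimeError stands in for PDFToHTMLRError. It keeps the flag order of #convert and its rule that user_pwd wins over owner_pwd, but passes the arguments as an array to Open3.capture2e, which runs pdftohtml without a shell.

```ruby
require "open3"
require "shellwords"

# Hypothetical helpers, not part of pdftohtmlr. They follow the logic of
# PdfFile#convert but pass arguments as an array, so no shell is involved.

# Build the pdftohtml argument list. As in #convert, user_pwd takes
# precedence over owner_pwd when both are given.
def pdftohtml_args(path, format: nil, user_pwd: nil, owner_pwd: nil)
  args = ["pdftohtml", "-stdout"]
  args << format if format # e.g. "-xml"; nil means HTML output
  if user_pwd
    args.push("-upw", user_pwd)
  elsif owner_pwd
    args.push("-opw", owner_pwd)
  end
  args << path
end

# Return the output unchanged, or raise with the first line that mentions
# "Error:" (RuntimeError stands in for PDFToHTMLRError here).
def check_pdftohtml_output(output)
  error_line = output.lines.find { |line| line.include?("Error:") }
  raise RuntimeError, error_line.chomp if error_line
  output
end

# Run pdftohtml without a shell; capture2e merges stderr into stdout,
# like the `2>&1` redirection in #convert.
def run_pdftohtml(args)
  output, _status = Open3.capture2e(*args)
  check_pdftohtml_output(output)
end

# Shellwords.join is used here only to display a safely quoted command line.
args = pdftohtml_args("my report.pdf", format: "-xml", user_pwd: "pa$$ word")
puts Shellwords.join(args)
# prints: pdftohtml -stdout -xml -upw pa\$\$\ word my\ report.pdf
```

One deliberate difference from #convert: check_pdftohtml_output reports the first line that contains "Error:" rather than the first line of output, so the message is the error itself even when other output precedes it.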

#convert_to_document ⇒ `Object`

Convert the PDF document to HTML. Returns a Nokogiri::HTML::Document.

# File 'lib/pdftohtmlr.rb', line 65

def convert_to_document() 
  Nokogiri::HTML.parse(convert())
end

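#convert_to_xml ⇒ `Object`

Convert the PDF document to pdftohtml's XML format (by passing -xml) and return the output as a String. This sets format to "-xml" on the instance, so later calls to #convert also return XML.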

# File 'lib/pdftohtmlr.rb', line 69

def convert_to_xml()
  @format = "-xml"
  convert()
end
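#convert_to_xml returns pdftohtml's XML as a plain String, and #convert_to_xml_document parses it with Nokogiri. Where a Nokogiri dependency is not wanted, the string can also be read with REXML, which ships with Ruby. The sketch below groups text runs by page. The sample document is illustrative and abridged: it assumes the usual shape of pdftohtml -xml output (a <pdf2xml> root, <page> elements, and <text> runs that may contain inline <b> or <i> markup), and the helpers page_texts and element_text are hypothetical.

```ruby
require "rexml/document"

# Illustrative, abridged sample of the assumed shape of `pdftohtml -xml` output.
SAMPLE_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1188" width="918">
  <text top="100" left="72" width="200" height="17" font="0">Hello</text>
  <text top="130" left="72" width="180" height="17" font="0"><b>World</b></text>
  </page>
  </pdf2xml>
XML

# Concatenate all text inside an element, descending into inline markup
# such as <b> or <i>.
def element_text(element)
  element.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then element_text(child)
    else ""
    end
  end.join
end

# Return one array of text runs per <page>, in document order.
def page_texts(xml)
  doc = REXML::Document.new(xml)
  pages = doc.root.children.grep(REXML::Element).select { |e| e.name == "page" }
  pages.map do |page|
    runs = page.children.grep(REXML::Element).select { |e| e.name == "text" }
    runs.map { |run| element_text(run) }
  end
end

p page_texts(SAMPLE_XML) # prints [["Hello", "World"]]
```

In practice the input would be the String returned by #convert_to_xml rather than SAMPLE_XML.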

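#convert_to_xml_document ⇒ `Object`

Convert the PDF document to XML and return it as a Nokogiri::XML::Document. Like #convert_to_xml, this sets format to "-xml" on the instance.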

# File 'lib/pdftohtmlr.rb', line 74

def convert_to_xml_document()
  @format = "-xml"
  Nokogiri::XML.parse(convert())
end
