pdftohtmlr

A Ruby wrapper around the command-line tool pdftohtml, which converts PDF to HTML, go figure.

This gem was inspired by the MiniMagick gem, which does the same thing for ImageMagick (thanks, Corey).

Requirements

Just pdftohtml and Ruby (1.8.6+ as far as I know).

On Mac:

brew install pdftohtml

On Ubuntu: pdftohtml ships in the ‘poppler-utils’ package, which should already be installed by default. If it is not, run: sudo apt-get install poppler-utils
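Before installing the gem, it can help to confirm that pdftohtml is actually on your PATH. Below is a minimal sketch in plain Ruby. It assumes a Unix-like system (Windows extensions such as .exe are not handled), and find_executable_on_path is a hypothetical helper, not part of pdftohtmlr:

```ruby
# Hypothetical helper (not part of pdftohtmlr): returns the full path of the
# first executable file named +cmd+ in the directories of +search_path+,
# or nil if none is found. Assumes a Unix-like system (no .exe handling).
def find_executable_on_path(cmd, search_path = ENV.fetch('PATH', ''))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if (path = find_executable_on_path('pdftohtml'))
  puts "pdftohtml found at #{path}"
else
  warn 'pdftohtml not found on PATH; see the install notes above'
end
```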

Install

http://gemcutter.org/gems/pdftohtmlr

gem install pdftohtmlr

Using

gist examples

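require 'pdftohtmlr'
require 'nokogiri'
include PDFToHTMLR

# Replace with the path to your source PDF.
file = PdfFilePath.new('path/to/source.pdf')
string = file.convert              # HTML output as a String
doc = file.convert_to_document     # the same HTML, parsed by Nokogiri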

See the included test cases for more usage examples, including password-protected PDFs and fetching PDFs by URL.
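Under the hood, a wrapper like this has to hand the PDF path, and any passwords, to pdftohtml. The sketch below shows one way to do that from Ruby; it is not the gem's own code. pdftohtml_argv and pdftohtml_convert are hypothetical names, the sketch assumes Ruby 2.0 or later for keyword arguments, and it relies on pdftohtml's -stdout, -noframes, -upw (user password) and -opw (owner password) options:

```ruby
require 'open3'

# Hypothetical helper (not part of pdftohtmlr): builds the argument vector
# for converting +pdf_path+ to HTML written to standard output.
def pdftohtml_argv(pdf_path, user_pwd: nil, owner_pwd: nil)
  argv = ['pdftohtml', '-stdout', '-noframes'] # HTML to stdout, no frameset
  argv += ['-upw', user_pwd] if user_pwd       # user password for encrypted PDFs
  argv += ['-opw', owner_pwd] if owner_pwd     # owner password for encrypted PDFs
  argv << pdf_path                             # returns the completed array
end

# Hypothetical helper: runs pdftohtml directly (no shell) and returns its
# stdout. Raises if pdftohtml exits with an error; if pdftohtml is not
# installed at all, Open3.capture3 raises Errno::ENOENT.
def pdftohtml_convert(pdf_path, **opts)
  stdout, stderr, status = Open3.capture3(*pdftohtml_argv(pdf_path, **opts))
  raise "pdftohtml failed: #{stderr.strip}" unless status.success?
  stdout
end
```

Passing the command as an argument array lets Open3 run pdftohtml without a shell, so file names or passwords containing spaces or quotes need no escaping. For example, html = pdftohtml_convert('report.pdf', user_pwd: 'secret') would return the HTML for a hypothetical report.pdf as a String.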

License

MIT (see the included MIT-LICENSE file).