EPUB Parser

The EPUB Parser gem parses EPUB 3 books loosely.

Installation

gem install epub-parser

Usage

As command-line tools

epubinfo

The epubinfo tool extracts and shows the metadata of a specified EPUB book.

See Epubinfo.

epub-open

The epub-open tool provides an interactive shell (IRB) that helps you explore an EPUB book.

See EpubOpen.

As a library

First, call EPUB::Parser.parse:

require 'epub/parser'

book = EPUB::Parser.parse('/path/to/book.epub')

The book object can yield pages in spine order (the spine defines the reading order chosen by the author):

book.each_page_on_spine do |page|
  # do something...
end
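If you need the pages as an array, for example to count them, a small helper can collect what each_page_on_spine yields. This is a minimal sketch: pages_in_spine_order is a hypothetical helper name, not part of EPUB Parser, and it assumes only that each_page_on_spine is called with a block as shown above.

```ruby
# Hypothetical helper (not part of EPUB Parser): collects the pages
# yielded by book.each_page_on_spine into an array, in spine order.
def pages_in_spine_order(book)
  pages = []
  book.each_page_on_spine { |page| pages << page }
  pages
end

# Usage with a parsed book (assumes the epub-parser gem is installed):
#   require 'epub/parser'
#   book  = EPUB::Parser.parse('/path/to/book.epub')
#   pages = pages_in_spine_order(book)
#   puts pages.length
```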

The page yielded above is an EPUB::Publication::Package::Manifest::Item object. Call #href to see where the page file is located inside the archive:

book.each_page_on_spine do |page|
  file = page.href # => path/to/page/in/zip/archive
  html = Zip::Archive.open('/path/to/book.epub') {|zip|
    zip.fopen(file.to_s) {|entry| entry.read} # entry avoids shadowing the outer file variable
  }
end

Item also provides #read as syntactic sugar for the above:

html = page.read
doc = Nokogiri.HTML(html)
# do something with Nokogiri as always
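Combining #href and #read, you could build a lookup table from each page's location to its content. The following is only a sketch: page_contents is a hypothetical helper, not part of EPUB Parser, and it uses nothing beyond each_page_on_spine, #href, and #read as described above.

```ruby
# Hypothetical helper (not part of EPUB Parser): maps each page's href,
# converted to a string, to the content returned by Item#read.
# Ruby hashes keep insertion order, so the keys follow spine order.
def page_contents(book)
  contents = {}
  book.each_page_on_spine do |page|
    contents[page.href.to_s] = page.read
  end
  contents
end

# Usage with a parsed book (assumes the epub-parser gem is installed):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('/path/to/book.epub')
#   page_contents(book).each do |href, html|
#     puts "#{href}: #{html.bytesize} bytes"
#   end
```

Note that this keeps every page's content in memory at once, so for large books it may be better to process each page inside the each_page_on_spine block.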

For the other utilities that Item provides, see the Item page.

Incidentally, although the book above is an EPUB::Book object, all of its features are provided by the EPUB::Book::Features module. Your own class, such as YourBook below, can therefore include EPUB::Book::Features:

require 'epub'

class YourBook < ActiveRecord::Base
  include EPUB::Book::Features
end

book = EPUB::Parser.parse(
  'uploaded-book.epub',
  :class => YourBook # *************** pass YourBook class
)
book.instance_of? YourBook # => true
book.required = 'value for required field'
book.save!
book.each_page_on_spine do |epage|
  page = YourBookPage.create(
    :some_attr    => 'some attr',
    :content      => epage.read,
    :another_attr => 'another attr'
  )
  book.pages << page
end
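The reason this works is Ruby's ordinary mixin mechanism: methods defined in a module become available on any class that includes it. The sketch below illustrates the pattern with stand-ins only. SketchFeatures, PlainBook, and spine_pages are hypothetical names and do not reflect how EPUB::Book::Features is actually implemented.

```ruby
# Stand-in for a features module; NOT the real EPUB::Book::Features.
module SketchFeatures
  # Relies on the including class to provide #spine_pages
  # (a hypothetical accessor used only for this illustration).
  def each_page_on_spine
    spine_pages.each { |page| yield page }
  end
end

# Any class can gain the behavior by including the module,
# whether it is a plain Ruby class or an ActiveRecord model.
class PlainBook
  include SketchFeatures

  attr_reader :spine_pages

  def initialize(spine_pages)
    @spine_pages = spine_pages
  end
end

sketch_book = PlainBook.new(%w[cover.xhtml chapter1.xhtml])
sketch_book.each_page_on_spine { |page| puts page }
```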

You can also find a YourBook object first and pass it to the parser:

book = YourBook.find params[:id]
ret = EPUB::Parser.parse(
  'uploaded-book.epub',
  :book => book # ******************* pass your book instance
) # => book
ret == book # => true; I feel this API is not good... Suggestions are welcome!
# do something with your book

Switching ZIP library

EPUB Parser uses Archive::Zip, a pure Ruby ZIP library, by default. If you have already installed the Zip/Ruby gem with RubyGems or Bundler, you can use Zip/Ruby, a Ruby binding for libzip, instead.

Globally:

EPUB::OCF::PhysicalContainer.adapter = :Zipruby
book = EPUB::Parser.parse("path/to/book.epub")

For each EPUB book:

book = EPUB::Parser.parse("path/to/book.epub", container_adapter: :Zipruby)
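If your code should run whether or not Zip/Ruby is installed, one option is to choose the per-book option at runtime. This is a minimal sketch: zipruby_available? and parse_options are hypothetical helpers, and the sketch assumes the Zip/Ruby gem is loaded with require 'zipruby'. When the gem cannot be loaded, no adapter option is passed, so the default adapter applies.

```ruby
# Hypothetical helper: true if the Zip/Ruby gem can be loaded.
# Assumes the gem's require name is 'zipruby'.
def zipruby_available?
  require 'zipruby'
  true
rescue LoadError
  false
end

# Hypothetical helper: options for EPUB::Parser.parse.
# Returns an empty hash when Zip/Ruby is missing, which leaves
# EPUB Parser on its default adapter.
def parse_options
  zipruby_available? ? { container_adapter: :Zipruby } : {}
end

# Usage (assumes the epub-parser gem is installed):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('path/to/book.epub', **parse_options)
```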

Documentation

More documentation is available in:

If you installed EPUB Parser via the gem command, you can also generate the documentation on your own (the rubygems-yardoc gem is needed):

$ gem install epub-parser
$ gem yardoc epub-parser
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc

At the end, the command shows the path to the generated documentation (/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc in this example).

Alternatively, you can generate the documentation from the source repository with the Rake task:

$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented

Then documentation will be available in doc directory.

Requirements

  • Ruby 2.2.0 or later
  • patch command to install Nokogiri
  • C compiler to compile Zip/Ruby and Nokogiri

History

See CHANGELOG.

Note

This library is still a work in progress. Only a few features are implemented, and the APIs may change in the future.

Currently implemented:

License

This library is distributed under the terms of the MIT License. See the MIT-LICENSE file for more information.