EPUB Parser



gem install epub-parser


As a library

require 'epub/parser'

book = EPUB::Parser.parse('book.epub')
book.metadata.titles # => Array of EPUB::Publication::Package::Metadata::Title. Main title, subtitle, etc...
book.metadata.title # => Title string including all titles
book.metadata.creators # => Creators (authors)
book.each_page_on_spine do |page|
  page.media_type # => "application/xhtml+xml"
  page.entry_name # => "OPS/nav.xhtml" entry name in the EPUB package (zip archive)
  page.read # => raw content document
  page.content_document.nokogiri # => Nokogiri::XML::Document. Same as Nokogiri.XML(page.read)
  # do something more
  #    :
end

See the documentation's Home page or the API Documentation for more info.
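As a small illustration of iterating over the spine, here is a sketch of a helper that tallies pages by media type. `media_type_counts`, `SpinePage` and `SpineBook` are hypothetical names introduced for this example and are not part of EPUB Parser. The stand-in classes exist only so the snippet runs without an EPUB file; the helper relies on nothing but `each_page_on_spine` and `media_type`.

```ruby
# Hypothetical helper (not part of epub-parser): tallies spine pages by media type.
# It only calls each_page_on_spine and media_type on the objects it is given.
def media_type_counts(book)
  counts = Hash.new(0)
  book.each_page_on_spine { |page| counts[page.media_type] += 1 }
  counts
end

# Stand-ins so the sketch runs without an EPUB file (assumptions, not library classes).
SpinePage = Struct.new(:entry_name, :media_type)

class SpineBook
  def initialize(pages)
    @pages = pages
  end

  # Yields each page in order, mimicking the iteration style of a parsed book.
  def each_page_on_spine(&block)
    @pages.each(&block)
  end
end

demo_book = SpineBook.new([
  SpinePage.new("OPS/nav.xhtml", "application/xhtml+xml"),
  SpinePage.new("OPS/chapter1.xhtml", "application/xhtml+xml"),
  SpinePage.new("OPS/cover.svg", "image/svg+xml")
])

p media_type_counts(demo_book)
```

With the gem installed, the same helper should accept the object returned by `EPUB::Parser.parse('book.epub')` in place of `demo_book`.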

epubinfo command-line tool

The epubinfo tool extracts and shows the metadata of the specified EPUB book.

$ epubinfo ~/Documents/Books/build_awesome_command_line_applications_in_ruby.epub
Title:              Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
Identifiers:        978-1-934356-91-3
Titles:             Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
Languages:          en
Creators:           David Bryant Copeland
Publishers:         The Pragmatic Bookshelf, LLC (338304)
Rights:             Copyright © 2012 Pragmatic Programmers, LLC
Subjects:           Pragmatic Bookshelf
Unique identifier:  978-1-934356-91-3
Epub version:       2.0

See Epubinfo for more info.
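If you want a similar aligned report from your own script, the layout above can be approximated by padding each label to a fixed width. This is only a sketch: `format_metadata` and `LABEL_WIDTH` are hypothetical names, the width of 20 is read off the sample output, and none of it is taken from epubinfo's implementation.

```ruby
# Hypothetical formatter (not epubinfo's implementation): prints "Label:   value" rows
# with the value column starting at a fixed offset.
LABEL_WIDTH = 20 # assumption: matches the column where values start in the sample output

def format_metadata(fields)
  fields.map do |label, values|
    # Array() lets a field hold either a single string or a list of strings.
    "#{label}:".ljust(LABEL_WIDTH) + Array(values).join(", ")
  end.join("\n")
end

report = format_metadata({
  "Title" => "Build Awesome Command-Line Applications in Ruby",
  "Languages" => ["en"],
  "Creators" => ["David Bryant Copeland"],
  "Epub version" => "2.0"
})

puts report
```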

epub-open command-line tool

The epub-open tool provides an interactive shell (IRB) that helps you explore an EPUB book.

epub-open path/to/book.epub

IRB starts with self set to the EPUB book, so you can call the book's methods directly.

title
=> "Title of the book"
metadata.creators
=> [Author 1, Author2, ...]
resources.first.properties
=> #<Set: {"nav"}> # This tells you that the first resource of this book is the nav document
nav = resources.first
=> ...
nav.href
=> #<Addressable::URI:0x15ce350 URI:nav.xhtml>
nav.media_type
=> "application/xhtml+xml"
puts nav.read
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
=> nil
exit # Enter "exit" to end the session

See EpubOpen for more info.
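To see what "self is the book" means in practice, the sketch below uses plain Ruby `instance_eval` on a stand-in object. `ShellBook` is a hypothetical Struct created for this example, and this is not a description of how epub-open itself is implemented. It only shows why bare method names such as `title` or `resources` resolve against the book inside such a session.

```ruby
# Stand-in for a parsed book (assumption for illustration; not an EPUB Parser class).
ShellBook = Struct.new(:title, :resources)

shell_book = ShellBook.new("Title of the book", ["nav.xhtml", "chapter1.xhtml"])

# Inside instance_eval, self is shell_book, so bare names are sent to the book.
first_resource = shell_book.instance_eval { resources.first }
shouted_title  = shell_book.instance_eval { title.upcase }
same_object    = shell_book.instance_eval { self }.equal?(shell_book)

puts first_resource
puts shouted_title
puts same_object
```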


Documentation is available on the homepage.

If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the rubygems-yardoc gem is required):

$ gem install epub-parser
$ gem yardoc epub-parser
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc

The last line of the output shows the path to the generated documentation (/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc in this example).

Alternatively, you can generate the documentation with YARD from a clone of the repository:

$ git clone https://github.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented

The documentation is then available in the doc directory.


Requirements

  • Ruby 2.0.0 or later
  • patch command to install Nokogiri
  • C compiler to compile Zip/Ruby and Nokogiri

Similar efforts

  • gepub - a generic EPUB library for Ruby
  • epubinfo - Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.
  • ReVIEW - ReVIEW is an easy-to-use digital publishing system for books and ebooks.
  • epzip - epzip is EPUB packing tool. It's just only doing 'zip.' :)
  • eeepub - EeePub is a Ruby ePub generator
  • epub-maker - This library supports making and editing EPUB books based on this EPUB Parser library

If you know of other gems, please let me know or send a pull request.



  • Change the name of physical container adapter for file system: :File -> :UnpackedDirectory


  • [BUGFIX] Item#entry_name returns normalized IRI


  • Remove deprecated EPUB::Constants::MediaType::UnsupportedError. Use UnsupportedMediaType instead.
  • Make it possible to use the archive-zip gem to extract contents from an EPUB package
  • Add a warning about the default physical container adapter change
  • Make it possible to extract contents from the web via EPUB::OCF::PhysicalContainer::UnpackedURI. See ExtractContentsFromWeb for details.


  • Make it possible to parse a file system directory as an EPUB file. See UnpackedArchive for details.


  • Introduce Nokogumbo for XHTML Content Documents
  • Drop support for Ruby 1.9
  • Remove the EPUB.included method. Including the EPUB module no longer adds any EPUB features. Include EPUB::Book::Features instead.
  • Add EPUB::Searcher::XHTML::Seamless and make it default searcher
  • Add EPUB::Publication::Package::Manifest#each_nav

See CHANGELOG for older changes and details.


  • EPUB 3.0.1
  • Multiple rootfiles
  • Help features for epub-open tool
  • Vocabulary Association Mechanisms
  • Implementing navigation document and so on
  • Media Overlays
  • Content Document
  • Digital Signature
  • Using SAX on parsing
  • Abstraction of the XML parser (making it possible to use REXML, Ruby's bundled XML library)
  • Handle encodings other than UTF-8


  • Simple inspect for epub-open tool
  • Use a zip library instead of the unzip command, which has a security issue
  • Modify methods around fallback to see the bindings element in the package
  • Content Document (only for Navigation Documents)
  • Fixed Layout
  • Vocabulary Association Mechanisms (only for itemref)
  • Archive library abstraction
  • Extracting and organizing common behavior from some classes to modules


This library is distributed under the terms of the MIT License. See the MIT-LICENSE file for more info.