Unpacked Archive
From version 0.2.0, EPUB Parser can parse EPUB books from unpacked archive, or file system directory.
Let's parse pretty comic Page Blanche:
% tree page-blanche
page-blanche
├── EPUB
│ ├── Content
│ │ ├── PageBlanche_Page_000.xhtml
│ │ ├── PageBlanche_Page_001.xhtml
│ │ ├── PageBlanche_Page_002.xhtml
│ │ ├── PageBlanche_Page_003.xhtml
│ │ ├── PageBlanche_Page_004.xhtml
│ │ ├── PageBlanche_Page_005.xhtml
│ │ ├── PageBlanche_Page_006.xhtml
│ │ ├── PageBlanche_Page_007.xhtml
│ │ ├── PageBlanche_Page_008.xhtml
│ │ └── cover.xhtml
│ ├── Image
│ │ ├── PageBlanche_Page_001.jpg
│ │ ├── PageBlanche_Page_002.jpg
│ │ ├── PageBlanche_Page_003.jpg
│ │ ├── PageBlanche_Page_004.jpg
│ │ ├── PageBlanche_Page_005.jpg
│ │ ├── PageBlanche_Page_006.jpg
│ │ ├── PageBlanche_Page_007.jpg
│ │ ├── PageBlanche_Page_008.jpg
│ │ └── cover.jpg
│ ├── Navigation
│ │ ├── nav.xhtml
│ │ └── toc.ncx
│ ├── Style
│ │ └── style.css
│ └── package.opf
├── META-INF
│ └── container.xml
└── mimetype
To load EPUB books from directory, you need specify file adapter via EPUB::OCF::PhysicalContainer at first:
require 'epub/parser'
EPUB::OCF::PhysicalContainer.adapter = :UnpackedDirectory
And then, directory path as EPUB path:
epub = EPUB::Parser.parse('./page-blanche')
Now you can handle the EPUB book as always.
epub.title # => "Page Blache"
epub.each_page_on_spine.to_a.length # => 10
puts epub.nav.content_document.contents.map {|content| "#{File.basename(content.href.to_s)} ... #{content.text}"}
# PageBlanche_Page_002.xhtml ... Dédicace
# PageBlanche_Page_005.xhtml ... Commencer la lecture
# => nil
If set EPUB::OCF::PhysicalContainer.adapter, it is used every time EPUB Parser parses books even when it's packaged EPUB file. Instead of setting adapter globally, you can also specify adapter for parsing individually by passing keyword argument container_adapter
to .parse
method:
# From packaged file
File.ftype './page-blanche.epub' # => "file"
archived_book = EPUB::Parser.parse('./page-blanche.epub') # => EPUB::Book
# From directory
File.ftype './page-blanche' # => "directory"
unpacked_book = EPUB::Parser.parse('./page-blanche', container_adapter: :UnpackedDirectory) # => EPUB::Book
Command-line tools
Command-line tools epubinfo
and epub-open
may also handle with directory as EPUB books.
Executing epubinfo
:
$ epubinfo page-blanche
Title: Page Blanche
Identifiers: code.google.com.epub-samples.page-blanche
Titles: Page Blanche
Languages: fr
Contributors: Vincent Gros
Coverages:
Creators: Boulet, Bagieu Pénélope
Dates: 2012-01-18
Descriptions:
Formats:
Publishers: éditions Delcourt
Relations:
Rights: This work is shared with the public using the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
Sources:
Subjects:
Types:
Unique identifier: code.google.com.epub-samples.page-blanche
Epub version: 3.0
Executing epub-open
:
$ epub-open page-blanche
Enter "exit" to exit IRB
irb: warn: can't alias bindings from irb_workspaces.
irb(main):001:0> title
=> "Page Blanche"
irb(main):002:0> exit
Note
Actually loading EPUB books from unpacked directory is not recommended. The reason why is it's too complex to handle with files properly because of character encoding of file names such as Unicode normalization matters like UTF-8 NFD, NFC, NFKD, NFKC and OS X-specific custom NFD, IRI normalization like percent-encoding, case sensitivity or so on. And, you know, this is not standardized way to load EPUB books. So, at least in the near future, there's not plan to support various environment.
Of course, always pathces are welcome.