Aggregate Contents From the Web
From version 0.2.1, EPUB Parser can parse unpacked (unzipped) EPUB books on the web and aggregate their contents.
Let's get the contents of Page Blanche, a pretty comic, from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
We can treat the URI https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/ as the root directory of the book, because the EPUB Open Container Format's container.xml file is available at https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml.
Note: Don't forget the slash at the end of the URI.
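A likely reason the slash matters is ordinary relative-URI resolution (RFC 3986), which Ruby's standard `URI.join` also follows: when the base URI does not end with a slash, its last segment is treated as a file name and replaced. The sketch below shows this for the Page Blanche URI. It assumes EPUB Parser resolves paths such as META-INF/container.xml against the book URI in the same way, and `root_uri_for` is a hypothetical helper, not part of EPUB Parser.

```ruby
require 'uri'

BOOK = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche'

# Without the trailing slash, "page-blanche" is treated as a file name
# and dropped when the relative path is resolved.
without_slash = URI.join(BOOK, 'META-INF/container.xml').to_s

# With the trailing slash, "page-blanche/" is a directory and is kept.
with_slash = URI.join(BOOK + '/', 'META-INF/container.xml').to_s

# Hypothetical helper (not EPUB Parser API): append the slash if it is missing.
def root_uri_for(uri)
  uri.end_with?('/') ? uri : "#{uri}/"
end

puts without_slash # .../30/META-INF/container.xml (wrong directory)
puts with_slash    # .../30/page-blanche/META-INF/container.xml
```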
EPUB Parser can treat the URI as an EPUB book path and parse its contents by using EPUB::OCF::PhysicalContainer::UnpackedURI:
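require 'epub/parser'
# Root URI of the unpacked book (note the trailing slash)
uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
# :UnpackedURI makes EPUB Parser read the book's files from the URI instead of from a ZIP archive
epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)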
The trick is to set the container adapter to :UnpackedURI, which makes it possible to parse an EPUB book from the web. Now we can work with the book as usual!
As an example, I will show you a script that downloads all the files of a specified EPUB book to a local directory (its source code is available in the repository as examples/aggregate-contents-from-web.rb).
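This is not the repository's script; it is a minimal sketch of the same idea, showing what such a downloader has to do: write the `mimetype` file, fetch container.xml and the package document (which are not manifest items), then save every manifest item under its path relative to the book root. All method names here are hypothetical. It assumes EPUB Parser exposes `book.package.manifest.items`, `Item#href` and `Item#read`; check those against the EPUB Parser API docs before relying on them. Only the helpers that do not touch the network are exercised offline.

```ruby
require 'fileutils'
require 'open-uri'
require 'pathname'
require 'rexml/document'
require 'tmpdir'
require 'uri'

# Write the OCF "mimetype" file: exactly "application/epub+zip", no newline.
def write_mimetype(dir)
  File.write(File.join(dir, 'mimetype'), 'application/epub+zip')
end

# Save content as dir/entry_name, creating parent directories as needed.
def save_entry(dir, entry_name, content)
  path = File.join(dir, entry_name)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, content)
  path
end

# Read the package document path (e.g. "EPUB/package.opf") from container.xml.
# local-name() avoids having to bind the OCF container namespace.
def rootfile_path_from(container_xml)
  doc = REXML::Document.new(container_xml)
  REXML::XPath.first(doc, "//*[local-name()='rootfile']").attributes['full-path']
end

# Manifest hrefs are relative to the package document; convert one into a
# path relative to the book root. Percent-encoding is not decoded here.
def entry_name_for(opf_path, href)
  Pathname.new(opf_path).dirname.join(href).cleanpath.to_s
end

# Download the unpacked book at root_uri (which must end with "/") into dir
# and return dir. ASSUMPTION: relies on EPUB Parser's book.package.manifest.items,
# Item#href and Item#read; verify them against the EPUB Parser API docs.
def aggregate(root_uri, dir = Dir.mktmpdir('epub-parser'))
  require 'epub/parser' # loaded lazily so the helpers above work without the gem
  epub = EPUB::Parser.parse(root_uri, container_adapter: :UnpackedURI)
  write_mimetype(dir)

  # container.xml and the package document are not manifest items,
  # so fetch them directly.
  container_xml = URI.open(URI.join(root_uri, 'META-INF/container.xml'), &:read)
  save_entry(dir, 'META-INF/container.xml', container_xml)
  opf_path = rootfile_path_from(container_xml)
  save_entry(dir, opf_path, URI.open(URI.join(root_uri, opf_path), &:read))

  epub.package.manifest.items.each do |item|
    entry = entry_name_for(opf_path, item.href.to_s)
    puts "Downloading #{entry} ..."
    save_entry(dir, entry, item.read)
  end
  dir
end
```

Calling `puts aggregate('https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/')` should print one line per manifest item and then the temporary directory. The sketch fetches container.xml twice (once inside EPUB::Parser.parse, once directly) to keep it short.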
Execution:
$ ruby examples/aggregate-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
Started downloading EPUB contents...
from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
to: /tmp/epub-parser20150703-13148-ghdtfq
Making mimetype file...
Downloading META-INF/container.xml ...
Downloading EPUB/package.opf ...
Downloading EPUB/Style/style.css ...
Downloading EPUB/Navigation/nav.xhtml ...
Downloading EPUB/Navigation/toc.ncx ...
Downloading EPUB/Content/cover.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
Downloading EPUB/Image/cover.jpg ...
Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
/tmp/epub-parser20150703-13148-ghdtfq
The last line of the output is the path to the directory the contents were downloaded to. We can repackage it as an EPUB file; the epzip utility makes this easy:
$ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
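If epzip is not at hand, the repackaging step can be approximated with the Info-ZIP `zip` command, as long as the OCF rule is respected that `mimetype` is the first entry and is stored uncompressed. This is a sketch, not what epzip does internally: it assumes `zip` is on the PATH, and `epub_zip_commands` and `repackage` are hypothetical names.

```ruby
# Build the two zip invocations for an unpacked book whose top-level
# entries are given (e.g. ["mimetype", "META-INF", "EPUB"]).
# 1. mimetype first, stored (-0) without extra file attributes (-X).
# 2. everything else, recursively (-r), compressed (-9), no extra attributes.
def epub_zip_commands(out, entries)
  rest = entries.reject { |name| name == 'mimetype' }.sort
  [
    ['zip', '-X0', out, 'mimetype'],
    ['zip', '-rX9', out, *rest]
  ]
end

# Run the commands from inside dir so entry names are relative to the book root.
def repackage(dir, out)
  out = File.expand_path(out)
  File.delete(out) if File.exist?(out) # start fresh so mimetype stays the first entry
  epub_zip_commands(out, Dir.children(dir)).each do |argv|
    system(*argv, chdir: dir, exception: true)
  end
  out
end
```

For example, `repackage('/tmp/epub-parser20150703-13148-ghdtfq', 'page-blanche.epub')` would run roughly the same job as the epzip command above.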
Command-line tools
The command-line tools epubinfo and epub-open can also handle URIs as EPUB books.