Agio

Description

Agio is a library and a tool for converting HTML to Markdown.

About the Name

The name was chosen because agio is “a premium on money in exchange”, sort of the opposite of a markdown. It comes from the Italian aggio (premium), not from the Italian agio (ease), although the hope is that there is an ease in use of this library.

Why Agio?

  1. Agio is well tested.

  2. Agio is pure Ruby.

  3. Agio is MIT licensed.

Agio is Well Tested

With this release, there are 274 unit tests (RSpec examples) covering almost everything. The only things not covered are Agio itself (which mostly performs coordination duties) and Agio::Bourse (the parsed HTML-to-Markdown translator). These tests cover 85%–95% of the code known to be tested, and the modules currently missing tests will have them completed prior to the release of Agio 1.0.

In addition to the unit tests, more than a hundred manual tests have been run to verify the quality of output for basic HTML. These manual tests have taken one of two forms:

  1. Markdown input converted to HTML with rdiscount, and then converted back to Markdown with Agio. In all cases, the resulting Markdown is either identical to the original or the differences can be attributed to style (how Agio writes emphasized text versus the hand-written original, or how Agio represents links by default). This tests Agio’s round-trip capability.

  2. HTML input converted to Markdown with either Pandoc or html2text.py, converted back to HTML with rdiscount, and then converted once again to Markdown with Agio. This tests Agio’s ability to produce output that is syntactically similar to that of better-known and presumably better-tested tools.
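The first kind of manual check above can be partly automated. What follows is a minimal sketch, not the procedure actually used for Agio: normalise and round_trip_stable? are hypothetical helper names, and the two converters are passed in as callables so that no particular Agio API is assumed. It only detects output that is identical after whitespace normalisation; stylistic differences of the kind described above still need a human reader.

```ruby
# Strip trailing whitespace on every line and surrounding blank lines, so
# that purely cosmetic whitespace differences do not count as a mismatch.
def normalise(markdown)
  markdown.lines.map(&:rstrip).join("\n").strip
end

# Returns true when Markdown -> HTML -> Markdown yields the original text.
# to_html and to_markdown are any callables that take and return a String
# (hypothetical parameters; nothing here is part of Agio itself).
def round_trip_stable?(markdown, to_html:, to_markdown:)
  html = to_html.call(markdown)
  normalise(to_markdown.call(html)) == normalise(markdown)
end

# Possible wiring with the real tools, assuming the rdiscount gem and the
# agio executable are both installed:
#
#   require "rdiscount"
#   require "open3"
#   require "tempfile"
#
#   to_html = ->(md) { RDiscount.new(md).to_html }
#   to_markdown = lambda do |html|
#     Tempfile.create(["round-trip", ".html"]) do |file|
#       file.write(html)
#       file.flush
#       Open3.capture2("agio", file.path).first
#     end
#   end
```

Passing the converters in keeps the check independent of how Agio is invoked, so the same helper works whether the conversion goes through the command-line tool or through the library.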

Agio will likely have bugs, especially before version 1.0, and not all features are yet implemented or exposed to the user. Syntactic support is also incomplete, as the goal is to support many of the syntax extensions found in Markdown Extra or other popular modules, such as GitHub-flavoured Markdown.

Where

Synopsis

Install Agio with:

  gem install agio

Run Agio against HTML with:

  agio input.html > output.markdown
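To convert several files in one go, the command above can be wrapped in a short Ruby script. This is a minimal sketch, assuming the agio executable is on the PATH and writes its Markdown to standard output, as the redirect above suggests. markdown_path and convert_file are hypothetical helper names, not part of Agio.

```ruby
require "open3"

# Derive the output filename: replace a trailing .html or .htm with
# .markdown, or append .markdown when there is no such extension.
def markdown_path(html_path)
  html_path.sub(/\.html?\z/i, "") + ".markdown"
end

# Run the agio command-line tool on one file and save what it prints.
# The executable name is a parameter so a different path can be supplied.
def convert_file(html_path, agio: "agio")
  output, status = Open3.capture2(agio, html_path)
  raise "agio failed on #{html_path}" unless status.success?

  File.write(markdown_path(html_path), output)
end

# Example use (assumes a posts/ directory of HTML files):
#
#   Dir.glob("posts/*.html").each { |path| convert_file(path) }
```

Each Markdown file is written next to its HTML source, and the script stops at the first file that agio fails to convert.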

History

Why I Wrote Agio

Agio is the result of some yak-shaving as I was looking to convert my blog content from WordPress to Jekyll. The Jekyll wiki points to Thomas Frössman’s Exitwp Python script as a reliable conversion mechanism, but I found that it couldn’t handle the data in my WordPress export file. So, I ported Exitwp from Python to Ruby as Poole.

Like Exitwp, Poole depends on Pandoc. While Pandoc is an amazing tool, it took the better part of 45 minutes to compile the Haskell Platform with Homebrew and Pandoc with Cabal. Looking around the Ruby community, I couldn’t find a single Ruby-based HTML-to-Markdown converter that I trusted to get everything right and that was also licensed to my liking (I prefer the MIT license). While Kramdown is impressive, it’s GPL-licensed. I didn’t want Poole (which is MIT-licensed) to depend on anything that provided any less freedom for any purpose.

Armed with this plan, I started the process of deeply understanding how Aaron Swartz’s html2text.py script works. This included an early version of Agio that was a more-or-less straightforward port, but it produced worse output because of differences between Python’s textwrap module and Ruby’s Text::Format that could not be cleanly resolved by tweaking Aaron’s algorithm.

:include: License.rdoc