ReadabilityJS for Ruby
Clean up web pages and extract the main content, powered by Mozilla Readability.
This is a Ruby wrapper gem for Readability. It runs Readability in a Node.js process via nodo.
Installation
Prerequisites
Node.js >= 22.x must be installed and available on the command line (in PATH).
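To verify the prerequisite from Ruby before loading the gem, something like the following sketch could be used. The helper names (`node_major_version`, `node_version_supported?`, `installed_node_version`) are hypothetical and not part of readability_js; the sketch only assumes that `node --version` prints a string such as `v22.3.0`.

```ruby
# Hypothetical helpers for checking the Node.js prerequisite.
# They are NOT part of readability_js.
MINIMUM_NODE_MAJOR = 22

# Extracts the major version from a string such as "v22.3.0".
# Returns nil when the string does not look like a Node.js version.
def node_major_version(version_output)
  match = version_output.to_s.strip.match(/\Av?(\d+)\./)
  match && match[1].to_i
end

# Returns true when the given `node --version` output meets the minimum.
def node_version_supported?(version_output, minimum = MINIMUM_NODE_MAJOR)
  major = node_major_version(version_output)
  !major.nil? && major >= minimum
end

# Runs `node --version`; returns nil when node is not found in PATH.
def installed_node_version
  `node --version`.strip
rescue Errno::ENOENT
  nil
end
```

Calling `node_version_supported?(installed_node_version)` then returns `false` both when Node.js is missing and when it is older than 22.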
Gem
Add this line to your application's Gemfile:
gem 'readability_js'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install readability_js
Usage examples
Original parse
This method calls only Mozilla Readability's own parse method.
require 'readability_js'
html = File.read("my_article.html")
result = ReadabilityJs.parse(html)
p result
Extended parse
This method calls the extended parse method, which returns cleaner output and includes a beautified Markdown version of the content.
require 'readability_js'
html = File.read("my_article.html")
result = ReadabilityJs.parse_extended(html)
p result
Query parameters
You can pass all parameters supported by Readability; check out the RubyDoc for more details.
Here is an example with all parameters. The camelCase parameter names are written in snake_case in Ruby:
require 'readability_js'
data = ReadabilityJs.parse(
# TODO: add parameters here
)
# => Hash
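The naming rule can be illustrated with a minimal sketch that maps snake_case option names back to the camelCase names Readability uses. `camelize_key` and `camelize_options` are hypothetical helpers written for illustration, not the gem's implementation, and the option values are made up. How readability_js accepts options (keyword arguments or a hash) is not shown here; see the RubyDoc for that.

```ruby
# Hypothetical helpers that illustrate the snake_case <-> camelCase naming rule.
# This is NOT the gem's implementation.

# Converts "char_threshold" (or :char_threshold) to "charThreshold".
def camelize_key(key)
  head, *tail = key.to_s.split("_")
  ([head] + tail.map(&:capitalize)).join
end

# Returns a new hash whose keys are camelCase strings.
def camelize_options(options)
  options.each_with_object({}) do |(key, value), converted|
    converted[camelize_key(key)] = value
  end
end

# Option names taken from Mozilla Readability; the values are illustrative.
ruby_options = { char_threshold: 500, keep_classes: true, nb_top_candidates: 5 }
p camelize_options(ruby_options)
```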
Query response
The response is a Hash.
It contains the data returned by Readability, with the keys converted to snake_case.
{
"title" => "Article Title",
"content" => "<div>...</div>",
"text_content" => "Plain text content",
"markdown_content" => "## Markdown content", # only for extended parse
"length" => 1234,
"excerpt" => "This is an excerpt of the article...",
"byline" => "Author Name",
"dir" => "ltr",
"site_name" => "example.com",
"lang" => "en",
"published_time" => "2024-01-01T12:00:00Z",
"image_url" => "https://example.com/image.jpg" # only for extended parse
}
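As a minimal sketch of working with such a response, the following uses a hand-written sample hash shaped like the one above instead of calling the gem. `summarize` and `extended_result?` are hypothetical helpers, and the sample values are illustrative only.

```ruby
# Sample data shaped like the documented response; values are illustrative.
result = {
  "title" => "Article Title",
  "byline" => "Author Name",
  "excerpt" => "This is an excerpt of the article...",
  "length" => 1234,
  "markdown_content" => "## Markdown content"
}

# Hypothetical helper: builds a one-line summary and tolerates missing keys.
def summarize(result)
  title = result.fetch("title", "(untitled)")
  byline = result["byline"]
  header = byline ? "#{title} by #{byline}" : title
  "#{header} (#{result.fetch('length', 0)} characters)"
end

# Hypothetical helper: "markdown_content" is only present for the extended parse.
def extended_result?(result)
  result.key?("markdown_content")
end

puts summarize(result)
puts result["markdown_content"] if extended_result?(result)
```

Using `fetch` with a default keeps the summary usable when a page has no detectable title or length.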
Documentation
Check out the documentation at RubyDoc:
https://www.rubydoc.info/gems/readability_js
As this library is only a wrapper, also check out the original Readability documentation:
https://github.com/mozilla/readability
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/magynhard/ruby-readability_js.
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.