ReadabilityJS for Ruby

License: MIT

Clean up web pages and extract the main content, powered by Mozilla Readability.

This gem is a Ruby wrapper for Readability. It runs Readability in a Node.js process via nodo.

Installation

Prerequisites

Node.js >= 22.x must be installed and available on the command line (in PATH).
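
The prerequisite above can be checked from Ruby before loading the gem. The following is a minimal sketch, not part of the gem: the helper names `installed_node_version` and `node_version_ok?` are hypothetical, and it assumes `node --version` prints a string such as `v22.3.0`.

```ruby
require 'open3'

MINIMUM_NODE_MAJOR = 22 # taken from the prerequisite above

# Returns the output of `node --version` (e.g. "v22.3.0"),
# or nil if node is not available in PATH.
def installed_node_version
  output, status = Open3.capture2e('node', '--version')
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil
end

# True if a version string such as "v22.3.0" meets the minimum major version.
def node_version_ok?(version, minimum_major = MINIMUM_NODE_MAJOR)
  match = version.to_s.match(/\Av?(\d+)\./)
  !match.nil? && match[1].to_i >= minimum_major
end

version = installed_node_version
if node_version_ok?(version)
  puts "Node.js #{version} found"
else
  warn "Node.js >= #{MINIMUM_NODE_MAJOR} is required, found: #{version.inspect}"
end
```

Keeping the version comparison separate from the process call makes it possible to test the check without Node.js installed.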

Gem

Add this line to your application's Gemfile:

gem 'readability_js'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install readability_js

Usage examples

Original parse

This method calls only the parse method of Mozilla Readability.

    require 'readability_js'
    html = File.read("my_article.html")
    result = ReadabilityJs.parse(html)
    p result

Extended parse

This method calls the extended parse, which returns more thoroughly cleaned output and includes a beautified Markdown version of the content.

    require 'readability_js'
    html = File.read("my_article.html")
    result = ReadabilityJs.parse_extended(html)
    p result
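
As a possible follow-up, the Markdown version could be written to a file. This is a sketch under assumptions: `save_markdown` is a hypothetical helper, not part of the gem, and the sample Hash stands in for the value returned by `ReadabilityJs.parse_extended(html)` so the example runs without the gem or Node.js.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the gem): writes the Markdown version of an
# extended parse result to a file. Returns false when no Markdown is present.
def save_markdown(result, path)
  markdown = result['markdown_content']
  return false if markdown.nil? || markdown.empty?

  File.write(path, markdown)
  true
end

# Stand-in for the Hash returned by ReadabilityJs.parse_extended(html).
result = {
  'title' => 'Article Title',
  'markdown_content' => '## Markdown content'
}

path = File.join(Dir.tmpdir, 'article.md')
puts "Saved to #{path}" if save_markdown(result, path)
```

The guard on `markdown_content` matters because that key is only present for the extended parse.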

Query parameters

You can pass all parameters supported by Readability. Check out the RubyDoc for more details.

Here is an example with all parameters. The camelCase parameters are converted to snake_case in Ruby:

    require 'readability_js'
    data = ReadabilityJs.parse(
      # TODO: add parameters here
    )
    # => Hash
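
The camelCase to snake_case naming convention mentioned above can be illustrated with a small sketch. This is for illustration only: `snake_case` and `snake_case_keys` are hypothetical helpers, and the gem's actual conversion may be implemented differently. The sample keys `textContent` and `siteName` are Readability's own camelCase names.

```ruby
# Illustrative conversion only; the gem's actual implementation may differ.
# Inserts an underscore between a lowercase letter or digit and an
# uppercase letter, then downcases the whole name.
def snake_case(name)
  name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Converts all top-level keys of a Hash to snake_case.
def snake_case_keys(hash)
  hash.transform_keys { |key| snake_case(key) }
end

readability_style = {
  'textContent' => 'Plain text content',
  'siteName' => 'example.com'
}
p snake_case_keys(readability_style)
```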

Query response

The response object is a Hash. It contains the data returned by Readability, with the hash keys converted to snake_case.

    {
      "title" => "Article Title",
      "content" => "<div>...</div>",
      "text_content" => "Plain text content",
      "markdown_content" => "## Markdown content", # only for extended parse
      "length" => 1234,
      "excerpt" => "This is an excerpt of the article...",
      "byline" => "Author Name",
      "dir" => "ltr",
      "site_name" => "example.com",
      "lang" => "en",
      "published_time" => "2024-01-01T12:00:00Z",
      "image_url" => "https://example.com/image.jpg" # only for extended parse
    }
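
Since the response is a plain Hash with string keys, it can be read with ordinary Hash methods. The sketch below assumes the structure shown above. `summarize` is a hypothetical helper, and the sample Hash stands in for a real parse result so the example runs without the gem.

```ruby
# Hypothetical helper (not part of the gem): builds a one-line summary
# from a result Hash, skipping fields that are missing or nil.
def summarize(result)
  parts = [result['title'] || 'Untitled']
  parts << "by #{result['byline']}" if result['byline']
  parts << "(#{result['length']} characters)" if result['length']
  parts.join(' ')
end

# Stand-in for a Hash returned by ReadabilityJs.parse_extended(html).
result = {
  'title' => 'Article Title',
  'byline' => 'Author Name',
  'length' => 1234,
  'image_url' => 'https://example.com/image.jpg'
}

puts summarize(result)

# "image_url" and "markdown_content" exist only for the extended parse,
# so check for them before use.
puts "Lead image: #{result['image_url']}" if result.key?('image_url')
```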

Documentation

Check out the documentation at RubyDoc:
https://www.rubydoc.info/gems/readability_js

As this library is only a wrapper, check out the original Readability documentation:
https://github.com/mozilla/readability

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/magynhard/ruby-readability_js.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.