Earl <img src=“https://secure.travis-ci.org/evendis/earl.png” />

Earl wants to help you scrape all the relevant metadata for your favorite web pages so you can be as cool as Facebook when displaying user-submitted link content. Earl returns details like titles, descriptions, content type, associated feeds, and OEmbed definitions if available.

Earl is based on an original source project called earl by teejayvanslyke (but never released as a gem). The revamp was done by Paul Gallagher, and master source is currently available at github.com/evendis/earl.

The Earl gem is officially named earl. Big thanks go to jeremyruppel who contributed the ownership of the earl gem name. The original earl gem had a somewhat similar purpose - it is now defunct, but still available up to version 0.3.0 via rubgems. Any earl gem with version 1.0.0 or higher is the new gem release (and is in no way backwardly compatible with earlier versions).

The Earl Cookbook

How do instantiate Earl?

Pass any url-like string to Earl:

my_earl_instance = Earl.new('https://github.com/evendis/earl')
#
# or using the []= convenience method:
my_earl_instance = Earl['https://github.com/evendis/earl']

How do I inspect details of the page?

earl = Earl['https://github.com/evendis/earl']
earl.title
=>  "evendis/earl · GitHub"
earl.description
=> "earl - URL metadata API for scraping titles, descriptions, images, and videos from URL's."
earl.image
=> "https://a248.e.akamai.net/assets.github.com/images/modules/header/[email protected]?1340935010"

Earl will get oembed details for a link if they are available.

earl = Earl['http://www.youtube.com/watch?v=g3DCEcSlfhw']
earl.oembed
=> {"provider_url"=>"http://www.youtube.com/", "thumbnail_url"=>"http://i4.ytimg.com/vi/g3DCEcSlfhw/hqdefault.jpg", "title"=>"'Virtuosos of Guitar 2008' festival, Moscow. Marcin Dylla", "html"=>"<iframe width=\"420\" height=\"315\" src=\"http://www.youtube.com/embed/g3DCEcSlfhw?fs=1&feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>", "author_name"=>"guitarmagnet", "height"=>315, "thumbnail_width"=>480, "width"=>420, "version"=>"1.0", "author_url"=>"http://www.youtube.com/user/guitarmagnet", "provider_name"=>"YouTube", "type"=>"video", "thumbnail_height"=>360}
# to get the embed code:
earl.oembed_html
=> "<iframe width=\"420\" height=\"315\" src=\"http://www.youtube.com/embed/g3DCEcSlfhw?fs=1&feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>"

Supported oembed parameters may be provided with to ‘Earl.new` or to the `oembed` call:

earl = Earl.new('http://www.youtube.com/watch?v=g3DCEcSlfhw', {:oembed => {:maxwidth => "200", :maxheight => "320"}})
earl.oembed_html
=> <iframe width="200" height="150" src="http://www.youtube.com/embed/g3DCEcSlfhw?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>
earl = Earl.new('http://www.youtube.com/watch?v=g3DCEcSlfhw')
earl.oembed({:maxwidth => "100", :maxheight => "120"})
earl.oembed_html
=> <iframe width="100" height="75" src="http://www.youtube.com/embed/g3DCEcSlfhw?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

How do I inspect what attributes are available for a page?

To see all of the attributes a URL provides, simply ask:

earl = Earl['https://github.com/evendis/earl']
earl.attributes
=> [:title, :image, :description, :rss_feed, :atom_feed, :content_type, :base_url, :charset, :content_encoding, :headers, :feed]

How can I extend Earl to scrape additional page details?

Need to scrape additional page details currently supported by Earl? Implement your own scraper:

class QotdScraper < Earl::Scraper
  match /^http\:\/\/www\.quotationspage\.com\/qotd\.html$/

  define_attribute :qotd do |doc|
    doc.at('dt.quote a').text
  end
end

The define_attribute method will supply you with a Nokogiri document which you can traverse to your heart’s content. Use the match method to limit the scope of URLs that your scraper will apply to.

Your new attribute is now available for use:

Earl['http://www.quotationspage.com/qotd.html'].qotd
=> "Love is a snowmobile racing across the tundra and then suddenly it flips over, pinning you underneath. At night, the ice weasels come."

How do I install it for normal use?

If using bundler, add gem ‘earl’ your application’s Gemfile and run ‘bundle`.

Or install it from the command-line:

$ gem install earl

How do I install it for gem development?

To work on enhancements of fix bugs in Earl, fork and clone the github repository. If you are using bundler (recommended), run bundle to install development dependencies:

$ gem install bundler
$ bundle

How do I run the tests?

Once development dependencies are installed, all unit tests are run with just:

$ rake
# or..
$ rake spec

Unit tests exclude a set of integration tests that actually hit the network (therefore not ‘nice’ to run all the time, and also subject to failures due to network availability or changes in the services accessed). To run integration tests:

$ rake spec:integration

To run all tests (unit and integration):

$ rake spec:all

How do I automatically run tests when I modify files?

Guard is installed as part of the development dependencies. Start a guard process in a terminal window:

$ bundle exec guard

It will run all the tests to start with by default. Then whenever you change a file, the associated tests will execute in this terminal window.

Contributing to Earl

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet

  • Check out the issue tracker to make sure someone already hasn’t requested it and/or contributed it

  • Fork the project

  • Start a feature/bugfix branch

  • Commit and push until you are happy with your contribution

  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.

See LICENSE for details.