xp

A Ruby gem that adds methods to the String class for intuitive HTML/XML scraping.

Installation

$ gem install xp

Usage

On the command line, xp filters HTML/XML documents read from STDIN:

$ curl -s 'https://news.ycombinator.com' | xp --text '//td[@class="title"]/a'

Or, with a CSS selector instead of XPath:

$ curl -s 'https://news.ycombinator.com' | xp --text 'td.title > a'

To use xp in Ruby scripts, require the gem (require 'xp'). The following one-liner downloads all Dribbble shots shown on the Dribbble home page:

'https://dribbble.com/'.css('.dribbble-link img').xpath('//img/@src').map(&:text).map(&:download)

API

xp adds the following methods to the String class:

| Method | Return type | Remarks |
| --- | --- | --- |
| to_nokogiri | Nokogiri::XML::Document | Converts a URL or a page source to a Nokogiri object |
| css(selector) | String | Filters a URL or HTML string based on the CSS selector |
| xpath(selector) | String | Filters a URL or HTML string based on the XPath selector |
| download(location: 'downloads', name: nil) | String | Downloads the URL in the string (can be customized via the optional parameters) |
| page_source(user_agent_alias: :mac_firefox, user_agent: nil) | String | Gets the page source of a URL (can be customized via the optional parameters) |
| url? | Boolean | Checks whether the current string is a URL |
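
Several of the methods above accept either a URL or a page source, so the string has to be classified before anything is fetched or parsed. The sketch below shows one way such a check could work, using only Ruby's standard library. The name looks_like_url? is hypothetical and is not part of xp; the gem's own url? may use different rules.

```ruby
require 'uri'

# Hypothetical stand-in for a url?-style check; not xp's actual implementation.
# Treats a string as a URL only if it parses as http(s) and has a host.
def looks_like_url?(str)
  uri = URI.parse(str)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  # Markup and selectors contain characters that are invalid in a URI.
  false
end

inputs = ['https://dribbble.com/', '<html><body>hi</body></html>', 'td.title > a']
inputs.each do |input|
  kind = looks_like_url?(input) ? 'URL' : 'markup or other text'
  puts "#{input.inspect} is treated as #{kind}"
end
```

Under this assumption, a URL would be fetched first (as page_source does), while any other string would be parsed directly as HTML/XML.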