Cut

A DSL for Scraping Websites

Installation

Add this line to your application's Gemfile:

gem 'cut'

And then execute:

$ bundle

Or install it yourself as:

$ gem install cut

Usage

For example, to scrape Google search results, define a class that includes Cut:

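class SearchResult
  include Cut

  # {{keywords}} is filled in from the arguments passed to .all / .first
  url "http://google.com/search?q={{keywords}}"

  # each element matching this CSS selector becomes one SearchResult
  selector "li.g"

  # attributes are read from CSS selectors inside each match;
  # operation: post-processes the extracted string
  map :title, String, to: "h3.r"
  map :url,   String, to: "div.s cite", operation: lambda { |str| str.upcase }
end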
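The {{keywords}} placeholder in the url template corresponds to the keywords: argument passed to all and first. Cut's own expansion code is not shown in this README, so the sketch below is a hypothetical helper (expand_url_template, not part of Cut) illustrating one way such templating could work. It assumes values are URL-encoded so that the spaces in "war and peace" do not break the query string.

```ruby
require "uri"

# Hypothetical helper, NOT part of Cut: replaces each {{name}} placeholder in
# a URL template with the URL-encoded value taken from params.
# Raises KeyError if the template names a placeholder that params lacks.
def expand_url_template(template, params)
  template.gsub(/\{\{(\w+)\}\}/) do
    value = params.fetch(Regexp.last_match(1).to_sym)
    URI.encode_www_form_component(value.to_s)
  end
end

puts expand_url_template("http://google.com/search?q={{keywords}}",
                         keywords: "war and peace")
# prints http://google.com/search?q=war+and+peace
```

Using fetch rather than [] makes a misspelled placeholder fail loudly instead of silently producing a URL with an empty query.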

Return all results:

SearchResult.all(keywords: "war and peace")
#=> [
#     #<SearchResult:0x007f94bbfaae90 @title="War and Peace - Wikipedia, the free encyclopedia", @url="HTTPS://EN.WIKIPEDIA.ORG/WIKI/WAR_AND_PEACE">,
#     #<SearchResult:0x007f94beed97c0 @title="War and Peace (Vintage Classics): Leo Tolstoy, Richard Pevear ...", @url="WWW.AMAZON.COM/WAR-PEACE-VINTAGE-CLASSICS.../DP/1400079985">,
#     #<SearchResult:0x007f94be95ee80 @title="War and Peace (1956) - IMDb", @url="WWW.IMDB.COM/TITLE/TT0049934/">,
#     #<SearchResult:0x007f94be9cb198 @title="SparkNotes: War and Peace", @url="WWW.SPARKNOTES.COM/LIT/WARANDPEACE/">,
#     #<SearchResult:0x007f94be9c7ea8 @title="War and Peace by graf Leo Tolstoy - Free Ebook - Project Gutenberg", @url="WWW.GUTENBERG.ORG/EBOOKS/2600">,
#     #<SearchResult:0x007f94bc83f218 @title="War and Peace by Leo Tolstoy - Reviews, Discussion, Bookclubs, Lists", @url="WWW.GOODREADS.COM/BOOK/SHOW/656.WAR_AND_PEACE">,
#     #<SearchResult:0x007f94bba7ee80 @title="War and Peace - The Literature Network", @url="WWW.ONLINE-LITERATURE.COM/TOLSTOY/WAR_AND_PEACE/">,
#     #<SearchResult:0x007f94bba7b820 @title="War and Peace - graf Leo Tolstoy - Google Books", @url="BOOKS.GOOGLE.COM/BOOKS/ABOUT/WAR_AND_PEACE.HTML?ID=2GOK4HJO2VKC">,
#     #<SearchResult:0x007f94bbed4ac0 @title="Images for war and peace", @url="">,
#     #<SearchResult:0x007f94bdda0eb8 @title="War and Peace - Shmoop", @url="WWW.SHMOOP.COM/WAR-AND-PEACE/">,
#     #<SearchResult:0x007f94bdd695d0 @title="War and Peace - Planet PDF", @url="WWW.PLANETPDF.COM/PLANETPDF/PDFS/FREE_EBOOKS/WAR_AND_PEACE_NT.PDF">,
#     #<SearchResult:0x007f94bdde53d8 @title="News for war and peace", @url="">
#   ]

Return only the first result:

SearchResult.first(keywords: "war and peace")
#=> #<SearchResult:0x007f94bdfbeb78 @title="War and Peace - Wikipedia, the free encyclopedia", @url="HTTPS://EN.WIKIPEDIA.ORG/WIKI/WAR_AND_PEACE">
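In the SearchResult class, selector appears to pick the repeating element ("li.g"), and each map reads one string from a nested selector, optionally passing it through its operation: lambda (which is why every URL above is upper-cased). The sketch below is NOT Cut's implementation: it is a hypothetical, stdlib-only MiniCut module (MiniCut, from_rows and mappings are invented names) that skips HTTP fetching and CSS matching. Instead it takes rows that are already Hashes of selector => text, to show how map declarations and their operation: lambdas can be recorded at class level and applied per row. Missing selectors default to an empty string, mirroring the empty @url values in the output above.

```ruby
# Hypothetical sketch, NOT Cut's implementation. HTTP fetching and CSS
# matching are skipped: each row is a Hash mapping a selector string to
# the text it would have matched.
module MiniCut
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Field declarations recorded by map, in declaration order.
    def mappings
      @mappings ||= []
    end

    # Same call shape as the README's map; the type argument is recorded
    # but not enforced in this sketch.
    def map(name, type, to:, operation: nil)
      attr_reader name
      mappings << { name: name, type: type, selector: to, operation: operation }
    end

    # Builds one instance per row, applying each field's operation lambda.
    def from_rows(rows)
      rows.map do |row|
        obj = new
        mappings.each do |m|
          raw = row.fetch(m[:selector], "") # missing selector -> ""
          value = m[:operation] ? m[:operation].call(raw) : raw
          obj.instance_variable_set("@#{m[:name]}", value)
        end
        obj
      end
    end
  end
end

class SearchResult
  include MiniCut

  map :title, String, to: "h3.r"
  map :url,   String, to: "div.s cite", operation: lambda { |str| str.upcase }
end

# Sample, hand-written input row (not real scraped data).
rows = [
  { "h3.r" => "War and Peace (1956) - IMDb", "div.s cite" => "www.imdb.com/title/tt0049934/" }
]
result = SearchResult.from_rows(rows).first
puts result.title # prints War and Peace (1956) - IMDb
puts result.url   # prints WWW.IMDB.COM/TITLE/TT0049934/
```

Keeping the declarations as plain data (an array of hashes) separates what to extract from how it is fetched, so a real scraper could swap in any HTML parser without changing the class definitions.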

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request