Scrape

A really simple web scraper.

site "https://github.com/explore" # The site to scrape. Will be used as the base address.

match /evilmarty/ do |doc| # A regexp/string/proc matched against the current URL.

  doc.search('a[href]') # doc is the Nokogiri document for the current URL's contents.

end

site "http://www.tumblr.com" # Multiple sites can be defined.

queue "http://www.tumblr.com/tagged" # Add the specified URL to the scrape queue.

match "/tagged" do |doc|
  # Do whatever we want with the document.
end
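Putting the pieces above together, a complete Scrapefile might look like the following sketch. Only `site`, `queue`, and `match` come from this README; the CSS selectors and the link-collecting logic inside the blocks are illustrative assumptions about what you might do with the Nokogiri document.

```ruby
# A sketch of a complete Scrapefile, assuming only the DSL calls
# (site, queue, match) documented above plus standard Nokogiri methods
# on the yielded document.

site "https://github.com/explore"

match /evilmarty/ do |doc|
  # Collect the href of every link on the page.
  doc.search('a[href]').map { |a| a['href'] }
end

site "http://www.tumblr.com"
queue "http://www.tumblr.com/tagged"

match "/tagged" do |doc|
  # Print each heading's text. The 'h2' selector is hypothetical and
  # depends on the page's actual markup.
  doc.search('h2').each { |h| puts h.text }
end
```

Run it with `scrape -f` as described below.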

Usage

After creating a Scrapefile, simply run:

scrape -f [FILE]

If no file is specified, a file named Scrapefile in the current directory is used by default.

Installation

Simply install the gem:

gem install scrape

or download the source by cloning the repository:

git clone https://github.com/evilmarty/scrape.git

Contribute

Please fork the repository and open a pull request on GitHub.

If you discover an issue, please lodge it on the GitHub issue tracker.

TODO

  • Fix bugs
  • Depth limiting
  • Better docs