Pantopoda
Pantopoda is a ruby spidering library that was built out of sheer frustration at the lack of good, modern web crawling tools that Python enjoys. Pantopoda uses bloom filters to store the list of visited urls for efficient querying up to hundreds of thousands of urls, and the requests are handled by Typhoeus to allow for multi-threaded crawling.
Pantopoda will crawl every single page it can find on a particular domain.
Installation
Add this line to your application's Gemfile:
gem 'pantopoda'
And then execute:
$ bundle
Or install it yourself as:
$ gem install pantopoda
Usage
TODO: Write usage instructions here
Contributing
- Fork it ( https://github.com/[my-github-username]/pantopoda/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request