SpiderMech
SpiderMech crawls a given domain and reports on the pages linked from the given URLs, along with the assets each of those pages depends on.
Installation
Add this line to your application's Gemfile:
gem 'spidermech'
And then execute:
$ bundle
Or install it yourself as:
$ gem install spidermech
Gem Usage
require 'spidermech'
spider = SpiderMech.new 'http://google.com'
spider.run # returns the sitemap hash
spider.save_json # saves the sitemap hash as google.com.json
Command Line Usage
The gem provides a command line tool. You can invoke it via
bundle exec spidermech http://google.com
It will crawl the site and print the resulting sitemap.
Sample Output
[{:url=>"http://localhost:8321",
  :assets=>
   {:scripts=>
     ["https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js",
      "http://getbootstrap.com/dist/js/bootstrap.min.js"],
    :images=>[],
    :css=>
     ["http://getbootstrap.com/dist/css/bootstrap.min.css",
      "http://getbootstrap.com/examples/starter-template/starter-template.css"]},
  :links=>["/", "/about.html", "/contact.html"]}]
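Each entry in the returned sitemap array is a hash with :url, :assets, and :links keys. As a rough sketch of how you might consume that structure (using the sample data above; the sitemap variable here is just an illustration, not part of the gem's API):

```ruby
# A sitemap in the shape SpiderMech returns, taken from the sample output above.
sitemap = [
  {
    url: "http://localhost:8321",
    assets: {
      scripts: ["https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js",
                "http://getbootstrap.com/dist/js/bootstrap.min.js"],
      images: [],
      css: ["http://getbootstrap.com/dist/css/bootstrap.min.css",
            "http://getbootstrap.com/examples/starter-template/starter-template.css"]
    },
    links: ["/", "/about.html", "/contact.html"]
  }
]

# Flatten every asset category (scripts, images, css) into one list per page.
sitemap.each do |page|
  all_assets = page[:assets].values.flatten
  puts "#{page[:url]} depends on #{all_assets.size} assets:"
  all_assets.each { |asset| puts "  #{asset}" }
end
```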
Contributing
- Fork it ( http://github.com//crawler/fork )
- Create your feature branch ( git checkout -b my-new-feature )
- Commit your changes ( git commit -am 'Add some feature' )
- Push to the branch ( git push origin my-new-feature )
- Create a new Pull Request