R.Daneel
An EventMachine+Ruby library to fetch urls obeying robots.txt rules.
RDaneel is built it on top of @igrigorik’s em-http-request
Features
-
Support following redirects, honoring robots.txt for each host in the redirect chain.
-
Support an external cache to store robots.txt
-
Compatible with all options defined in em-http-request
Install
$ gem install rdaneel
Examples
Following redirects
require 'rdaneel'
EM.run do
r = RDaneel.new("http://bit.ly/cbEnpa")
r.callback{
puts r.http_client.response_header.status
puts r.http_client.response[0,80]
puts r.redirects
puts r.uri
EM.stop
}
r.errback{
puts "should not happen"
EM.stop
}
r.get(:redirects => 3)
end
=> 200
=> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
=> http://bit.ly:80/cbEnpa
=> http://github.com:80/hasmanydevelopers/RDaneel
Denied by robots.txt
require 'rdaneel'
EM.run do
r = RDaneel.new("http://github.com/hasmanydevelopers/RDaneel/tarball/v0.0.0")
r.callback{
puts "should not happen"
EM.stop
}
r.errback{
puts r.error
EM.stop
}
r.get(:redirects => 3)
end
=> robots denied
Why RDaneel?
R Daneel Olivaw is a fictional robot created by Isaac Asimov - en.wikipedia.org/wiki/R._Daneel_Olivaw
Acknowledge
To Ilya Grigorik (@igrigorik) for em-http-request lib and his support and advice.
Note on Patches/Pull Requests
-
Fork the project.
-
Make your feature addition or bug fix.
-
Add tests for it. This is important so I don’t break it in a future version unintentionally.
-
Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
-
Send me a pull request. Bonus points for topic branches.
Copyright
Copyright © 2010 has_many :developers. See LICENSE for details.