Module: Robotstxt
- Defined in: lib/robotstxt.rb, lib/robotstxt/parser.rb
Defined Under Namespace
Classes: Parser
Constant Summary
- NAME = 'Robotstxt'
- GEM = 'robotstxt'
- AUTHORS = ['Simone Rinzivillo <[email protected]>']
- VERSION = '0.5.4'
Class Method Summary
- .allowed?(url, robot_id) ⇒ Boolean
  Check if the URL is allowed to be crawled by the current robot_id.
- .sitemaps(url, robot_id) ⇒ Object
  Analyze the robots.txt file to return an Array containing the list of XML Sitemap URLs.
Class Method Details
.allowed?(url, robot_id) ⇒ Boolean
Check if the URL is allowed to be crawled by the current robot_id. Robotstxt.allowed? returns true if the robots.txt file does not block access to the URL.
Robotstxt.allowed?('http://www.simonerinzivillo.it/', 'rubytest')
# File 'lib/robotstxt.rb', line 35

def self.allowed?(url, robot_id)
  u = URI.parse(url)
  r = Robotstxt::Parser.new(robot_id)
  r.allowed?(url) if r.get(u.scheme + '://' + u.host)
end
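For instance, a crawler might consult allowed? before fetching a page. This is a minimal sketch; the robot name 'mybot' and the page path are illustrative placeholders, not part of the library.

require 'robotstxt'

page  = 'http://www.simonerinzivillo.it/about'  # hypothetical page URL
robot = 'mybot'                                 # hypothetical user-agent token

if Robotstxt.allowed?(page, robot)
  # Safe to fetch the page with the HTTP client of your choice.
  puts "#{page} may be crawled"
else
  puts "#{page} is disallowed by robots.txt"
end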
.sitemaps(url, robot_id) ⇒ Object
Analyze the robots.txt file to return an Array containing the list of XML Sitemap URLs.
Robotstxt.sitemaps('http://www.simonerinzivillo.it/', 'rubytest')
# File 'lib/robotstxt.rb', line 47

def self.sitemaps(url, robot_id)
  u = URI.parse(url)
  r = Robotstxt::Parser.new(robot_id)
  r.sitemaps if r.get(u.scheme + '://' + u.host)
end
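A minimal usage sketch, reusing the 'rubytest' robot name from the example above. As the source shows, the method returns nil rather than an empty Array when the robots.txt file cannot be retrieved, so a nil guard is prudent:

require 'robotstxt'

sitemaps = Robotstxt.sitemaps('http://www.simonerinzivillo.it/', 'rubytest')

# Guard against nil: sitemaps returns nil when the robots.txt request fails.
(sitemaps || []).each do |sitemap_url|
  puts sitemap_url
end

For repeated checks against the same host, Robotstxt::Parser can be instantiated directly (as both class methods do internally), so robots.txt is downloaded only once.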