Class: Wgit::RobotsParser
- Inherits: Object
- Includes: Assertable
- Defined in: lib/wgit/robots_parser.rb
Overview
The RobotsParser class handles the parsing and processing of a web server's robots.txt file.
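A minimal usage sketch (the robots.txt contents and the return values shown in the comments are illustrative, not taken from a real site):

  require 'wgit'

  contents = <<~ROBOTS
    # Allow the public blog but keep crawlers out of account pages.
    User-agent: *
    Allow: /blog
    Disallow: /accounts
    Disallow: /admin
  ROBOTS

  parser = Wgit::RobotsParser.new(contents)
  parser.allow_paths     # => ["/blog"]
  parser.disallow_paths  # => ["/accounts", "/admin"]
  parser.no_index?       # => false, the whole site isn't disallowed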
Constant Summary
- KEY_COMMENT = "#"
  Key representing the start of a comment.
- KEY_SEPARATOR = ":"
  Key value separator used in robots.txt files.
- KEY_USER_AGENT = "User-agent"
  Key representing a user agent.
- KEY_ALLOW = "Allow"
  Key representing an allow URL rule.
- KEY_DISALLOW = "Disallow"
  Key representing a disallow URL rule.
- USER_AGENT_WGIT = :wgit
  Value representing the Wgit user agent.
- USER_AGENT_ANY = :*
  Value representing any user agent, including Wgit.
- PATHS_ALL = %w[/ *].freeze
  Value representing any and all paths.
Constants included from Assertable
Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::MIXED_ENUMERABLE_MSG, Assertable::NON_ENUMERABLE_MSG
Instance Attribute Summary
- #rules ⇒ Object (readonly) (also: #paths)
  Hash containing the user-agent allow/disallow URL rules.
Instance Method Summary
- #allow_paths ⇒ Array<String>
  Returns the allow paths/rules for this parser's robots.txt contents.
- #allow_rules? ⇒ Boolean
  Returns whether or not there are allow rules applying to Wgit.
- #disallow_paths ⇒ Array<String>
  Returns the disallow paths/rules for this parser's robots.txt contents.
- #disallow_rules? ⇒ Boolean
  Returns whether or not there are disallow rules applying to Wgit.
- #initialize(contents) ⇒ RobotsParser (constructor)
  Initializes and returns a Wgit::RobotsParser instance having parsed the robots.txt contents.
- #inspect ⇒ String
  Overrides Object#inspect to shorten the printed output of a Parser.
- #no_index? ⇒ Boolean (also: #banned?)
  Returns whether or not Wgit is banned from indexing this site.
- #rules? ⇒ Boolean
  Returns whether or not there are rules applying to Wgit.
Methods included from Assertable
#assert_arr_types, #assert_common_arr_types, #assert_required_keys, #assert_respond_to, #assert_types
Constructor Details
#initialize(contents) ⇒ RobotsParser
Initializes and returns a Wgit::RobotsParser instance having parsed the robots.txt contents.
  # File 'lib/wgit/robots_parser.rb', line 38

  def initialize(contents)
    @rules = {
      allow_paths: Set.new,
      disallow_paths: Set.new
    }

    assert_respond_to(contents, :to_s)
    parse(contents.to_s)
  end
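In practice the contents usually come from a fetched /robots.txt file; any object responding to #to_s is accepted. A sketch using Ruby's standard library (the URL is a placeholder):

  require 'net/http'
  require 'wgit'

  contents = Net::HTTP.get(URI('https://example.com/robots.txt'))
  parser   = Wgit::RobotsParser.new(contents)
  parser.rules?  # => true or false, depending on what the site serves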
Instance Attribute Details
#rules ⇒ Object (readonly) Also known as: paths
Hash containing the user-agent allow/disallow URL rules. Looks like:
  allow_paths:    ["/"]
  disallow_paths: ["/accounts", ...]
  # File 'lib/wgit/robots_parser.rb', line 31

  def rules
    @rules
  end
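The values are stored internally as Sets (see #initialize), and #paths returns the same Hash via its alias. An illustrative read:

  parser = Wgit::RobotsParser.new("User-agent: *\nAllow: /blog\nDisallow: /accounts")
  parser.rules
  # => { allow_paths: #<Set: {"/blog"}>, disallow_paths: #<Set: {"/accounts"}> }
  parser.paths  # same Hash, #paths is simply an alias of #rules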
Instance Method Details
#allow_paths ⇒ Array<String>
Returns the allow paths/rules for this parser's robots.txt contents.
  # File 'lib/wgit/robots_parser.rb', line 58

  def allow_paths
    @rules[:allow_paths].to_a
  end
#allow_rules? ⇒ Boolean
Returns whether or not there are allow rules applying to Wgit.
  # File 'lib/wgit/robots_parser.rb', line 81

  def allow_rules?
    @rules[:allow_paths].any?
  end
#disallow_paths ⇒ Array<String>
Returns the disallow paths/rules for this parser's robots.txt contents.
  # File 'lib/wgit/robots_parser.rb', line 65

  def disallow_paths
    @rules[:disallow_paths].to_a
  end
#disallow_rules? ⇒ Boolean
Returns whether or not there are disallow rules applying to Wgit.
  # File 'lib/wgit/robots_parser.rb', line 89

  def disallow_rules?
    @rules[:disallow_paths].any?
  end
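A combined sketch of the path getters and rule predicates above, again with illustrative contents and expected results:

  parser = Wgit::RobotsParser.new("User-agent: *\nDisallow: /accounts")
  parser.allow_paths      # => []
  parser.disallow_paths   # => ["/accounts"]
  parser.allow_rules?     # => false
  parser.disallow_rules?  # => true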
#inspect ⇒ String
Overrides Object#inspect to shorten the printed output of a Parser.
  # File 'lib/wgit/robots_parser.rb', line 51

  def inspect
    "#<Wgit::RobotsParser has_rules=#{rules?} no_index=#{no_index?}>"
  end
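For example (the output format follows the method body above):

  parser = Wgit::RobotsParser.new("User-agent: *\nDisallow: /accounts")
  parser.inspect  # => "#<Wgit::RobotsParser has_rules=true no_index=false>"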
#no_index? ⇒ Boolean Also known as: banned?
Returns whether or not Wgit is banned from indexing this site.
  # File 'lib/wgit/robots_parser.rb', line 97

  def no_index?
    @rules[:disallow_paths].any? { |path| PATHS_ALL.include?(path) }
  end
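A site is treated as non-indexable when any disallow rule covers all paths, i.e. "/" or "*" (see PATHS_ALL). A sketch with illustrative contents:

  Wgit::RobotsParser.new("User-agent: *\nDisallow: /").no_index?       # => true
  Wgit::RobotsParser.new("User-agent: *\nDisallow: *").banned?         # => true, #banned? is an alias
  Wgit::RobotsParser.new("User-agent: *\nDisallow: /admin").no_index?  # => false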
#rules? ⇒ Boolean
Returns whether or not there are rules applying to Wgit.
  # File 'lib/wgit/robots_parser.rb', line 73

  def rules?
    allow_rules? || disallow_rules?
  end
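Note that #rules? only reports the presence of some applicable rule; it says nothing about whether the whole site is banned (use #no_index? for that). A sketch:

  parser = Wgit::RobotsParser.new("User-agent: *\nAllow: /blog")
  parser.rules?     # => true
  parser.no_index?  # => false

  Wgit::RobotsParser.new('').rules?  # => false, nothing to parse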