Class: Gort::Parser
- Inherits:
- Object (hierarchy: Object > Gort::Parser)
- Defined in:
- lib/gort/parser.rb
Overview
A robots.txt parser. It implements the parsing logic of RFC 9309, including its errata.
Defined Under Namespace
Classes: BinaryInputError, InvalidEncodingError
Instance Method Summary
- #initialize(input) ⇒ Parser (constructor)
  Returns a new instance of Parser.
- #parse ⇒ Gort::RobotsTxt
  Parses the input and returns the result as a Gort::RobotsTxt.
Constructor Details
#initialize(input) ⇒ Parser
Returns a new instance of Parser.
# File 'lib/gort/parser.rb', line 25
def initialize(input)
  @input = detect_and_fix_encoding(input).then { |string| strip_bom(string) }
end
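The constructor runs the input through two private helpers, `detect_and_fix_encoding` and `strip_bom`, whose bodies are not shown on this page. The following is only a minimal sketch of what such a pipeline might look like, assuming UTF-8 input with an optional leading byte order mark. The names `sketch_fix_encoding` and `sketch_strip_bom` are hypothetical stand-ins, and the use of `ArgumentError` is an assumption; the real helpers may behave differently and presumably raise the `BinaryInputError` and `InvalidEncodingError` classes listed above.

```ruby
# The Unicode byte order mark (U+FEFF), encoded as EF BB BF in UTF-8.
UTF8_BOM = "\uFEFF"

# Hypothetical stand-in for the private helper; gort's real logic may differ.
# Reinterprets the bytes as UTF-8 and rejects input that is not valid UTF-8.
def sketch_fix_encoding(input)
  string = input.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless string.valid_encoding?

  string
end

# Hypothetical stand-in: removes a single leading BOM if one is present.
def sketch_strip_bom(string)
  string.delete_prefix(UTF8_BOM)
end

# Same chaining shape as the constructor: fix the encoding, then strip the BOM.
raw = "\xEF\xBB\xBFUser-agent: *\n".b # bytes as they might arrive from a file or socket
cleaned = sketch_fix_encoding(raw).then { |string| sketch_strip_bom(string) }
puts cleaned.inspect
```

Stripping the BOM before parsing matters because an invisible U+FEFF at the start of the file would otherwise become part of the first line's key.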
Instance Method Details
#parse ⇒ Gort::RobotsTxt
Parses the input and returns the result as a Gort::RobotsTxt.
# File 'lib/gort/parser.rb', line 43
def parse
  content_lines = input.lines.map { |line| line.split("#", 2).first.strip }
                       .reject(&:empty?)
  rules = content_lines.map { |line| parse_line(line) }
  grouped_rules, standalone_rules = partition_rules(rules)
  groups = group_rules(grouped_rules)
  RobotsTxt.new(groups + standalone_rules)
end
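The first step of `#parse` can be tried in isolation. The sketch below reuses the same `split("#", 2).first.strip` and `reject(&:empty?)` chain from the listing above on a made-up robots.txt snippet. The `content_lines` method name and the sample input are illustrative assumptions, and the later steps (`parse_line`, `partition_rules`, `group_rules`) are not reproduced because their bodies are not shown here.

```ruby
# Hypothetical sample input; not taken from the gort source or its tests.
sample = <<~ROBOTS
  # Block everything under /private
  User-agent: *   # applies to all crawlers
  Disallow: /private

  Allow: /
ROBOTS

# Drops everything from the first "#" to the end of each line, trims
# surrounding whitespace, and discards lines that end up empty.
def content_lines(input)
  input.lines.map { |line| line.split("#", 2).first.strip }
       .reject(&:empty?)
end

p content_lines(sample)
# => ["User-agent: *", "Disallow: /private", "Allow: /"]
```

Passing `2` as the limit to `split` means only the first `#` is treated as the start of a comment, so the line is cut once and any further `#` characters stay in the discarded half.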