Class: Gort::Parser

Inherits:
Object
Defined in:
lib/gort/parser.rb

Overview

A robots.txt parser. It implements the parsing logic of RFC 9309, including errata.

Defined Under Namespace

Classes: BinaryInputError, InvalidEncodingError

Instance Method Summary

Constructor Details

#initialize(input) ⇒ Parser

Returns a new instance of Parser.

Parameters:

  • input (String)

    The robots.txt content to parse. It must be encoded in UTF-8 or a compatible encoding.



# File 'lib/gort/parser.rb', line 25

def initialize(input)
  @input = detect_and_fix_encoding(input).then { |string| strip_bom(string) }
end
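
The constructor normalizes the input in two steps before storing it: encoding detection, then byte order mark (BOM) removal. The bodies of `detect_and_fix_encoding` and `strip_bom` are not shown on this page, so the following is only a minimal sketch of what such steps could look like. The names `sketch_fix_encoding` and `sketch_strip_bom` are hypothetical, and `ArgumentError` stands in for whatever error the real parser raises (it defines `BinaryInputError` and `InvalidEncodingError`, but when each is raised is not documented here).

```ruby
# Hypothetical stand-ins for the private helpers; the real ones may differ.
UTF8_BOM = "\uFEFF"

# Assumption: treat the bytes as UTF-8 and reject anything that is not valid.
def sketch_fix_encoding(input)
  string = input.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless string.valid_encoding?

  string
end

# Drop a leading byte order mark, if there is one.
def sketch_strip_bom(string)
  string.delete_prefix(UTF8_BOM)
end

# Raw bytes as they might arrive from an HTTP response body.
raw = "\xEF\xBB\xBFUser-agent: *\n".b
cleaned = sketch_fix_encoding(raw).then { |string| sketch_strip_bom(string) }
# cleaned == "User-agent: *\n"
```

Chaining with `then` mirrors the constructor above: each step receives the output of the previous one, so the BOM check always runs on a string that is already known to be valid UTF-8.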

Instance Method Details

#parse ⇒ Gort::RobotsTxt

Parses the robots.txt content passed to the constructor.

Returns:

  • (Gort::RobotsTxt)


# File 'lib/gort/parser.rb', line 43

def parse
  content_lines =
    input.lines.map { |line|
      line.split("#", 2).first.strip
    }
    .reject(&:empty?)

  rules = content_lines.map { |line| parse_line(line) }
  grouped_rules, standalone_rules = partition_rules(rules)
  groups = group_rules(grouped_rules)

  RobotsTxt.new(groups + standalone_rules)
end
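
The first step of `#parse` can be tried in isolation. The sketch below copies the comment-stripping pipeline from the method above into a standalone function; the function name `content_lines` and the sample input are illustrative only, and the later steps (`parse_line`, `partition_rules`, `group_rules`) are not reproduced because their bodies are not shown on this page.

```ruby
# Standalone copy of the first step of #parse: drop comments, trim
# whitespace, and discard lines that end up empty.
def content_lines(input)
  input.lines
       .map { |line| line.split("#", 2).first.strip }
       .reject(&:empty?)
end

sample = <<~ROBOTS
  # Example file
  User-agent: *   # applies to every crawler
  Disallow: /private/

  Sitemap: https://example.com/sitemap.xml
ROBOTS

content_lines(sample)
# => ["User-agent: *", "Disallow: /private/", "Sitemap: https://example.com/sitemap.xml"]
```

Because each line is split at the first `#`, everything after that character is treated as a comment, including a `#` that appears inside a path or URL.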