Class: NCSAParser::Parser

Inherits:
Object
Defined in:
lib/ncsa-parser/parser.rb

Overview

A line parser for a log file. Lines are parsed via Regexps. You can inject new tokens or override existing ones by passing along a :tokens option and adding the corresponding keys to the :pattern option.
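The following is a minimal sketch of how an injected token could flow into the generated Regexp, mirroring the constructor logic listed under Constructor Details. It does not load the library: the trimmed DEFAULT_TOKENS table, the build_matcher helper and the :response_time token are hypothetical names chosen for illustration only.

```ruby
# Trimmed stand-in for the default token table (only the tokens used below).
DEFAULT_TOKENS = {
  :host   => '(?:\d+\.\d+\.\d+\.\d+|unknown|-|::1)',
  :status => '\d+',
  :bytes  => '\d+|-'
}

# Hypothetical helper: custom tokens take precedence over the defaults, then
# each token is wrapped in a capture group and joined by single spaces.
def build_matcher(pattern, custom_tokens = {})
  tokens = DEFAULT_TOKENS.merge(custom_tokens)
  parts = pattern.map do |name|
    token = tokens.fetch(name.to_sym) do
      raise ArgumentError, "Token :#{name} not found!"
    end
    "(#{token})"
  end
  Regexp.new('^' + parts.join(' ') + '$')
end

# :response_time is an invented token, added to both the tokens and the pattern.
matcher = build_matcher(
  %w{ host status bytes response_time },
  :response_time => '\d+ms'
)
md = matcher.match('127.0.0.1 200 512 35ms')
puts md[4] # prints 35ms
```

With the real class, the equivalent would presumably be to pass the same Hash as :tokens and the same list as :pattern to the constructor.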

Constant Summary

IP_ADDRESS =
'\d+\.\d+\.\d+\.\d+|unknown'
TOKENS =
{
  :host => "(?:#{IP_ADDRESS}|-|::1)",
  :host_proxy => "(?:#{IP_ADDRESS})(?:,\\s+#{IP_ADDRESS})*|-",
  :ident => '[^\s]+',
  :username => '[^\s]+',
  :datetime => '\[[^\]]+\]',
  :request => '".+"',
  :status => '\d+',
  :bytes => '\d+|-',
  :referer => '".*"',
  :ua => '".*"',
  :usertrack => "(?:#{IP_ADDRESS})[^ ]+|-",
  :outstream => '\d+|-',
  :instream => '\d+|-',
  :ratio => '\d+%|-%'
}
LOG_FORMAT_COMMON =
%w{
  host ident username datetime request status bytes
}
LOG_FORMAT_COMBINED =
%w{
  host ident username datetime request status bytes referer ua
}
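To get a feel for what individual tokens accept, a token source can be compiled on its own and tested against sample text. This is only a sketch: SAMPLE_TOKENS is a three-entry copy of the table above, and token_matches? is a hypothetical helper, not part of the library.

```ruby
IP_ADDRESS = '\d+\.\d+\.\d+\.\d+|unknown'

# Three token sources copied from TOKENS above.
SAMPLE_TOKENS = {
  :host_proxy => "(?:#{IP_ADDRESS})(?:,\\s+#{IP_ADDRESS})*|-",
  :bytes => '\d+|-',
  :ratio => '\d+%|-%'
}

# Hypothetical helper: true when the token source matches the whole string.
def token_matches?(name, text)
  Regexp.new("\\A(?:#{SAMPLE_TOKENS.fetch(name)})\\z").match?(text)
end

puts token_matches?(:host_proxy, '10.0.0.1, 10.0.0.2') # prints true
puts token_matches?(:bytes, '-')                       # prints true
puts token_matches?(:ratio, '85')                      # prints false
```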

Constructor Details

#initialize(options = {}) ⇒ Parser

Creates a new Parser object.

Options

  • :domain - when parsing query strings, use this domain as the URL’s domain. The default is www.example.com.

  • :datetime_format - sets the datetime format for when tokens are converted in NCSAParser::ParsedLine. The default is “[%d/%b/%Y:%H:%M:%S %Z]”.

  • :pattern - the default log line format to use. The default is LOG_FORMAT_COMBINED, which matches the “combined” log format in Apache. The value for :pattern can be either a space-delimited String of token names or an Array of token names.

  • :browscap - a browser capabilities object to use when sniffing out user agents. This object should be able to respond to the query method. Several browscap extensions are available for Ruby; the one written by the author of this extension is called Browscapper and is available at github.com/dark-panda/browscapper .

  • :token_conversions - converters to pass along to the line parser. See NCSAParser::ParsedLine for details.

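  • :tokens - a Hash of token definitions, keyed by token name, to use when generating the Regexp. Entries here take precedence over the defaults in TOKENS, so they can add new tokens or override existing ones.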
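Two of these options can be illustrated without the library. The sketch below shows the two accepted forms of :pattern being normalized to the same Array, and the default :datetime_format applied to a sample timestamp. It assumes the format string ends up in Ruby's DateTime.strptime, which this page does not confirm; normalize_pattern and parse_datetime are hypothetical helper names.

```ruby
require 'date'

# Accepts either an Array of token names or a space-delimited String,
# the same two forms the :pattern option allows.
def normalize_pattern(pattern)
  pattern.is_a?(Array) ? pattern : pattern.to_s.split(/\s+/)
end

# Parses a bracketed timestamp using the default :datetime_format.
# Assumption: the library does something comparable in NCSAParser::ParsedLine.
def parse_datetime(text, format = '[%d/%b/%Y:%H:%M:%S %Z]')
  DateTime.strptime(text, format)
end

p normalize_pattern('host ident username')
p normalize_pattern(%w{ host ident username })
puts parse_datetime('[10/Oct/2000:13:55:36 -0700]').iso8601
```

Both calls to normalize_pattern yield the same three-element Array, so either form of :pattern leads to the same Regexp.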



# File 'lib/ncsa-parser/parser.rb', line 67

def initialize(options = {})
  options = {
    :domain => 'www.example.com',
    :datetime_format => '[%d/%b/%Y:%H:%M:%S %Z]',
    :pattern => LOG_FORMAT_COMBINED
  }.merge(options)

  @options = options
  @pattern = if options[:pattern].is_a?(Array)
    options[:pattern]
  else
    options[:pattern].to_s.split(/\s+/)
  end

  @re = '^' + @pattern.collect { |tk|
    tk = tk.to_sym
    token = if options[:tokens] && options[:tokens][tk]
      options[:tokens][tk]
    elsif TOKENS[tk]
      TOKENS[tk]
    else
      raise ArgumentError.new("Token :#{tk} not found!")
    end

    "(#{token})"
  }.join(' ') + '$'
  @matcher = Regexp.new(@re)
end
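As an illustration of what the constructor produces for the default pattern, the sketch below rebuilds the anchored expression for the "combined" format from a copy of the relevant constants and applies it to a sample line. The sample log line and the combined_re and match_combined helpers are made up for this example.

```ruby
IP_ADDRESS = '\d+\.\d+\.\d+\.\d+|unknown'

# The subset of TOKENS used by the "combined" format, copied from the constants.
TOKENS = {
  :host => "(?:#{IP_ADDRESS}|-|::1)",
  :ident => '[^\s]+',
  :username => '[^\s]+',
  :datetime => '\[[^\]]+\]',
  :request => '".+"',
  :status => '\d+',
  :bytes => '\d+|-',
  :referer => '".*"',
  :ua => '".*"'
}

LOG_FORMAT_COMBINED = %w{
  host ident username datetime request status bytes referer ua
}

# One capture group per token, joined by single spaces and anchored at both ends.
def combined_re
  '^' + LOG_FORMAT_COMBINED.map { |tk| "(#{TOKENS.fetch(tk.to_sym)})" }.join(' ') + '$'
end

def match_combined(line)
  Regexp.new(combined_re).match(line)
end

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] ' \
       '"GET /apache_pb.gif HTTP/1.0" 200 2326 ' \
       '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

md = match_combined(line)
puts md[5] # the request field, surrounding quotes included
puts md[9] # the user agent field, surrounding quotes included
```

Because the tokens are joined with a single literal space, a line with doubled spaces between fields would not match this expression.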

Instance Attribute Details

#matcher ⇒ Object (readonly)

Returns the value of attribute matcher.



# File 'lib/ncsa-parser/parser.rb', line 45

def matcher
  @matcher
end

#pattern ⇒ Object (readonly)

Returns the value of attribute pattern.



# File 'lib/ncsa-parser/parser.rb', line 45

def pattern
  @pattern
end

#re ⇒ Object (readonly)

Returns the value of attribute re.



# File 'lib/ncsa-parser/parser.rb', line 45

def re
  @re
end

Instance Method Details

#parse_line(line) ⇒ Object Also known as: parse

Parses a single line and returns an NCSAParser::ParsedLine object. Raises BadLogLine if the line does not match the configured pattern.



# File 'lib/ncsa-parser/parser.rb', line 97

def parse_line(line)
  match = Hash.new
  if md = @matcher.match(line)
    @pattern.each_with_index do |k, j|
      match[k.to_sym] = md[j + 1]
    end
    match[:original] = line.strip
  else
    raise BadLogLine.new(line, @options[:pattern])
  end
  ParsedLine.new(match, @options)
end
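The pairing of token names with capture groups can be tried in isolation. The sketch below follows the same steps as parse_line but returns a plain Hash rather than an NCSAParser::ParsedLine, and raises ArgumentError as a stand-in for BadLogLine, whose definition is not shown on this page. The captures_to_hash helper and the three-token matcher are hypothetical.

```ruby
# Pairs each token name with its capture group, as parse_line does.
# ArgumentError stands in for the library's BadLogLine here.
def captures_to_hash(matcher, pattern, line)
  md = matcher.match(line)
  raise ArgumentError, "could not parse: #{line.inspect}" unless md

  # Capture groups are 1-based, so token i maps to md[i + 1].
  fields = pattern.each_with_index.to_h { |name, i| [name.to_sym, md[i + 1]] }
  fields[:original] = line.strip
  fields
end

# A deliberately small matcher and pattern for illustration.
matcher = /^([^\s]+) (\d+) (\d+|-)$/
pattern = %w{ host status bytes }

p captures_to_hash(matcher, pattern, "127.0.0.1 200 -\n")
```

Note that all captured values are Strings at this stage; any conversion would happen later, in NCSAParser::ParsedLine.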