Class: RequestLogAnalyzer::Source::LogParser

Inherits:
Base
  • Object
Includes:
Enumerable
Defined in:
lib/request_log_analyzer/source/log_parser.rb

Overview

The LogParser class reads log data from a given source and uses a file format definition to parse all relevant information about requests from the file. A FileFormat module should be provided that contains the definitions of the lines that occur in the log data.

The order in which lines occur is used to combine lines into a single request. If lines from different requests are interleaved, requests cannot be combined properly. This can happen when several Mongrel processes write to the same log file simultaneously. The parser detects this problem and emits warnings when it occurs. LogParser supports multiple parse strategies that deal with this problem differently.
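A minimal sketch of the order-based combining described above, assuming each parsed line is a Hash with a :line_type key and that a request is delimited by a hypothetical :started header line and :completed footer line. The warning names other than :unfinished_request_on_eof are illustrative assumptions, and the sketch does not reproduce either of LogParser's actual parse strategies.

```ruby
# Hypothetical sketch of combining parsed lines into requests by their order.
# The :started (header) and :completed (footer) line types and the
# :unclosed_request / :no_current_request warning names are assumptions made
# for illustration; only :unfinished_request_on_eof appears in LogParser.
def combine_lines(parsed_lines)
  requests = []
  warnings = []
  current  = nil

  parsed_lines.each do |line|
    case line[:line_type]
    when :started
      # A second header before a footer means lines were interleaved;
      # this sketch drops the unfinished request and starts a new one.
      warnings << :unclosed_request if current
      current = [line]
    when :completed
      if current
        requests << (current << line)
        current = nil
      else
        warnings << :no_current_request
      end
    else
      # Lines between header and footer belong to the open request.
      current << line if current
    end
  end

  warnings << :unfinished_request_on_eof if current
  [requests, warnings]
end
```

With two headers in a row, the first request is discarded and a single :unclosed_request warning is recorded, which is the kind of situation the parse strategies exist to handle.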

Constant Summary

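DEFAULT_MAX_LINE_LENGTH = 8096

The maximum number of bytes to read from a single line, used when the file format does not define its own max_line_length.

DEFAULT_LINE_DIVIDER = "\n"

The line divider used when the file format does not define its own line_divider.

DEFAULT_PARSE_STRATEGY = 'assume-correct'

The default parse strategy that will be used to parse the input.

PARSE_STRATEGIES = ['cautious', 'assume-correct']

All available parse strategies.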

Instance Attribute Summary

Attributes inherited from Base

#current_request, #file_format, #options

Instance Method Summary

Methods inherited from Base

#finalize, #prepare

Constructor Details

#initialize(format, options = {}) ⇒ LogParser

Initializes the log file parser instance. It applies the language-specific FileFormat module to this instance and uses the line definitions in that module to parse any input it is given (see parse_io). An unknown :parse_strategy option raises an error.

format

The current file format instance

options

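A hash of options used by the parser, such as :source_files (a file or directory name, an Array of names, or an IO such as $stdin) and :parse_strategy (one of PARSE_STRATEGIES, defaulting to DEFAULT_PARSE_STRATEGY).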



# File 'lib/request_log_analyzer/source/log_parser.rb', line 34

def initialize(format, options = {})
  super(format, options)
  @warnings         = 0
  @parsed_lines     = 0
  @parsed_requests  = 0
  @skipped_lines    = 0
  @skipped_requests = 0
  @current_request  = nil
  @current_source   = nil
  @current_file     = nil
  @current_lineno   = nil
  @processed_files  = []
  @source_files     = options[:source_files]
  @progress_handler = nil
  @warning_handler  = nil

  @options[:parse_strategy] ||= DEFAULT_PARSE_STRATEGY
  unless PARSE_STRATEGIES.include?(@options[:parse_strategy])
    fail "Unknown parse strategy: #{@options[:parse_strategy]}"
  end
end

Instance Attribute Details

#current_file ⇒ Object (readonly)

Returns the value of attribute current_file.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 25

def current_file
  @current_file
end

#current_lineno ⇒ Object (readonly)

Returns the value of attribute current_lineno.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 25

def current_lineno
  @current_lineno
end

#parsed_lines ⇒ Object (readonly)

Returns the value of attribute parsed_lines.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 26

def parsed_lines
  @parsed_lines
end

#parsed_requests ⇒ Object (readonly)

Returns the value of attribute parsed_requests.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 26

def parsed_requests
  @parsed_requests
end

#processed_files ⇒ Object (readonly)

Returns the value of attribute processed_files.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 25

def processed_files
  @processed_files
end

#skipped_lines ⇒ Object (readonly)

Returns the value of attribute skipped_lines.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 26

def skipped_lines
  @skipped_lines
end

#skipped_requests ⇒ Object (readonly)

Returns the value of attribute skipped_requests.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 26

def skipped_requests
  @skipped_requests
end

#source_files ⇒ Object (readonly)

Returns the value of attribute source_files.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 25

def source_files
  @source_files
end

#warnings ⇒ Object (readonly)

Returns the value of attribute warnings.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 26

def warnings
  @warnings
end

Instance Method Details

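#decompress_file?(filename) ⇒ String

Checks whether the filename has a recognized compressed extension (.tar.gz, .tgz, .gz, .bz2 or .zip). If it does, returns the shell command, prefixed with nice -n 5, that decompresses the file to stdout; otherwise returns an empty string. Despite the trailing ? in its name, the method returns a String, not a Boolean.

Returns:

  • (String)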


# File 'lib/request_log_analyzer/source/log_parser.rb', line 97

def decompress_file?(filename)
  nice_command = 'nice -n 5'

  return "#{nice_command} gunzip -c -d #{filename}" if filename.match(/\.(?:tar\.gz|tgz|gz)$/)
  return "#{nice_command} bunzip2 -c -d #{filename}" if filename.match(/\.bz2$/)
  return "#{nice_command} unzip -p #{filename}" if filename.match(/\.zip$/)

  ''
end
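The extension check above can be sketched in a table-driven form. This is an illustration under stated assumptions, not the library's code: DECOMPRESSORS and decompress_command are hypothetical names, and unlike decompress_file? the sketch escapes the filename with Ruby's Shellwords.escape so that paths containing spaces or shell metacharacters reach the shell safely.

```ruby
require 'shellwords'

# Hypothetical table mapping a filename extension to a command that writes
# the decompressed data to stdout (same commands as decompress_file?).
DECOMPRESSORS = {
  /\.(?:tar\.gz|tgz|gz)\z/ => 'gunzip -c -d',
  /\.bz2\z/                => 'bunzip2 -c -d',
  /\.zip\z/                => 'unzip -p'
}.freeze

# Returns the shell command for a compressed file, or '' for a plain file.
# The filename is shell-escaped, which the original method does not do.
def decompress_command(filename, nice: 'nice -n 5')
  DECOMPRESSORS.each do |pattern, command|
    return "#{nice} #{command} #{Shellwords.escape(filename)}" if pattern.match?(filename)
  end
  '' # not compressed: the caller opens the file directly
end
```

Keeping the patterns in one Hash makes adding a format a one-line change, and returning an empty string preserves the calling convention parse_file relies on (decompress_file?(file).empty?).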

#each_request(options = {}, &block) ⇒ Object Also known as: each

Reads the input, which can be a file, a sequence of files, or STDIN, and parses the lines specified in the FileFormat. These lines are combined into Request instances, which are yielded. The actual parsing occurs in the parse_io method.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 68

def each_request(options = {}, &block) # :yields: :request, request
  case @source_files
  when IO
    if @source_files == $stdin
      puts 'Parsing from the standard input. Press CTRL+C to finish.' # FIXME: not here
    end
    parse_stream(@source_files, options, &block)
  when String
    parse_file(@source_files, options, &block)
  when Array
    parse_files(@source_files, options, &block)
  else
    fail 'Unknown source provided'
  end
end
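Because each_request is aliased as each and the class includes Enumerable, a parser can be used with methods such as count and select. The sketch below shows that pattern together with the dispatch on the source type; TinyParser is hypothetical and treats every line as a request, which LogParser does not do.

```ruby
# Hypothetical parser illustrating the Enumerable + each alias pattern and
# the case/when dispatch on the kind of source used by each_request.
class TinyParser
  include Enumerable

  def initialize(source)
    @source = source
  end

  # Yields one "request" per line (a simplification for this sketch).
  def each_request(&block)
    case @source
    when IO     then @source.each_line { |line| block.call(line.chomp) }
    when String then File.foreach(@source) { |line| block.call(line.chomp) }
    when Array  then @source.each { |file| File.foreach(file) { |line| block.call(line.chomp) } }
    else fail 'Unknown source provided'
    end
  end
  alias_method :each, :each_request
end
```

Note that StringIO is not a subclass of IO, so a StringIO passed as a source falls through to the else branch here, just as it would in each_request; in LogParser, in-memory data goes through parse_string instead.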

#line_divider ⇒ Object



# File 'lib/request_log_analyzer/source/log_parser.rb', line 60

def line_divider
  file_format.line_divider || DEFAULT_LINE_DIVIDER
end

#max_line_length ⇒ Object



# File 'lib/request_log_analyzer/source/log_parser.rb', line 56

def max_line_length
  file_format.max_line_length || DEFAULT_MAX_LINE_LENGTH
end
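A small sketch of the fallback used by line_divider and max_line_length, assuming a file format object that returns nil for settings it does not define. FormatStub and effective_settings are hypothetical; in parse_io_19 the :line_divider and :max_line_length options can additionally override both values.

```ruby
# Defaults copied from the constants documented above.
DEFAULT_MAX_LINE_LENGTH = 8096
DEFAULT_LINE_DIVIDER    = "\n"

# Hypothetical stand-in for a FileFormat instance.
FormatStub = Struct.new(:line_divider, :max_line_length)

# A value declared by the format wins; nil falls back to the class default.
def effective_settings(file_format)
  {
    line_divider:    file_format.line_divider    || DEFAULT_LINE_DIVIDER,
    max_line_length: file_format.max_line_length || DEFAULT_MAX_LINE_LENGTH
  }
end
```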

#parse_file(file, options = {}, &block) ⇒ Object

Parses a log file. Creates an IO stream for the provided file and sends it to parse_io for further handling. If file is a directory, every entry in it is parsed through parse_files. This method supports progress updates that can be used to display a progress bar.

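If the log file is compressed, it is decompressed to stdout by the command returned from decompress_file? and read through IO.popen. Compressed files do not trigger progress updates and are not added to processed_files.

TODO: Check if IO.popen encounters problems with the given command line.

TODO: Fix the progress bar, which is broken for IO.popen because it returns a single string.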

file

The file that should be parsed.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 116

def parse_file(file, options = {}, &block)
  if File.directory?(file)
    parse_files(Dir["#{ file }/*"], options, &block)
    return
  end

  @current_source = File.expand_path(file)
  @source_changes_handler.call(:started, @current_source) if @source_changes_handler

  if decompress_file?(file).empty?

    @progress_handler = @dormant_progress_handler
    @progress_handler.call(:started, file) if @progress_handler

    File.open(file, 'rb') { |f| parse_io(f, options, &block) }

    @progress_handler.call(:finished, file) if @progress_handler
    @progress_handler = nil

    @processed_files.push(@current_source.dup)

  else
    IO.popen(decompress_file?(file), 'rb') { |f| parse_io(f, options, &block) }
  end

  @source_changes_handler.call(:finished, @current_source) if @source_changes_handler

  @current_source = nil
end
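A sketch of the control flow in parse_file, under the assumption that only plain and .gz files need handling. Directories are expanded recursively (parse_file calls parse_files on the directory entries, which calls parse_file again for each one). Unlike the original, which runs an external command through IO.popen, this sketch swaps in Ruby's Zlib::GzipReader and sorts directory entries for a deterministic order; each_log_line is a hypothetical name.

```ruby
require 'zlib'

# Yields every line of a file, a .gz file, or all files below a directory.
def each_log_line(path, &block)
  if File.directory?(path)
    # Recurse into each entry, as parse_file does via parse_files.
    Dir["#{path}/*"].sort.each { |entry| each_log_line(entry, &block) }
  elsif path.end_with?('.gz')
    # Decompress in-process instead of shelling out to gunzip.
    Zlib::GzipReader.open(path) { |gz| gz.each_line(&block) }
  else
    # Plain files are opened in binary mode, matching File.open(file, 'rb').
    File.open(path, 'rb') { |f| f.each_line(&block) }
  end
end
```

Decompressing in-process avoids quoting problems with the shell command line, at the cost of supporting fewer formats than the external tools.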

#parse_files(files, options = {}, &block) ⇒ Object

Parses a list of files of the same format, one after another, by calling parse_file for every file in the array.

files

The Array of files that should be parsed

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 91

def parse_files(files, options = {}, &block) # :yields: request
  files.each { |file| parse_file(file, options, &block) }
end

#parse_io_18(io, options = {}, &block) ⇒ Object

This method loops over each line of the input stream. It tries to parse each line as any of the lines that are defined by the current file format (see RequestLogAnalyzer::FileFormat). It then combines these parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.

  • RequestLogAnalyzer::LineDefinition#matches is called to test if a line matches a line definition of the file format.

  • update_current_request is used to combine parsed lines into requests using heuristics.

  • The method will yield progress updates if a progress handler is installed using progress=

  • The method will yield parse warnings if a warning handler is installed using warning=

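This is a Ruby 1.8-specific version that does not offer memory protection: each line is read with IO#gets(line_divider) without a length limit, so a very long line is read into memory in full.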

io

The IO instance to use as source

options

A hash of options that can be used by the parser.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 203

def parse_io_18(io, options = {}, &block) # :yields: request
  @line_divider    = options[:line_divider]    || line_divider
  @current_lineno  = 0
  while line = io.gets(@line_divider)
    @current_lineno += 1
    @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
    parse_line(line, &block)
  end

  warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
  @current_lineno = nil
end

#parse_io_19(io, options = {}, &block) ⇒ Object

This method loops over each line of the input stream. It tries to parse each line as any of the lines that are defined by the current file format (see RequestLogAnalyzer::FileFormat). It then combines these parsed lines into requests using heuristics. These requests (see RequestLogAnalyzer::Request) are then yielded for further processing in the pipeline.

  • RequestLogAnalyzer::LineDefinition#matches is called to test if a line matches a line definition of the file format.

  • update_current_request is used to combine parsed lines into requests using heuristics.

  • The method will yield progress updates if a progress handler is installed using progress=

  • The method will yield parse warnings if a warning handler is installed using warning=

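This is a Ruby 1.9-specific version that offers memory protection: each IO#gets call reads at most max_line_length bytes, so an over-long line is returned in several chunks instead of being read into memory in one piece.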

io

The IO instance to use as source

options

A hash of options that can be used by the parser.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 175

def parse_io_19(io, options = {}, &block) # :yields: request
  @max_line_length = options[:max_line_length] || max_line_length
  @line_divider    = options[:line_divider]    || line_divider
  @current_lineno  = 0
  while line = io.gets(@line_divider, @max_line_length)
    @current_lineno += 1
    @progress_handler.call(:progress, io.pos) if @progress_handler && @current_lineno % 255 == 0
    parse_line(line, &block)
  end

  warn(:unfinished_request_on_eof, 'End of file reached, but last request was not completed!') unless @current_request.nil?
  @current_lineno = nil
end
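The memory protection in parse_io_19 comes from the second argument of IO#gets, which caps the number of bytes returned per call. The sketch below, assuming a hypothetical read_limited helper and progress callback, shows that an over-long line comes back in several chunks and that progress is reported every 255 lines, as in the source.

```ruby
require 'stringio' # only needed to try the sketch on in-memory data

# Reads io in chunks of at most `limit` bytes, stopping early at `divider`.
# Calls `progress` with the current byte position every 255 lines.
def read_limited(io, divider: "\n", limit: 8096, progress: nil)
  chunks = []
  lineno = 0
  while (line = io.gets(divider, limit))
    lineno += 1
    progress&.call(:progress, io.pos) if (lineno % 255).zero?
    chunks << line
  end
  chunks
end
```

For example, read_limited(StringIO.new("abcdefghij\nxy\n"), limit: 4) returns ["abcd", "efgh", "ij\n", "xy\n"]. As in parse_io_19, each chunk of a split line increments the line counter, so line numbers after an over-long line are shifted.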

#parse_line(line, &block) ⇒ Object

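Parses a single line using the current file format. If the line is recognized, the parsed_lines counter is incremented and the parsed information, tagged with the current source and line number, is used to build a request.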

line

The line to parse

block

The block to send fully parsed requests to.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 222

def parse_line(line, &block) # :yields: request
  if request_data = file_format.parse_line(line) { |wt, message| warn(wt, message) }
    @parsed_lines += 1
    update_current_request(request_data.merge(source: @current_source, lineno: @current_lineno), &block)
  end
end
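A sketch of what parse_line does with a successful match: the format returns a Hash describing the line (or nil), and the parser merges in the source file and line number before passing the data on. The FORMAT_PARSE lambda and its "Completed 200 in 12ms" line layout are hypothetical stand-ins for file_format.parse_line and a real line definition.

```ruby
# Hypothetical line layout used only for this sketch.
COMPLETED = /\ACompleted (\d{3}) in (\d+)ms/

# Stand-in for file_format.parse_line: a Hash for a recognized line, else nil.
FORMAT_PARSE = lambda do |line|
  m = COMPLETED.match(line)
  m && { line_type: :completed, status: m[1].to_i, duration: m[2].to_i }
end

# Mirrors parse_line: unrecognized lines are ignored, recognized ones are
# tagged with their origin before being combined into requests.
def tag_line(line, source:, lineno:)
  data = FORMAT_PARSE.call(line)
  return nil unless data
  data.merge(source: source, lineno: lineno)
end
```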

#parse_stream(stream, options = {}, &block) ⇒ Object

Parses an IO stream. It will simply call parse_io. This function does not support progress updates because the length of a stream is not known.

stream

The IO stream that should be parsed.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 150

def parse_stream(stream, options = {}, &block)
  parse_io(stream, options, &block)
end

#parse_string(string, options = {}, &block) ⇒ Object

Parses a string by wrapping it in a StringIO and passing it to parse_io. This method does not support progress updates.

string

The string that should be parsed.

options

A Hash of options that will be passed to parse_io.



# File 'lib/request_log_analyzer/source/log_parser.rb', line 157

def parse_string(string, options = {}, &block)
  parse_io(StringIO.new(string), options, &block)
end

#progress=(proc) ⇒ Object

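Installs a progress handler. The proc is stored as a dormant handler and is only activated by parse_file while an uncompressed file is being parsed, so streams and strings do not report progress.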

proc

The proc that will be called to handle progress update messages



# File 'lib/request_log_analyzer/source/log_parser.rb', line 231

def progress=(proc)
  @dormant_progress_handler = proc
end
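progress= does not activate the handler immediately; it stores a dormant handler that parse_file switches on only while a plain file is read. The hypothetical MiniParser below reproduces that pattern. It resets the handler in an ensure clause, which is a choice of this sketch; the original resets it after the file has been parsed, without ensure.

```ruby
# Hypothetical class reproducing the dormant progress handler pattern.
class MiniParser
  # Only stores the proc; nothing is reported yet.
  def progress=(proc)
    @dormant_progress_handler = proc
  end

  # Activates the handler for the duration of one "file".
  def parse_file_like(name, lines)
    @progress_handler = @dormant_progress_handler
    @progress_handler&.call(:started, name)
    lines.each_with_index { |_, i| @progress_handler&.call(:progress, i + 1) }
    @progress_handler&.call(:finished, name)
  ensure
    @progress_handler = nil # deactivate again once the file is done
  end

  # Stream input never activates the handler, so nothing is reported.
  def parse_stream_like(lines)
    lines.each_with_index { |_, i| @progress_handler&.call(:progress, i + 1) }
  end
end
```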

#source_changes=(proc) ⇒ Object

Installs a source change handler. parse_file calls the proc with :started and :finished, together with the expanded path of each file.

proc

The proc that will be called to handle source changes



# File 'lib/request_log_analyzer/source/log_parser.rb', line 243

def source_changes=(proc)
  @source_changes_handler = proc
end
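A usage sketch for source_changes=, assuming a hypothetical SourceTracker consumer. parse_file calls the handler with :started before reading a file and :finished afterwards, in both cases passing the expanded path; the tracker just records which sources are in progress and which are done. With a real parser it would be installed as parser.source_changes = tracker.handler.

```ruby
# Hypothetical consumer of source change events.
class SourceTracker
  attr_reader :open_sources, :finished

  def initialize
    @open_sources = []
    @finished = []
  end

  # Returns a lambda with the (event, path) arguments parse_file passes.
  def handler
    lambda do |event, path|
      case event
      when :started  then @open_sources << path
      when :finished then @finished << @open_sources.delete(path)
      end
    end
  end
end
```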

#warn(type, message) ⇒ Object

This method is called by the parser when it encounters a parsing problem. It increments the warning count and calls the installed warning handler, if any, with the warning type, the message, and the current line number.

By default, RequestLogAnalyzer::Controller will install a warning handler that will pass the warnings to each aggregator so they can do something useful with it.

type

The warning type (a Symbol)

message

A message explaining the warning



# File 'lib/request_log_analyzer/source/log_parser.rb', line 256

def warn(type, message)
  @warnings += 1
  @warning_handler.call(type, message, @current_lineno) if @warning_handler
end
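The interplay of warn and warning= can be sketched as follows. WarningCollector is a hypothetical class that mirrors the logic shown above: the counter is always incremented, and the handler, when installed, receives the type, the message, and the current line number.

```ruby
# Hypothetical class mirroring LogParser#warn and #warning=.
class WarningCollector
  attr_reader :warnings
  attr_accessor :current_lineno

  def initialize
    @warnings = 0
    @warning_handler = nil
  end

  def warning=(proc)
    @warning_handler = proc
  end

  # Counts every warning; forwards it only if a handler is installed.
  def warn(type, message)
    @warnings += 1
    @warning_handler.call(type, message, @current_lineno) if @warning_handler
  end
end
```

Counting independently of the handler means the total stays accurate even when no handler is installed.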

#warning=(proc) ⇒ Object

Installs a warning handler that is called for every parse warning (see warn).

proc

The proc that will be called to handle parse warning messages



# File 'lib/request_log_analyzer/source/log_parser.rb', line 237

def warning=(proc)
  @warning_handler = proc
end