Module: RequestLogAnalyzer::FileFormat::CommonRegularExpressions

Included in:
AmazonS3, Apache, DelayedJob2, DelayedJob21, DelayedJob3, DelayedJob4, Haproxy, Merb, Mysql, Postgresql, Rails, Rails3, W3c
Defined in:
lib/request_log_analyzer/file_format.rb

Overview

This module contains methods that construct regular expressions for commonly used log fragments, such as IP addresses and timestamps.

To use these regular expression constructors, extend your file format with this module (or, in rare cases, include it).
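As a rough illustration of why extend is the usual choice: the helpers are presumably called while the class body is being evaluated, so they need to be available as class-level methods. The sketch below uses a stand-in module (CommonRegularExpressionsSketch) and a hypothetical MyFormat class so that it runs without the gem; only the body of anchored is taken from this page.

```ruby
# Stand-in for the real module, reduced to one method so this runs without the gem.
module CommonRegularExpressionsSketch
  def anchored(regexp)
    /^#{regexp}$/
  end
end

# Hypothetical file format; the class name and constant are illustrative only.
class MyFormat
  extend CommonRegularExpressionsSketch

  # extend adds the helpers as class methods, so they can be called
  # directly while the class body is evaluated.
  DURATION_LINE = anchored(/\d+ ms/)
end

puts MyFormat::DURATION_LINE.match?('42 ms')  # true
puts MyFormat.respond_to?(:anchored)          # true
puts MyFormat.new.respond_to?(:anchored)      # false: instances would need include
```

With include instead of extend, the helpers would only be reachable from instances, which is why the overview treats that as the unlikely case.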

Constant Summary

TIMESTAMP_PARTS =
{
  'a' => '(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)',
  'b' => '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)',
  'y' => '\d{2}', 'Y' => '\d{4}', 'm' => '\d{2}', 'd' => '\d{2}',
  'H' => '\d{2}', 'M' => '\d{2}', 'S' => '\d{2}', 'k' => '(?:\d| )\d',
  'z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  'Z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  '%' => '%'
}

Instance Method Details

#anchored(regexp) ⇒ Object



# File 'lib/request_log_analyzer/file_format.rb', line 176

def anchored(regexp)
  /^#{regexp}$/
end
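This method carries no description on this page; reading the body, it wraps the given pattern between ^ and $. The sketch below shows what that produces. The constant DIGITS is made up for illustration. Note that in Ruby ^ and $ anchor at line boundaries, not at the start and end of the whole string, which only matters when the input contains newlines.

```ruby
def anchored(regexp)
  /^#{regexp}$/
end

DIGITS = anchored(/\d+/)

# Interpolating a Regexp embeds it as a non-capturing group with its options.
puts DIGITS.source              # ^(?-mix:\d+)$
puts DIGITS.match?('123')       # true
puts DIGITS.match?('abc 123')   # false
puts DIGITS.match?("abc\n123")  # true: the second line matches on its own
```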

#hostname(blank = false) ⇒ Object

Creates a regular expression to match a hostname.



# File 'lib/request_log_analyzer/file_format.rb', line 128

def hostname(blank = false)
  regexp = /(?:(?:[a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*(?:[A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])/
  add_blank_option(regexp, blank)
end
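To get a feel for what the hostname pattern accepts, the sketch below copies the regular expression from the listing and tests a few strings against it. The helper whole_hostname? is hypothetical and adds \A and \z so that only complete strings count; the pattern itself is unanchored and would also match inside longer text.

```ruby
# Pattern copied from the hostname method.
HOSTNAME = /(?:(?:[a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*(?:[A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])/

# Hypothetical helper: true only if the whole string is a hostname.
def whole_hostname?(string)
  /\A#{HOSTNAME}\z/.match?(string)
end

puts whole_hostname?('www.example.com')   # true
puts whole_hostname?('my-host')           # true
puts whole_hostname?('-bad.example.com')  # false: a label cannot start with a hyphen
puts whole_hostname?('127.0.0.1')         # false: every label must start with a letter
```

The last case suggests why a separate hostname_or_ip_address constructor exists: numeric addresses are not covered by the hostname pattern.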

#hostname_or_ip_address(blank = false) ⇒ Object

Creates a regular expression to match a hostname or IP address.



# File 'lib/request_log_analyzer/file_format.rb', line 134

def hostname_or_ip_address(blank = false)
  regexp = Regexp.union(hostname, ip_address)
  add_blank_option(regexp, blank)
end
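The method relies on Regexp.union, which builds one pattern that matches any of its arguments. The sketch below shows the idea with two shortened stand-in patterns (SIMPLE_HOSTNAME and SIMPLE_IPV4 are assumptions made for readability, not the library's patterns) and a hypothetical host_or_ip? helper.

```ruby
# Shortened stand-ins for the hostname and ip_address patterns (assumptions).
SIMPLE_HOSTNAME = /[a-zA-Z][a-zA-Z0-9\-]*(?:\.[a-zA-Z][a-zA-Z0-9\-]*)*/
SIMPLE_IPV4     = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

# Regexp.union joins the alternatives with |.
EITHER = Regexp.union(SIMPLE_HOSTNAME, SIMPLE_IPV4)

# Hypothetical helper: interpolation wraps EITHER in a group, so the
# \A and \z anchors apply to both alternatives.
def host_or_ip?(string)
  /\A#{EITHER}\z/.match?(string)
end

puts host_or_ip?('app-1.example.org')  # true
puts host_or_ip?('10.0.0.1')           # true
puts host_or_ip?('10.0.0')             # false: neither alternative fits
```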

#ip_address(blank = false) ⇒ Object

Construct a regular expression to parse IPv4 and IPv6 addresses.

Allows nil values if the blank option is given. Set blank to true to allow an empty string, or to a string that serves as a substitute for the nil value.



# File 'lib/request_log_analyzer/file_format.rb', line 163

def ip_address(blank = false)
  # IP address regexp copied from Resolv::IPv4 and Resolv::IPv6,
  # but adjusted to work for the purpose of request-log-analyzer.
  ipv4_regexp                     = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
  ipv6_regex_8_hex                = /(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}/
  ipv6_regex_compressed_hex       = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)/
  ipv6_regex_6_hex_4_dec          = /(?:(?:[0-9A-Fa-f]{1,4}:){6})#{ipv4_regexp}/
  ipv6_regex_compressed_hex_4_dec = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}:)*)#{ipv4_regexp}/
  ipv6_regexp                     = Regexp.union(ipv6_regex_8_hex, ipv6_regex_compressed_hex, ipv6_regex_6_hex_4_dec, ipv6_regex_compressed_hex_4_dec)

  add_blank_option(Regexp.union(ipv4_regexp, ipv6_regexp), blank)
end
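The following sketch exercises ip_address with and without the blank option. The ip_address body is copied from the listing; add_blank_option is not shown on this page, so the version here is an assumption reconstructed from the description of the blank option, and whole_match? is a hypothetical helper.

```ruby
# ASSUMPTION: add_blank_option is not shown on this page. This guess follows the
# description: true allows an empty string, a String allows that literal value.
def add_blank_option(regexp, blank)
  case blank
  when String then Regexp.union(regexp, Regexp.new(Regexp.quote(blank)))
  when true   then Regexp.union(regexp, //)
  else regexp
  end
end

# Copied from the listing.
def ip_address(blank = false)
  ipv4_regexp                     = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
  ipv6_regex_8_hex                = /(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}/
  ipv6_regex_compressed_hex       = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)/
  ipv6_regex_6_hex_4_dec          = /(?:(?:[0-9A-Fa-f]{1,4}:){6})#{ipv4_regexp}/
  ipv6_regex_compressed_hex_4_dec = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}:)*)#{ipv4_regexp}/
  ipv6_regexp                     = Regexp.union(ipv6_regex_8_hex, ipv6_regex_compressed_hex, ipv6_regex_6_hex_4_dec, ipv6_regex_compressed_hex_4_dec)

  add_blank_option(Regexp.union(ipv4_regexp, ipv6_regexp), blank)
end

# Hypothetical helper: true only if the pattern matches the whole string.
def whole_match?(regexp, string)
  /\A#{regexp}\z/.match?(string)
end

puts whole_match?(ip_address, '192.168.0.1')             # true
puts whole_match?(ip_address, '2001:db8::ff00:42:8329')  # true
puts whole_match?(ip_address, '999.999.999.999')         # true: octet ranges are not checked
puts whole_match?(ip_address, '')                        # false
puts whole_match?(ip_address(true), '')                  # true: empty string allowed
puts whole_match?(ip_address('-'), '-')                  # true: "-" stands in for nil
```

The third case shows that the IPv4 part checks only the shape of the address, which is usually sufficient for log parsing but is not a validity check.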

#timestamp(format_string, blank = false) ⇒ Object

Creates a regular expression for a timestamp generated by a strftime call. Provide the format string to construct a matching regular expression. Set blank to true to allow an empty string, or set blank to a string to use as a substitute for the nil value.



# File 'lib/request_log_analyzer/file_format.rb', line 143

def timestamp(format_string, blank = false)
  regexp = ''
  format_string.scan(/([^%]*)(?:%([A-Za-z%]))?/) do |literal, variable|
    regexp << Regexp.quote(literal)
    if variable
      if TIMESTAMP_PARTS.key?(variable)
        regexp << TIMESTAMP_PARTS[variable]
      else
        fail "Unknown variable: %#{variable}"
      end
    end
  end

  add_blank_option(Regexp.new(regexp), blank)
end
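A minimal sketch of the translation from a strftime format string to a pattern follows. It reuses the TIMESTAMP_PARTS table and the scan loop from the listing, but leaves out the blank option because add_blank_option is not shown on this page. The method name timestamp_regexp and the constant APACHE_TIME are hypothetical.

```ruby
TIMESTAMP_PARTS = {
  'a' => '(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)',
  'b' => '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)',
  'y' => '\d{2}', 'Y' => '\d{4}', 'm' => '\d{2}', 'd' => '\d{2}',
  'H' => '\d{2}', 'M' => '\d{2}', 'S' => '\d{2}', 'k' => '(?:\d| )\d',
  'z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  'Z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  '%' => '%'
}.freeze

# Simplified variant of timestamp without the blank option (hypothetical name).
def timestamp_regexp(format_string)
  source = +''  # unfrozen String that the loop appends to
  format_string.scan(/([^%]*)(?:%([A-Za-z%]))?/) do |literal, variable|
    source << Regexp.quote(literal)  # literal text is escaped
    next unless variable
    raise "Unknown variable: %#{variable}" unless TIMESTAMP_PARTS.key?(variable)
    source << TIMESTAMP_PARTS[variable]  # directive becomes a sub-pattern
  end
  Regexp.new(source)
end

APACHE_TIME = timestamp_regexp('%d/%b/%Y:%H:%M:%S %z')
puts APACHE_TIME.match?('10/Oct/2000:13:55:36 -0700')  # true
puts APACHE_TIME.match?('2000-10-10 13:55:36')         # false

begin
  timestamp_regexp('%j')  # day of year: valid for strftime, absent from the table
rescue RuntimeError => e
  puts e.message          # Unknown variable: %j
end
```

Only the directives listed in TIMESTAMP_PARTS are supported, so a format string that strftime accepts can still raise here.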