Class: RequestLogAnalyzer::FileFormat::Apache
- Extended by:
- CommonRegularExpressions
- Defined in:
- lib/request_log_analyzer/file_format/apache.rb
Overview
The Apache file format parses Apache access.log files.
The access.log can be configured in Apache to have many different formats. In theory, this FileFormat can handle any of them, but it must be told which log format is in use: pass the format string as a parameter to the create method, e.g.:
RequestLogAnalyzer::FileFormat::Apache.create('%h %l %u %t "%r" %>s %b')
It also supports the predefined Apache log formats “common” and “combined”. The line definition and the report definition will be constructed using this file format string. From the command line, you can provide the format string using the --apache-format command line option.
Defined Under Namespace
Classes: Request
Constant Summary
- LOG_FORMAT_DEFAULTS =
A hash of predefined Apache log formats
{
  common: '%h %l %u %t "%r" %>s %b',
  combined: '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"',
  vhost_combined: '%h %l %v %t "%r" %>s %b "%{Referer}i" "%{User-agent}i" %T/%D',
  nginx: '%a %t %h %u "%r" %>s %b',
  rack: '%h %l %u %t "%r" %>s %b %T',
  referer: '%{Referer}i -> %U',
  agent: '%{User-agent}i'
}
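A minimal sketch of how a format name might resolve to a format string, mirroring the lookup at the top of access_line_definition. The constant SKETCH_FORMATS and the helper resolve_format are hypothetical names used only for illustration; they are not part of the library.

```ruby
# Two entries copied from LOG_FORMAT_DEFAULTS; the constant name is hypothetical.
SKETCH_FORMATS = {
  common: '%h %l %u %t "%r" %>s %b',
  combined: '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"'
}.freeze

# Hypothetical helper: a known format name resolves to its predefined string,
# anything else is treated as a literal format string. nil falls back to :common.
def resolve_format(format)
  format ||= :common
  SKETCH_FORMATS[format.to_sym] || format
end

puts resolve_format('combined')  # the predefined "combined" string
puts resolve_format('%h %t')     # unknown name, passed through unchanged
puts resolve_format(nil)         # falls back to the "common" string
```

Because an unknown name is passed through unchanged, a misspelled format name is silently treated as a literal format string rather than raising an error.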
- APACHE_TIMESTAMP =
I have encountered two timestamp types, with timezone and without. Parse both.
Regexp.union(timestamp('%d/%b/%Y:%H:%M:%S %z'), timestamp('%d/%b/%Y %H:%M:%S'))
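The following sketch illustrates the idea of accepting both timestamp layouts with a single pattern. The two regular expressions are hand-written stand-ins and an assumption on my part; the patterns the library actually builds may differ. The names WITH_ZONE, WITHOUT_ZONE, SKETCH_TIMESTAMP and timestamp_in are hypothetical.

```ruby
# Hand-written stand-ins for the two layouts (assumption, not the library's patterns).
WITH_ZONE    = /\d{2}\/[A-Z][a-z]{2}\/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}/  # %d/%b/%Y:%H:%M:%S %z
WITHOUT_ZONE = /\d{2}\/[A-Z][a-z]{2}\/\d{4} \d{2}:\d{2}:\d{2}/            # %d/%b/%Y %H:%M:%S

# Regexp.union builds one pattern that matches either alternative.
SKETCH_TIMESTAMP = Regexp.union(WITH_ZONE, WITHOUT_ZONE)

# Hypothetical helper: returns the bracketed timestamp text, or nil if none is found.
def timestamp_in(line)
  match = line.match(/\[(#{SKETCH_TIMESTAMP})\]/)
  match && match[1]
end

puts timestamp_in('[10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0"')
puts timestamp_in('[10/Oct/2000 13:55:36] "GET / HTTP/1.0"')
```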
- LOG_DIRECTIVES =
A hash that defines how the log format directives should be parsed.
{
  '%' => { nil => { regexp: '%', captures: [] } },
  'v' => { nil => { regexp: "(#{hostname_or_ip_address})", captures: [{ name: :vhost, type: :string }] } },
  'h' => { nil => { regexp: "(#{hostname_or_ip_address})", captures: [{ name: :remote_host, type: :string }] } },
  'a' => { nil => { regexp: "(#{ip_address})", captures: [{ name: :remote_ip, type: :string }] } },
  'b' => { nil => { regexp: '(\d+|-)', captures: [{ name: :bytes_sent, type: :traffic }] } },
  'c' => { nil => { regexp: '(\+|\-|\X)', captures: [{ name: :connection_status, type: :integer }] } },
  'D' => {
    nil => { regexp: '(\d+|-)', captures: [{ name: :duration, type: :duration, unit: :musec }] },
    'micro' => { regexp: '(\d+|-)', captures: [{ name: :duration, type: :duration, unit: :musec }] },
    'milli' => { regexp: '(\d+|-)', captures: [{ name: :duration, type: :duration, unit: :msec }] }
  },
  'l' => { nil => { regexp: '([\w-]+)', captures: [{ name: :remote_logname, type: :nillable_string }] } },
  'T' => { nil => { regexp: '(\d+(?:\.\d+)?|-)', captures: [{ name: :duration, type: :duration, unit: :sec }] } },
  't' => { nil => { regexp: "\\[(#{APACHE_TIMESTAMP})?\\]", captures: [{ name: :timestamp, type: :timestamp }] } },
  's' => { nil => { regexp: '(\d{3})', captures: [{ name: :http_status, type: :integer }] } },
  'u' => { nil => { regexp: '(\/\S*|-)', captures: [{ name: :user, type: :nillable_string }] } },
  'U' => { nil => { regexp: '(\/\S*)', captures: [{ name: :path, type: :string }] } },
  'r' => { nil => { regexp: '([A-Z]+) (\S+) HTTP\/(\d+(?:\.\d+)*)', captures: [{ name: :http_method, type: :string }, { name: :path, type: :path }, { name: :http_version, type: :string }] } },
  'i' => {
    'Referer' => { regexp: '(\S+)', captures: [{ name: :referer, type: :nillable_string }] },
    'User-agent' => { regexp: '(.*)', captures: [{ name: :user_agent, type: :user_agent }] }
  }
}
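The hash is keyed first by directive letter and then by the optional argument in braces, with nil standing for "no argument". As a rough illustration, the sketch below looks up entries in a trimmed copy of the table. SKETCH_TABLE and lookup_directive are hypothetical names, and Hash#dig is used here for brevity; the library's own lookup is written differently.

```ruby
# Trimmed copy of two LOG_DIRECTIVES entries; the constant name is hypothetical.
SKETCH_TABLE = {
  's' => { nil => { regexp: '(\d{3})', captures: [{ name: :http_status, type: :integer }] } },
  'D' => {
    nil     => { regexp: '(\d+|-)', captures: [{ name: :duration, type: :duration, unit: :musec }] },
    'milli' => { regexp: '(\d+|-)', captures: [{ name: :duration, type: :duration, unit: :msec }] }
  }
}.freeze

# Hypothetical helper: returns the directive definition, or nil when either the
# directive letter or its argument is unknown.
def lookup_directive(variable, arg = nil)
  SKETCH_TABLE.dig(variable, arg)
end

p lookup_directive('D')[:captures].first[:unit]           # unit for a bare directive
p lookup_directive('D', 'milli')[:captures].first[:unit]  # unit when an argument is given
p lookup_directive('x')                                   # unknown directive
```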
Constants included from CommonRegularExpressions
CommonRegularExpressions::TIMESTAMP_PARTS
Constants inherited from Base
Instance Attribute Summary
Attributes inherited from Base
#line_definitions, #report_trackers
Class Method Summary
- .access_line_definition(format_string) ⇒ Object
Creates the access log line definition based on the Apache log format string.
- .create(*args) ⇒ Object
Creates the Apache log format language based on an Apache log format string.
- .report_trackers(line_definition) ⇒ Object
Sets up the report trackers according to the fields captured by the access line definition.
Methods included from CommonRegularExpressions
anchored, hostname, hostname_or_ip_address, ip_address, timestamp
Methods inherited from Base
#captures?, format_definition, #initialize, line_definer, line_definition, #line_divider, #max_line_length, #parse_line, report, report_definer, #request, #request_class, #setup_environment, #valid_line_definitions?, #valid_request_class?, #well_formed?
Methods included from ClassLevelInheritableAttributes
#inheritable_attributes, #inherited
Constructor Details
This class inherits a constructor from RequestLogAnalyzer::FileFormat::Base
Class Method Details
.access_line_definition(format_string) ⇒ Object
Creates the access log line definition based on the Apache log format string.
# File 'lib/request_log_analyzer/file_format/apache.rb', line 66
def self.access_line_definition(format_string)
  format_string ||= :common
  format_string = LOG_FORMAT_DEFAULTS[format_string.to_sym] || format_string

  line_regexp = ''
  captures = []
  format_string.scan(/([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/) do |literal, arg, variable|
    line_regexp << Regexp.quote(literal) # Make sure to parse the literal before the directive

    if variable
      # Check if we recognize the log directive
      directive = LOG_DIRECTIVES[variable][arg] rescue nil

      if directive
        line_regexp << directive[:regexp] # Parse the value of the directive
        captures += directive[:captures] # Add the directive's information to the captures
      else
        puts "Apache log directive %#{arg}#{variable} is not yet supported by RLA, the field will be ignored."
        line_regexp << '.*' # Just accept any input for this literal
      end
    end
  end

  # Return a new line definition object
  RequestLogAnalyzer::LineDefinition.new(:access, regexp: Regexp.new(line_regexp),
                                                  captures: captures, header: true, footer: true)
end
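To make the scanning step easier to follow, here is a self-contained sketch that applies the same scan pattern to a short format string and matches a sample log line. It uses a three-entry stand-in table with simplified capture lists (plain symbols rather than hashes) and returns a plain array instead of a LineDefinition. MINI_DIRECTIVES, build_line_regexp and parse_line_with are hypothetical names, not part of the library.

```ruby
# Three-entry stand-in for LOG_DIRECTIVES with simplified captures (assumption).
MINI_DIRECTIVES = {
  'h' => { nil => { regexp: '(\S+)',   captures: [:remote_host] } },
  's' => { nil => { regexp: '(\d{3})', captures: [:http_status] } },
  'b' => { nil => { regexp: '(\d+|-)', captures: [:bytes_sent] } }
}.freeze

# Hypothetical helper: turns a format string into [Regexp, capture names].
def build_line_regexp(format_string)
  line_regexp = +''
  captures = []
  format_string.scan(/([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/) do |literal, arg, variable|
    line_regexp << Regexp.quote(literal) # literal text before the directive
    next unless variable                 # trailing literal with no directive

    directive = MINI_DIRECTIVES.dig(variable, arg)
    if directive
      line_regexp << directive[:regexp]
      captures += directive[:captures]
    else
      line_regexp << '.*'                # unsupported directive: accept anything
    end
  end
  [Regexp.new(line_regexp), captures]
end

# Hypothetical helper: parses one line into a hash of field name => raw string.
def parse_line_with(format_string, line)
  regexp, names = build_line_regexp(format_string)
  match = regexp.match(line)
  match && names.zip(match.captures).to_h
end

fields = parse_line_with('%h %>s %b', '127.0.0.1 200 2326')
puts fields[:remote_host], fields[:http_status], fields[:bytes_sent]
```

Note that the `>` in `%>s` is consumed by the scan pattern and otherwise ignored, so `%s` and `%>s` resolve to the same directive entry.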
.create(*args) ⇒ Object
Creates the Apache log format language based on an Apache log format string. It will set up the line definition and the report trackers according to the Apache access log format, which should be passed as the first argument. When no format is given, access_line_definition falls back to the ‘common’ log format.
# File 'lib/request_log_analyzer/file_format/apache.rb', line 59
def self.create(*args)
  access_line = access_line_definition(args.first)
  trackers = report_trackers(access_line) + report_definer.trackers
  new(line_definer.line_definitions.merge(access: access_line), trackers)
end
.report_trackers(line_definition) ⇒ Object
Sets up the report trackers according to the fields captured by the access line definition.
# File 'lib/request_log_analyzer/file_format/apache.rb', line 96
def self.report_trackers(line_definition)
  analyze = RequestLogAnalyzer::Aggregator::Summarizer::Definer.new

  analyze.timespan if line_definition.captures?(:timestamp)
  analyze.hourly_spread if line_definition.captures?(:timestamp)

  analyze.frequency category: :http_method, title: 'HTTP methods' if line_definition.captures?(:http_method)
  analyze.frequency category: :http_status, title: 'HTTP statuses' if line_definition.captures?(:http_status)
  analyze.frequency category: lambda { |r| r.category }, title: 'Most popular URIs' if line_definition.captures?(:path)

  analyze.frequency category: :user_agent, title: 'User agents' if line_definition.captures?(:user_agent)
  analyze.frequency category: :referer, title: 'Referers' if line_definition.captures?(:referer)

  analyze.duration duration: :duration, category: lambda { |r| r.category }, title: 'Request duration' if line_definition.captures?(:duration)
  analyze.traffic traffic: :bytes_sent, category: lambda { |r| r.category }, title: 'Traffic' if line_definition.captures?(:bytes_sent)

  analyze.trackers
end
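The effect of these guards is that the set of reports depends on which fields the format string captures. The sketch below models only that selection logic, using the report titles from the method above. SketchLine, REPORT_TITLES and report_titles_for are hypothetical stand-ins; the sketch does not use the library's Definer or tracker classes.

```ruby
# Hypothetical stand-in for a line definition that knows its captured fields.
SketchLine = Struct.new(:capture_names) do
  def captures?(name)
    capture_names.include?(name)
  end
end

# Field => report title, taken from the titles used in report_trackers.
REPORT_TITLES = {
  http_method: 'HTTP methods',
  http_status: 'HTTP statuses',
  path:        'Most popular URIs',
  user_agent:  'User agents',
  referer:     'Referers',
  duration:    'Request duration',
  bytes_sent:  'Traffic'
}.freeze

# Hypothetical helper: lists the titles of the reports that would be enabled.
def report_titles_for(line)
  REPORT_TITLES.select { |field, _title| line.captures?(field) }.values
end

line = SketchLine.new(%i[remote_host http_status bytes_sent])
puts report_titles_for(line)
```

With a format such as '%h %>s %b', only the status and traffic reports apply, since no method, path, duration, referer or user agent is captured.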