Class: LogStash::Filters::Grok
- Inherits: Base
  - Object
  - Base
  - LogStash::Filters::Grok
- Defined in: lib/logstash/filters/grok.rb
Overview
Parse arbitrary text and structure it.
Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable.
This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format that is generally written for humans and not computer consumption.
Logstash ships with about 120 patterns by default. You can find them here: <github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns>. You can add your own trivially. (See the `patterns_dir` setting.)
If you need help building patterns to match your logs, you will find the <grokdebug.herokuapp.com> and <grokconstructor.appspot.com/> applications quite useful!
Grok Basics
Grok works by combining text patterns into something that matches your logs.
The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`.
The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER` pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match.
The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the duration of an event, so you could call it simply `duration`. Further, a string `55.3.244.1` might identify the `client` making a request.
For the above example, your grok filter would look something like this:
[source,ruby]
%{NUMBER:duration} %{IP:client}
Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic's data type, for example change a string to an integer, then suffix it with the target data type. For example, `%{NUMBER:num:int}` converts the `num` semantic from a string to an integer. Currently the only supported conversions are `int` and `float`.
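As a minimal sketch, a filter applying such conversions might look like this (field names here are illustrative, matching the example log below):

```ruby
filter {
  grok {
    # bytes is stored as an integer and duration as a float,
    # instead of the default string type
    match => { "message" => "%{NUMBER:bytes:int} %{NUMBER:duration:float}" }
  }
}
```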
Examples:
With that idea of a syntax and semantic, we can pull out useful fields from a sample log like this fictional http request log:
[source,ruby]
55.3.244.1 GET /index.html 15824 0.043
The pattern for this could be:
[source,ruby]
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
For a more realistic example, let's read these logs from a file:
[source,ruby]
input {
  file { path => "/var/log/http.log" }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
After the grok filter, the event will have a few extra fields in it:
- `client: 55.3.244.1`
- `method: GET`
- `request: /index.html`
- `bytes: 15824`
- `duration: 0.043`
Regular Expressions
Grok sits on top of regular expressions, so any regular expressions are valid in grok as well. The regular expression library is Oniguruma, and you can see the full supported regexp syntax on the Oniguruma site: <github.com/kkos/oniguruma/blob/master/doc/RE>.
Custom Patterns
Sometimes logstash doesn’t have a pattern you need. For this, you have a few options.
First, you can use the Oniguruma syntax for named capture which will let you match a piece of text and save it as a field:
[source,ruby]
(?<field_name>the pattern here)
For example, postfix logs have a `queue id` that is a 10- or 11-character hexadecimal value. I can capture that easily like this:
[source,ruby]
(?<queue_id>[0-9A-F]{10,11})
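As a quick illustration outside Logstash, the same named-capture syntax works in plain Ruby, whose regexp engine (Onigmo) descends from Oniguruma (the sample string here is made up):

```ruby
# Named capture: match 10-11 uppercase-hex characters and expose them
# under the name :queue_id on the MatchData object.
line = "postfix/cleanup: BEF25A72965: queued"
m = line.match(/(?<queue_id>[0-9A-F]{10,11})/)
puts m[:queue_id]  # => "BEF25A72965"
```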
Alternately, you can create a custom patterns file.
- Create a directory called `patterns` with a file in it called `extra` (the file name doesn't matter, but name it meaningfully for yourself).
- In that file, write the pattern you need as the pattern name, a space, then the regexp for that pattern.
For example, doing the postfix queue id example as above:
[source,ruby]
# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9A-F]{10,11}
Then use the `patterns_dir` setting in this plugin to tell Logstash where your custom patterns directory is. Here's a full example with a sample log:
[source,ruby]
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<[email protected]>
[source,ruby]
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
The above will match and result in the following fields:
- `timestamp: Jan 1 06:25:43`
- `logsource: mailserver14`
- `program: postfix/cleanup`
- `pid: 21403`
- `queue_id: BEF25A72965`
- `syslog_message: message-id=<[email protected]>`
The `timestamp`, `logsource`, `program`, and `pid` fields come from the `SYSLOGBASE` pattern, which itself is defined by other patterns.
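To illustrate that layering, the shipped core patterns define `SYSLOGBASE` roughly as follows (a sketch; the exact definitions may vary between versions of logstash-patterns-core):

```
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
```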
Another option is to define patterns inline in the filter using `pattern_definitions`. This is mostly for convenience and allows the user to define a pattern that is used just in that filter. Patterns newly defined in `pattern_definitions` will not be available outside of that particular `grok` filter.
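For example, the postfix queue id pattern from above could be defined inline rather than in a patterns file (a sketch using the `pattern_definitions` option):

```ruby
filter {
  grok {
    # POSTFIX_QUEUEID exists only inside this grok filter instance
    pattern_definitions => { "POSTFIX_QUEUEID" => "[0-9A-F]{10,11}" }
    match => { "message" => "%{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
```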
Instance Attribute Summary collapse
-
#timeout_enforcer ⇒ Object
readonly
Returns the value of attribute timeout_enforcer.
Instance Method Summary collapse
- #filter(event) ⇒ Object
-
#initialize(params) ⇒ Grok
constructor
A new instance of Grok.
- #register ⇒ Object
Constructor Details
#initialize(params) ⇒ Grok
Returns a new instance of Grok.
# File 'lib/logstash/filters/grok.rb', line 249

def initialize(params)
  super(params)
  # a cache of capture name handler methods.
  @handlers = {}
  @timeout_enforcer = TimeoutEnforcer.new(@logger, @timeout_millis * 1000000)
  @timeout_enforcer.start! unless @timeout_millis == 0
end
Instance Attribute Details
#timeout_enforcer ⇒ Object (readonly)
Returns the value of attribute timeout_enforcer.
# File 'lib/logstash/filters/grok.rb', line 239

def timeout_enforcer
  @timeout_enforcer
end
Instance Method Details
#filter(event) ⇒ Object
# File 'lib/logstash/filters/grok.rb', line 293

def filter(event)
  matched = false
  done = false

  @logger.debug? and @logger.debug("Running grok filter", :event => event);

  @patterns.each do |field, groks|
    success = match(groks, field, event)
    if success
      matched = true
      break if @break_on_match
    end
    #break if done
  end # @patterns.each

  if matched
    metric.increment(:matches)
    filter_matched(event)
  else
    metric.increment(:failures)
    @tag_on_failure.each {|tag| event.tag(tag)}
  end

  @logger.debug? and @logger.debug("Event now: ", :event => event)
rescue ::LogStash::Filters::Grok::TimeoutException => e
  @logger.warn(e.message)
  metric.increment(:timeouts)
  event.tag(@tag_on_timeout)
end
#register ⇒ Object
259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 |
# File 'lib/logstash/filters/grok.rb', line 259 def register require "grok-pure" # rubygem 'jls-grok' @patternfiles = [] # Have @@patterns_path show first. Last-in pattern definitions win; this # will let folks redefine built-in patterns at runtime. @patternfiles += patterns_files_from_paths(@@patterns_path.to_a, "*") @patternfiles += patterns_files_from_paths(@patterns_dir, @patterns_files_glob) @patterns = Hash.new { |h,k| h[k] = [] } @logger.debug("Match data", :match => @match) @metric_match_fields = metric.namespace(:patterns_per_field) @match.each do |field, patterns| patterns = [patterns] if patterns.is_a?(String) @metric_match_fields.gauge(field, patterns.length) @logger.trace("Grok compile", :field => field, :patterns => patterns) patterns.each do |pattern| @logger.debug? and @logger.debug("regexp: #{@type}/#{field}", :pattern => pattern) grok = Grok.new grok.logger = @logger unless @logger.nil? add_patterns_from_files(@patternfiles, grok) add_patterns_from_inline_definition(@pattern_definitions, grok) grok.compile(pattern, @named_captures_only) @patterns[field] << grok end end # @match.each end |