Class: MiniSearch::Pipeline

Inherits:
Object
Defined in:
lib/mini_search/pipeline.rb

Overview

Applies all the transformations and normalizations needed when indexing a document or running a search: tokenization followed by an ordered list of filters.

Instance Method Summary

Constructor Details

#initialize(tokenizer, filters) ⇒ Pipeline

Returns a new instance of Pipeline.



# File 'lib/mini_search/pipeline.rb', line 7

def initialize(tokenizer, filters)
  @standard_tokenizer = MiniSearch::StandardWhitespaceTokenizer.new
  @tokenizer = tokenizer
  @filters = filters
end

Instance Method Details

#execute(string) ⇒ Object

Tokenizes the string with the standard whitespace tokenizer, applies each filter in order, and returns the filtered tokens. If the configured tokenizer is not a StandardWhitespaceTokenizer, the filtered tokens are instead rejoined with single spaces and passed through that tokenizer, and its output is returned.



# File 'lib/mini_search/pipeline.rb', line 13

def execute(string)
  # Since the filter model expects tokens that are tokenized by
  # the standard tokenizer, let's use that first.
  tokens = @standard_tokenizer.execute(string)

  # Apply filters
  filters_applied = @filters.reduce(tokens) do |filtered_tokens, filter|
    filter.execute(filtered_tokens)
  end

  # Return if our selected tokenizer is the standard tokenizer
  return filters_applied if @tokenizer.is_a? MiniSearch::StandardWhitespaceTokenizer

  # Execute non-standard tokenization after rejoining the tokens
  # that were tokenized with the StandardWhitespaceTokenizer
  @tokenizer.execute(filters_applied.join(' '))
end
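
Example

The two return paths of #execute can be illustrated with a minimal, self-contained sketch. Every class below (WhitespaceTokenizer, BigramTokenizer, DowncaseFilter, StopWordFilter, SketchPipeline) is a hypothetical stand-in written for this example, not part of MiniSearch. The only thing assumed about the real collaborators is what the source above shows: tokenizers and filters each respond to execute.

```ruby
# Hypothetical stand-in for the standard tokenizer: splits on whitespace.
class WhitespaceTokenizer
  def execute(string)
    string.split
  end
end

# Hypothetical non-standard tokenizer: emits character bigrams per word.
class BigramTokenizer
  def execute(string)
    string.split.flat_map do |word|
      word.length < 2 ? [word] : word.chars.each_cons(2).map(&:join)
    end
  end
end

# Hypothetical filter: lowercases every token.
class DowncaseFilter
  def execute(tokens)
    tokens.map(&:downcase)
  end
end

# Hypothetical filter: drops tokens found in the given stop-word list.
class StopWordFilter
  def initialize(stop_words)
    @stop_words = stop_words
  end

  def execute(tokens)
    tokens.reject { |token| @stop_words.include?(token) }
  end
end

# Mirrors the control flow of Pipeline#execute, using the stand-ins above.
class SketchPipeline
  def initialize(tokenizer, filters)
    @standard_tokenizer = WhitespaceTokenizer.new
    @tokenizer = tokenizer
    @filters = filters
  end

  def execute(string)
    # Filters always see whitespace-separated tokens.
    tokens = @standard_tokenizer.execute(string)
    filtered = @filters.reduce(tokens) { |acc, filter| filter.execute(acc) }

    # Standard tokenizer selected: the filtered tokens are the result.
    return filtered if @tokenizer.is_a?(WhitespaceTokenizer)

    # Otherwise rejoin and re-tokenize with the selected tokenizer.
    @tokenizer.execute(filtered.join(' '))
  end
end

filters = [DowncaseFilter.new, StopWordFilter.new(%w[the a])]

standard = SketchPipeline.new(WhitespaceTokenizer.new, filters)
p standard.execute('The Quick fox') # prints ["quick", "fox"]

bigrams = SketchPipeline.new(BigramTokenizer.new, filters)
p bigrams.execute('The Quick fox')  # prints ["qu", "ui", "ic", "ck", "fo", "ox"]
```

Filter order matters in this sketch: the stop-word filter only removes "The" because the downcase filter ran first. Note also that in the non-standard path the filters operate on whole words, and the selected tokenizer only sees the text that survived filtering.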