Class: Ferret::Analysis::AsciiStandardAnalyzer
- Inherits: Object
- Defined in: ext/r_analysis.c
Overview
Summary
The AsciiStandardAnalyzer is the most advanced of the available ASCII analyzers. If it were implemented in Ruby, it would look like this:
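```ruby
class AsciiStandardAnalyzer
  def initialize(stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
    @lower = lower
    @stop_words = stop_words
  end

  # Builds the filter chain for +str+; +field+ is unused here.
  def token_stream(field, str)
    ts = AsciiStandardTokenizer.new(str)        # split the text into ASCII tokens
    ts = AsciiLowerCaseFilter.new(ts) if @lower # optionally downcase each token
    ts = StopFilter.new(ts, @stop_words)        # drop stop-words
    HyphenFilter.new(ts)                        # handle hyphenated words; this is returned
  end
end
```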
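The order of the chain matters: tokenizing happens first, lowercasing before stop-word removal (so "The" matches "the"), and hyphen handling last. The following is a minimal pure-Ruby sketch of that pipeline, useful for experimenting without Ferret installed. It is not Ferret's API: `SketchAnalyzer` and `SAMPLE_STOP_WORDS` are hypothetical names, the tokenizer regex is an assumption, and the hyphen step only approximates HyphenFilter by emitting the joined word followed by its parts (so "e-mail" yields "email", "e" and "mail").

```ruby
# Hypothetical pure-Ruby sketch of the analyzer pipeline; NOT Ferret's API.
class SketchAnalyzer
  # A tiny illustrative stop list (assumption); Ferret's
  # FULL_ENGLISH_STOP_WORDS is much longer.
  SAMPLE_STOP_WORDS = %w[a an and the of to is].freeze

  def initialize(stop_words = SAMPLE_STOP_WORDS, lower = true)
    @stop_words = stop_words
    @lower = lower
  end

  # Returns an array of token strings; +_field+ is ignored, as in the
  # Ruby version above.
  def token_stream(_field, str)
    # 1. Tokenize: runs of ASCII letters/digits, allowing internal hyphens.
    tokens = str.scan(/[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*/)
    # 2. Optionally lowercase, so stop words match regardless of case.
    tokens = tokens.map(&:downcase) if @lower
    # 3. Remove stop words.
    tokens = tokens.reject { |t| @stop_words.include?(t) }
    # 4. Approximate hyphen handling: joined word first, then its parts.
    tokens.flat_map do |t|
      parts = t.split("-")
      parts.size > 1 ? [parts.join] + parts : [t]
    end
  end
end

default_tokens = SketchAnalyzer.new.token_stream(:content, "The E-Mail of the Day")
custom_tokens  = SketchAnalyzer.new(%w[day]).token_stream(:content, "The E-Mail of the Day")

p default_tokens # => ["email", "e", "mail", "day"]
p custom_tokens  # => ["the", "email", "e", "mail", "of", "the"]
```

Passing a custom stop list replaces the default entirely, which is why "the" and "of" survive in the second call.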
As you can see, it uses the AsciiStandardTokenizer, and you can also supply your own list of stop-words if you wish. Note that this tokenizer won’t recognize non-ASCII characters, so you should use the StandardAnalyzer if you want to analyze multi-byte data such as UTF-8.
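To see why an ASCII-only tokenizer is a poor fit for UTF-8 text, compare an ASCII-only character class with a Unicode-aware one in plain Ruby. Both regexes are assumptions chosen for illustration; they are not the patterns Ferret's tokenizers actually use.

```ruby
# Hypothetical comparison of ASCII-only vs Unicode-aware tokenizing.
ASCII_WORD   = /[A-Za-z0-9]+/  # treats any non-ASCII byte as a separator
UNICODE_WORD = /[[:alnum:]]+/  # POSIX class is Unicode-aware on UTF-8 strings

text = "Café naïve résumé"

ascii_tokens   = text.scan(ASCII_WORD)
unicode_tokens = text.scan(UNICODE_WORD)

p ascii_tokens   # => ["Caf", "na", "ve", "r", "sum"]
p unicode_tokens # => ["Café", "naïve", "résumé"]
```

The accented letters split words apart in the ASCII case, so searches for "café" or "résumé" would never match the indexed fragments.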