Module: ClassifierReborn::Hasher
- Defined in:
- lib/classifier-reborn/extensions/hasher.rb
Class Method Summary
- .word_hash(str, enable_stemmer = true, tokenizer: Tokenizer::Whitespace, token_filters: [TokenFilter::Stopword]) ⇒ Object
Returns a Hash mapping each word (interned as a Symbol) to its integer count.
Class Method Details
.word_hash(str, enable_stemmer = true, tokenizer: Tokenizer::Whitespace, token_filters: [TokenFilter::Stopword]) ⇒ Object
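Returns a Hash mapping each word (interned as a Symbol) to its integer count. The string is split by the tokenizer, the resulting words are passed through each token filter in order (stopword removal by default, plus stemming when enable_stemmer is true), and each remaining word is interned and used as a key whose value is its frequency in the document.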
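As a rough illustration of the return shape only, the sketch below counts whitespace-separated words into a Hash keyed by Symbol using core Ruby. `simple_word_hash` is a hypothetical stand-in, not part of ClassifierReborn, and it deliberately skips stopword filtering and stemming.

```ruby
# Hypothetical stand-in for illustration; not part of ClassifierReborn.
# Counts whitespace-separated words, keyed by Symbol.
def simple_word_hash(str)
  counts = Hash.new(0) # missing keys start at 0, so += 1 works on first sight
  str.split.each do |word|
    counts[word.intern] += 1
  end
  counts
end

result = simple_word_hash('the cat saw the other cat')
```

Here `result` has Symbol keys such as `:cat` and Integer values giving how often each word occurred.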
    # File 'lib/classifier-reborn/extensions/hasher.rb', line 19

    def word_hash(str, enable_stemmer = true, tokenizer: Tokenizer::Whitespace, token_filters: [TokenFilter::Stopword])
      if token_filters.include?(TokenFilter::Stemmer)
        unless enable_stemmer
          token_filters.reject! do |token_filter|
            token_filter == TokenFilter::Stemmer
          end
        end
      else
        token_filters << TokenFilter::Stemmer if enable_stemmer
      end
      words = tokenizer.call(str)
      token_filters.each do |token_filter|
        words = token_filter.call(words)
      end
      d = Hash.new(0)
      words.each do |word|
        d[word.intern] += 1
      end
      d
    end
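The listing treats the tokenizer and every token filter as an object that responds to `call`, and it adds or removes the stemmer from `token_filters` in place. The sketch below reproduces that control flow with lambdas standing in for the real modules. All names in it (`whitespace`, `stopword`, `stemmer`, `sketch_word_hash`) are hypothetical, and the "stemmer" is only a crude plural stripper, not a real stemming algorithm.

```ruby
# Hypothetical stand-ins: the listing uses Tokenizer::Whitespace,
# TokenFilter::Stopword and TokenFilter::Stemmer, which are likewise
# invoked through `call`.
whitespace = ->(str) { str.split }
stopword   = ->(words) { words.reject { |w| %w[the a of].include?(w) } }
stemmer    = ->(words) { words.map { |w| w.sub(/s\z/, '') } } # crude, illustration only

# Equivalent control flow to the listing, with the stand-ins substituted.
sketch_word_hash = lambda do |str, enable_stemmer = true, tokenizer: whitespace, token_filters: [stopword]|
  if token_filters.include?(stemmer)
    token_filters.reject! { |f| f == stemmer } unless enable_stemmer
  elsif enable_stemmer
    token_filters << stemmer # mutates the array the caller passed in
  end
  words = tokenizer.call(str)
  token_filters.each { |f| words = f.call(words) }
  counts = Hash.new(0)
  words.each { |w| counts[w.intern] += 1 }
  counts
end

filters   = [stopword]
stemmed   = sketch_word_hash.call('the cats of the cat', token_filters: filters)
unstemmed = sketch_word_hash.call('the cats of the cat', false, token_filters: [stopword])
```

With stemming enabled, both "cats" and "cat" are counted under `:cat`. With it disabled, they stay separate keys. Because the filter list is modified in place, `filters` holds two entries after the first call, so a caller who wants to reuse an array unchanged may prefer to pass a copy (for example `filters.dup`).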