Class: VSS::Tokenizer
- Inherits:
- Object (Object → VSS::Tokenizer)
- Defined in:
- lib/vss/tokenizer.rb
Constant Summary
- STOP_WORDS =
%w[ a b c d e f g h i j k l m n o p q r s t u v w x y z an and are as at be by for from has he in is it its of on that the to was were will with upon without among ]
Class Method Summary
- .tokenize(string) ⇒ Object
Class Method Details
.tokenize(string) ⇒ Object
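# File 'lib/vss/tokenizer.rb', line 12
def self.tokenize(string)
  # Strip everything except letters, digits, hyphens, apostrophes, and whitespace
  stripped = string.to_s.gsub(/[^a-z0-9\-\s\']/i, "")
  # Drop empty strings, then lowercase and stem each word
  words = stripped.split(/\s+/).reject(&:blank?).map(&:downcase).map(&:stem)
  # Remove stop words and duplicates
  words.reject { |word| STOP_WORDS.include?(word) }.uniq
end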
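
The pipeline above can be tried without any dependencies using the rough sketch below. It is not the library's code: `blank?` and `stem` are not core Ruby methods (they presumably come from ActiveSupport and a stemming gem, which this page does not name), so the sketch substitutes `String#empty?` and skips stemming entirely. The module name `TokenizerSketch` is hypothetical.

```ruby
# Illustrative, dependency-free sketch of the VSS::Tokenizer.tokenize pipeline.
# Assumptions: String#empty? stands in for blank?, and stemming is omitted
# because String#stem is supplied by an external gem.
module TokenizerSketch
  STOP_WORDS = %w[
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    an and are as at be by for from has he in is it its of on
    that the to was were will with upon without among
  ].freeze

  def self.tokenize(string)
    # Keep only letters, digits, hyphens, apostrophes, and whitespace.
    stripped = string.to_s.gsub(/[^a-z0-9\-\s']/i, "")
    # Split on whitespace, drop empty strings, and lowercase.
    words = stripped.split(/\s+/).reject(&:empty?).map(&:downcase)
    # Remove stop words, then duplicates (first occurrence wins).
    words.reject { |word| STOP_WORDS.include?(word) }.uniq
  end
end

p TokenizerSketch.tokenize("The quick, brown fox is on the Fox's well-worn path!")
# => ["quick", "brown", "fox", "fox's", "well-worn", "path"]
```

In the documented method, each word is also stemmed before the stop-word filter runs, so the real output would contain stems rather than the surface forms shown here. Because of `to_s`, passing `nil` yields an empty array rather than raising.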