Module: ClassifierReborn::TokenFilter::Stopword
- Defined in:
- lib/classifier-reborn/extensions/token_filter/stopword.rb
Overview
This filter removes stopwords for the configured language from the given tokens.
Constant Summary
- STOPWORDS_PATH =
[File.expand_path(File.dirname(__FILE__) + '/../../../../data/stopwords')]
- STOPWORDS =
Create a lazily-loaded hash of stopword data
Hash.new do |hash, language|
  hash[language] = []

  STOPWORDS_PATH.each do |path|
    if File.exist?(File.join(path, language))
      hash[language] = Set.new File.read(File.join(path, language.to_s)).force_encoding('utf-8').split
      break
    end
  end

  hash[language]
end
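The lazy loading above relies on the block form of Hash.new, which runs only the first time a missing key is looked up. The following is a minimal, self-contained sketch of that pattern, not the library's code: the temporary directory, the file contents, and the names demo_dir, demo_paths, and lazy_stopwords are all assumptions made for illustration.

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the data directory: one file per language.
demo_dir = Dir.mktmpdir
File.write(File.join(demo_dir, 'en'), "the and of\n")

# Hypothetical stand-in for STOPWORDS_PATH.
demo_paths = [demo_dir]

# Same shape as STOPWORDS: the block runs once per missing key and
# stores its result, so each language file is read at most once.
lazy_stopwords = Hash.new do |hash, language|
  hash[language] = []
  demo_paths.each do |path|
    file = File.join(path, language.to_s) # to_s here is this sketch's choice
    next unless File.exist?(file)

    hash[language] = Set.new(File.read(file).force_encoding('utf-8').split)
    break
  end
  hash[language]
end

lazy_stopwords.key?('en')          # => false, nothing has been read yet
lazy_stopwords['en'].include?('the') # => true, the file is read on first access
lazy_stopwords.key?('en')          # => true, the Set is now cached
lazy_stopwords['xx']               # => [], no file exists for this language

FileUtils.remove_entry(demo_dir)   # cached entries survive the cleanup
```

Because results are cached per key, a language that has already been looked up is not reloaded when the search paths change later.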
Class Method Summary
- .add_custom_stopword_path(path) ⇒ Object
Adds a custom directory to search for user-created stopword files.
- .call(tokens) ⇒ Object
- .language=(language) ⇒ Object
Changes the language of stopwords.
Class Method Details
.add_custom_stopword_path(path) ⇒ Object
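Adds a custom directory to search for user-created stopword files. The path is placed at the front of STOPWORDS_PATH, and a stopword file is expected to be named after its language.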
# File 'lib/classifier-reborn/extensions/token_filter/stopword.rb', line 24
def add_custom_stopword_path(path)
  STOPWORDS_PATH.unshift(path)
end
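Since the method uses unshift, a custom directory is searched before the defaults. The sketch below shows only that ordering effect, using a plain array in place of STOPWORDS_PATH; both directory names are hypothetical.

```ruby
# Hypothetical stand-in for STOPWORDS_PATH with a made-up default entry.
search_paths = ['/gem/data/stopwords']

# Mirrors the body of add_custom_stopword_path: prepend, do not append.
add_custom_path = ->(path) { search_paths.unshift(path) }

add_custom_path.call('/home/me/stopwords')

search_paths # => ["/home/me/stopwords", "/gem/data/stopwords"]
```

With this ordering, a file named after a language in the custom directory would be found before a file of the same name in the default directory.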
.call(tokens) ⇒ Object
# File 'lib/classifier-reborn/extensions/token_filter/stopword.rb', line 16
def call(tokens)
  tokens.reject do |token|
    token.maybe_stopword? &&
      (token.length <= 2 || STOPWORDS[@language].include?(token))
  end
end
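The rejection rule can be tried in isolation. In the sketch below, DemoToken is a hypothetical String subclass that only provides maybe_stopword?, and demo_words is a hand-written set standing in for STOPWORDS[@language]; neither is part of the library.

```ruby
require 'set'

# Hypothetical token type: a String that carries a maybe_stopword flag.
class DemoToken < String
  def initialize(text, maybe_stopword: true)
    super(text)
    @maybe_stopword = maybe_stopword
  end

  def maybe_stopword?
    @maybe_stopword
  end
end

# Hand-written stand-in for STOPWORDS[@language].
demo_words = Set.new(%w[the and])

tokens = [
  DemoToken.new('the'),                       # listed stopword
  DemoToken.new('ox'),                        # two characters or fewer
  DemoToken.new('ox', maybe_stopword: false), # exempt from filtering
  DemoToken.new('classifier')                 # ordinary word
]

# Same predicate as .call: a token is dropped only if it is flagged as a
# possible stopword AND is either very short or present in the list.
kept = tokens.reject do |token|
  token.maybe_stopword? && (token.length <= 2 || demo_words.include?(token))
end

kept.map(&:to_s) # => ["ox", "classifier"]
```

Note that the length check applies regardless of language, so any flagged token of one or two characters is removed even when it is not in the stopword file.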
.language=(language) ⇒ Object
Changes the language of stopwords
# File 'lib/classifier-reborn/extensions/token_filter/stopword.rb', line 43
def language=(language)
  @language = language
end
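The setter stores the language on the module itself, so it acts as shared state for every later call. A small sketch of that behavior follows, assuming a hypothetical DemoStopword module with a hard-coded word table in place of the stopword files.

```ruby
require 'set'

# Hypothetical module mimicking the setter-plus-filter arrangement.
module DemoStopword
  # Made-up word lists; the library reads these from files instead.
  WORDS = {
    'en' => Set.new(%w[the and]),
    'fr' => Set.new(%w[le et])
  }.freeze

  module_function

  # Stored in a module-level instance variable, shared by all callers.
  def language=(language)
    @language = language
  end

  def call(tokens)
    tokens.reject { |token| WORDS[@language].include?(token) }
  end
end

DemoStopword.language = 'en'
english = DemoStopword.call(%w[the le chat]) # => ["le", "chat"]

DemoStopword.language = 'fr'
french = DemoStopword.call(%w[the le chat])  # => ["the", "chat"]
```

Because the language is module-level state, changing it affects all code that uses the filter afterwards, not just the caller that set it.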