Class: String
Overview
These are extensions to the String class to provide convenience methods for the Classifier package.
Instance Method Summary
- #clean_word_hash ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #without_punctuation ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash ⇒ Object
  Return a Hash of strings => ints.
Instance Method Details
#clean_word_hash ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words.

    # File 'lib/simple_classifier/extensions/word_hash.rb', line 24
    def clean_word_hash
      word_hash_for_words gsub(/[^\w\s]/,"").split
    end
#without_punctuation ⇒ Object
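Removes common punctuation symbols, returning a new string. Most symbols are replaced with a space, while apostrophes and hyphens are deleted outright, so runs of spaces can remain. E.g.,

    "Hello (greeting's), with {braces} < >...?".without_punctuation
    => "Hello  greetings   with  braces         "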

    # File 'lib/simple_classifier/extensions/word_hash.rb', line 13
    def without_punctuation
      tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
    end
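
To try the behaviour without loading the gem, the two `tr` calls can be copied into a plain function. This is only a sketch: `strip_punctuation` is a hypothetical name used here for illustration, not part of the package.

```ruby
# Hypothetical standalone copy of the without_punctuation body.
# First tr: replace each listed symbol with a space.
# Second tr: an empty replacement string deletes apostrophes and hyphens.
def strip_punctuation(text)
  text.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', ' ').tr("'-", '')
end

p strip_punctuation("Hello (greeting's), with {braces} < >...?")
# prints "Hello  greetings   with  braces         "

# Hyphenated words are joined rather than split.
p strip_punctuation('well-known e-mail')
# prints "wellknown email"
```

Because each symbol becomes its own space, callers that need single spaces may want to follow up with `squeeze(' ')` or `split`.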
#word_hash ⇒ Object
Return a Hash of strings => ints. Each word in the string is stemmed, interned, and mapped to its frequency in the document.

    # File 'lib/simple_classifier/extensions/word_hash.rb', line 19
    def word_hash
      word_hash_for_words(gsub(/[^\w\s]/,"").split + gsub(/[\w]/," ").split)
    end
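
The difference between `word_hash` and `clean_word_hash` lies in the token list each one builds: `word_hash` appends the punctuation runs to the words, while `clean_word_hash` passes the words only. The sketch below reproduces the two `gsub` expressions from the source. The helper `word_hash_for_words` is not shown on this page, so `count_tokens` is a hypothetical, simplified stand-in that only downcases, interns, and counts; it does no stemming or filtering. The names `clean_tokens` and `all_tokens` are likewise illustrative.

```ruby
# Words only, as in clean_word_hash: drop everything that is not a
# word character or whitespace, then split on whitespace.
def clean_tokens(text)
  text.gsub(/[^\w\s]/, '').split
end

# Words plus punctuation runs, as in word_hash: the second gsub blanks
# out word characters so that only the punctuation survives the split.
def all_tokens(text)
  clean_tokens(text) + text.gsub(/[\w]/, ' ').split
end

# Hypothetical stand-in for word_hash_for_words (assumption: the real
# helper also stems words; this one does not).
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |t, counts| counts[t.downcase.to_sym] += 1 }
end

text = 'Spam, spam! Eggs... spam?'
p clean_tokens(text)               # prints ["Spam", "spam", "Eggs", "spam"]
p all_tokens(text)                 # prints ["Spam", "spam", "Eggs", "spam", ",", "!", "...", "?"]
p count_tokens(clean_tokens(text)) # prints {spam: 3, eggs: 1} (Ruby 3.4 inspect style)
```

Under these assumptions, a run such as `...` is counted as a single token, which is why `word_hash` can treat punctuation as a feature while `clean_word_hash` ignores it.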