Class: String
Defined in:
- lib/classifier/lsi/summary.rb
- lib/classifier/extensions/word_hash.rb
Overview
These are extensions to the String class to provide convenience methods for the Classifier package.
Instance Method Summary
- #clean_word_hash ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #paragraph_summary(count = 1, separator = " [...] ") ⇒ Object
- #split_paragraphs ⇒ Object
- #split_sentences ⇒ Object
- #summary(count = 10, separator = " [...] ") ⇒ Object
- #without_punctuation ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash ⇒ Object
  Return a Hash of strings => ints.
Instance Method Details
#clean_word_hash ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words
    # File 'lib/classifier/extensions/word_hash.rb', line 24
    def clean_word_hash
      word_hash_for_words gsub(/[^\w\s]/,"").split
    end
#paragraph_summary(count = 1, separator = " [...] ") ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 10
    def paragraph_summary( count=1, separator=" [...] " )
      perform_lsi split_paragraphs, count, separator
    end
#split_paragraphs ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 18
    def split_paragraphs
      split /(\n\n|\r\r|\r\n\r\n)/ # TODO: make this less primitive
    end
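Because the pattern in split_paragraphs is wrapped in a capture group, Ruby's String#split returns the matched separators along with the paragraphs. The following is a minimal sketch that runs the same regular expression on a made-up sample string, without loading the Classifier package. The constant name, the sample text, and the cleanup step are assumptions added for illustration only.

```ruby
# Same pattern as in split_paragraphs above, applied to a made-up sample string.
PARAGRAPH_BREAK = /(\n\n|\r\r|\r\n\r\n)/

sample = "First line\nstill first.\n\nSecond.\r\n\r\nThird."

# A single newline is not a paragraph break; blank-line separators are,
# and the capture group makes split return them as elements too.
pieces = sample.split(PARAGRAPH_BREAK)
p pieces
# => ["First line\nstill first.", "\n\n", "Second.", "\r\n\r\n", "Third."]

# Hypothetical cleanup (not part of the library): keep every other element
# to drop the separators.
paragraphs = pieces.each_slice(2).map(&:first)
p paragraphs
# => ["First line\nstill first.", "Second.", "Third."]
```

Whether the separators matter to the private perform_lsi helper is not visible from this page, so treat the cleanup step as optional.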
#split_sentences ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 14
    def split_sentences
      split /(\.|\!|\?)/ # TODO: make this less primitive
    end
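The TODO in split_sentences is easy to see in practice: every ".", "!" or "?" counts as a sentence end, so a decimal number is cut in two. The sketch below applies the same pattern to a made-up string outside the Classifier package. The constant name, the sample text, and the filtering step are illustrative assumptions.

```ruby
# Same pattern as in split_sentences above, applied to a made-up sample string.
SENTENCE_BREAK = /(\.|\!|\?)/

sample = "Pi is 3.14. Really?"

# The capture group keeps each terminator as its own element,
# and the period inside "3.14" is treated as a sentence end.
pieces = sample.split(SENTENCE_BREAK)
p pieces
# => ["Pi is 3", ".", "14", ".", " Really", "?"]

# Hypothetical filtering (not part of the library): drop the bare
# terminators and trim whitespace from what is left.
fragments = pieces.reject { |piece| piece.match?(/\A[.!?]\z/) }.map(&:strip)
p fragments
# => ["Pi is 3", "14", "Really"]
```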
#summary(count = 10, separator = " [...] ") ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 6
    def summary( count=10, separator=" [...] " )
      perform_lsi split_sentences, count, separator
    end
#without_punctuation ⇒ Object
Removes common punctuation symbols, returning a new string. E.g.,
    "Hello (greeting's), with {braces} < >...?".without_punctuation
    => "Hello  greetings   with  braces         "
    # File 'lib/classifier/extensions/word_hash.rb', line 13
    def without_punctuation
      tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
    end
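The two tr calls behave differently: the first replaces each listed symbol with a space, while the second, given an empty replacement string, deletes apostrophes and hyphens outright. Here is a minimal standalone sketch of that behaviour. The helper name strip_punctuation is hypothetical and only mirrors the method body shown above, so the example runs without the Classifier package.

```ruby
# Hypothetical standalone helper; mirrors the body of
# String#without_punctuation shown above.
def strip_punctuation(text)
  text.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ") # each symbol becomes one space
      .tr("'\-", "")                              # apostrophes and hyphens are deleted
end

cleaned = strip_punctuation("Hello (greeting's), with {braces} < >...?")
p cleaned
# => "Hello  greetings   with  braces         "

# Splitting on whitespace afterwards discards the runs of spaces.
p cleaned.split
# => ["Hello", "greetings", "with", "braces"]

# Deleting instead of replacing joins hyphenated words and contractions.
p strip_punctuation("well-known isn't")
# => "wellknown isnt"
```

Note that the result keeps one space per replaced symbol, so callers that need tidy output would typically split or squeeze the whitespace afterwards.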
#word_hash ⇒ Object
Return a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.
    # File 'lib/classifier/extensions/word_hash.rb', line 19
    def word_hash
      word_hash_for_words(gsub(/[^\w\s]/,"").split + gsub(/[\w]/," ").split)
    end
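The argument passed to word_hash_for_words is the concatenation of two token lists: the words left after stripping punctuation, and the punctuation left after blanking out word characters. As shown earlier, clean_word_hash passes only the first list. The sketch below reproduces just that tokenization on a made-up string. The private word_hash_for_words helper is not shown on this page, so the counting loop is a hypothetical stand-in that interns and counts tokens without any stemming or filtering.

```ruby
sample = "Hello, world! Hello?"

# First expression from word_hash: drop everything that is neither a word
# character nor whitespace, then split on whitespace.
words = sample.gsub(/[^\w\s]/, "").split
p words
# => ["Hello", "world", "Hello"]

# Second expression: blank out word characters, leaving only the symbols.
symbols = sample.gsub(/[\w]/, " ").split
p symbols
# => [",", "!", "?"]

# Hypothetical stand-in for the private word_hash_for_words helper:
# intern each token and count it. No stemming or filtering is applied here.
counts = Hash.new(0)
(words + symbols).each { |token| counts[token.to_sym] += 1 }
p counts
```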