Class: String
Defined in:
- lib/classifier/lsi/summary.rb
- lib/classifier/extensions/word_hash.rb
Overview
These are extensions to the String class to provide convenience methods for the Classifier package.
Instance Method Summary
- #clean_word_hash ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #paragraph_summary(count = 1, separator = ' [...] ') ⇒ Object
- #split_paragraphs ⇒ Object
- #split_sentences ⇒ Object
- #summary(count = 10, separator = ' [...] ') ⇒ Object
- #without_punctuation ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash ⇒ Object
  Return a Hash of strings => ints.
Instance Method Details
#clean_word_hash ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words.

    # File 'lib/classifier/extensions/word_hash.rb', line 27
    def clean_word_hash
      word_hash_for_words gsub(/[^\w\s]/, '').split
    end
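The tokenizing step can be tried on its own. The sketch below applies only the gsub and split from the method body. word_hash_for_words is not shown on this page, so its stemming and filtering are deliberately left out, and the sample text is made up for illustration.

```ruby
# Only the tokenizing step of clean_word_hash; stemming is not reproduced here.
text = "Don't panic: snake_case stays, punctuation goes!"

# /[^\w\s]/ matches any character that is neither a word character nor
# whitespace. \w includes the underscore, so snake_case survives intact,
# while the apostrophe in "Don't" is removed without leaving a gap.
tokens = text.gsub(/[^\w\s]/, '').split

p tokens
```

Note that punctuation is deleted rather than replaced by a space, so a contraction collapses into a single token.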
#paragraph_summary(count = 1, separator = ' [...] ') ⇒ Object

    # File 'lib/classifier/lsi/summary.rb', line 10
    def paragraph_summary(count = 1, separator = ' [...] ')
      perform_lsi split_paragraphs, count, separator
    end
#split_paragraphs ⇒ Object

    # File 'lib/classifier/lsi/summary.rb', line 18
    def split_paragraphs
      split(/(\n\n|\r\r|\r\n\r\n)/) # TODO: make this less primitive
    end
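One consequence of the pattern is worth seeing directly: because the alternation sits inside a capture group, String#split returns the separators as elements alongside the paragraphs. The sketch below uses a made-up sample string. The non-capturing variant is shown only for comparison and is not what the library does; whether perform_lsi later filters the separators is not visible on this page.

```ruby
text = "First paragraph.\n\nSecond paragraph.\r\n\r\nThird."

# Same pattern as split_paragraphs: the capture group keeps the separators.
with_separators = text.split(/(\n\n|\r\r|\r\n\r\n)/)
p with_separators
# => ["First paragraph.", "\n\n", "Second paragraph.", "\r\n\r\n", "Third."]

# Hypothetical alternative without the capture group: paragraphs only.
paragraphs_only = text.split(/\n\n|\r\r|\r\n\r\n/)
p paragraphs_only
# => ["First paragraph.", "Second paragraph.", "Third."]
```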
#split_sentences ⇒ Object

    # File 'lib/classifier/lsi/summary.rb', line 14
    def split_sentences
      split(/(\.|!|\?)/) # TODO: make this less primitive
    end
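With this pattern the terminators come back as separate elements, and each sentence body keeps its leading space. The sketch below shows that raw result, followed by one possible way to re-attach each terminator to its sentence. The regrouping step is an illustration only, not part of the library, and the sample text is made up.

```ruby
text = "It works. Does it? Yes!"

# Same pattern as split_sentences: bodies and terminators alternate.
pieces = text.split(/(\.|!|\?)/)
p pieces
# => ["It works", ".", " Does it", "?", " Yes", "!"]

# Hypothetical regrouping: pair each body with the terminator that follows it.
# A trailing body without a terminator is paired with nil, which interpolates
# as an empty string.
sentences = pieces.each_slice(2).map { |body, mark| "#{body}#{mark}".strip }
p sentences
# => ["It works.", "Does it?", "Yes!"]
```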
#summary(count = 10, separator = ' [...] ') ⇒ Object

    # File 'lib/classifier/lsi/summary.rb', line 6
    def summary(count = 10, separator = ' [...] ')
      perform_lsi split_sentences, count, separator
    end
#without_punctuation ⇒ Object
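Removes common punctuation symbols, returning a new string. Each listed symbol is replaced by a single space, while apostrophes and hyphens are deleted outright, so runs of spaces remain in the result. E.g.,

    "Hello (greeting's), with {braces} < >...?".without_punctuation
    => "Hello  greetings   with  braces         "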

    # File 'lib/classifier/extensions/word_hash.rb', line 14
    def without_punctuation
      tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', ' ').tr("'\-", '')
    end
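The two tr calls behave differently, which is easy to miss. The sketch below wraps them in a standalone helper so the effect can be checked without loading the library. The name strip_punctuation is hypothetical; the character sets are copied from the method body above.

```ruby
# Hypothetical standalone helper mirroring the two tr calls in
# without_punctuation, so it can run without the Classifier package.
def strip_punctuation(text)
  # First pass: every listed symbol becomes exactly one space.
  spaced = text.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', ' ')
  # Second pass: an empty replacement makes tr delete the matched
  # characters, so apostrophes and hyphens vanish without leaving a gap.
  spaced.tr("'\-", '')
end

input  = "Hello (greeting's), with {braces} < >...?"
result = strip_punctuation(input)

p result.split                  # => ["Hello", "greetings", "with", "braces"]
p input.length - result.length  # => 1 (only the apostrophe was deleted)
```

Because symbols are swapped for spaces rather than removed, the result keeps almost the full length of the input and may need a later split or squeeze to collapse the gaps.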
#word_hash ⇒ Object
Return a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.

    # File 'lib/classifier/extensions/word_hash.rb', line 20
    def word_hash
      word_hash = clean_word_hash
      symbol_hash = word_hash_for_symbols(gsub(/\w/, ' ').split)
      word_hash.merge(symbol_hash)
    end
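The method builds two token streams from the same string, one of words and one of punctuation runs, and merges their counts. The sketch below reproduces that flow under a loud assumption: word_hash_for_words and word_hash_for_symbols are not shown on this page, so a hypothetical count_tokens helper stands in for both and does no stemming or filtering.

```ruby
# Hypothetical stand-in for word_hash_for_words / word_hash_for_symbols:
# counts tokens under interned (symbol) keys, with no stemming or filtering.
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token.to_sym] += 1 }
end

text = "Wait... what?! Wait."

# Word stream: drop everything that is not a word character or whitespace.
words = text.gsub(/[^\w\s]/, '').split   # ["Wait", "what", "Wait"]

# Symbol stream: blank out word characters so only punctuation runs remain.
symbols = text.gsub(/\w/, ' ').split     # ["...", "?!", "."]

merged = count_tokens(words).merge(count_tokens(symbols))
p merged
```

The two streams cannot share a key here, since one contains only word characters and the other none, so the merge never overwrites a count.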