Class: String

Inherits:
Object
Defined in:
lib/classifier/lsi/summary.rb,
lib/classifier/extensions/word_hash.rb

Overview

These are extensions to the String class to provide convenience methods for the Classifier package.

Instance Method Details

#clean_word_hash ⇒ Object

Returns a word hash without extra punctuation or short symbols, containing only stemmed words.



# File 'lib/classifier/extensions/word_hash.rb', line 27

def clean_word_hash
  word_hash_for_words gsub(/[^\w\s]/, '').split
end
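
To make the preprocessing step concrete, here is a minimal sketch of the tokenization that happens before the words are handed to word_hash_for_words. The helper name clean_tokens is hypothetical and not part of the Classifier package, and the stemming and counting done by word_hash_for_words is not reproduced here.

```ruby
# Hypothetical helper (not part of Classifier) that mirrors only the
# gsub/split preprocessing shown in clean_word_hash above.
def clean_tokens(text)
  # Remove every character that is neither a word character nor whitespace,
  # then split the remainder on runs of whitespace.
  text.gsub(/[^\w\s]/, '').split
end

p clean_tokens("Hello, world! It's a test.")
# prints ["Hello", "world", "Its", "a", "test"]
```

Note that the apostrophe is deleted rather than replaced with a space, so "It's" becomes the single token "Its".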

#paragraph_summary(count = 1, separator = ' [...] ') ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 10

def paragraph_summary(count = 1, separator = ' [...] ')
  perform_lsi split_paragraphs, count, separator
end

#split_paragraphs ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 18

def split_paragraphs
  split(/(\n\n|\r\r|\r\n\r\n)/) # TODO: make this less primitive
end
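
One detail of the pattern above is worth illustrating: because the alternation sits inside a capture group, Ruby's String#split includes the matched separators in the returned array. The sketch below compares the two forms. Both helper names are hypothetical, and whether perform_lsi later filters the separator entries is not shown in this page.

```ruby
# Hypothetical helpers (not part of Classifier) comparing split with and
# without a capture group around the paragraph separators.
def paragraphs_with_separators(text)
  # Same pattern as split_paragraphs: the capture group keeps separators.
  text.split(/(\n\n|\r\r|\r\n\r\n)/)
end

def paragraphs_only(text)
  # Without the capture group, separators are discarded.
  text.split(/\n\n|\r\r|\r\n\r\n/)
end

text = "First paragraph.\n\nSecond paragraph."
p paragraphs_with_separators(text)
# prints ["First paragraph.", "\n\n", "Second paragraph."]
p paragraphs_only(text)
# prints ["First paragraph.", "Second paragraph."]
```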

#split_sentences ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 14

def split_sentences
  split(/(\.|!|\?)/) # TODO: make this less primitive
end
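
The TODO above hints at the limits of splitting on every period, exclamation mark, or question mark. As a rough illustration, the sketch below applies the same pattern to text containing an abbreviation and a decimal number. The helper name naive_sentences is hypothetical and not part of the Classifier package.

```ruby
# Hypothetical helper (not part of Classifier) applying the same pattern
# as split_sentences to show where naive splitting breaks down.
def naive_sentences(text)
  # The capture group means the punctuation marks appear as their own entries.
  text.split(/(\.|!|\?)/)
end

p naive_sentences("Dr. Smith paid 3.50 today.")
# prints ["Dr", ".", " Smith paid 3", ".", "50 today", "."]
```

A single sentence is cut into three fragments here, which is the kind of case a less primitive splitter would need to handle.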

#summary(count = 10, separator = ' [...] ') ⇒ Object



# File 'lib/classifier/lsi/summary.rb', line 6

def summary(count = 10, separator = ' [...] ')
  perform_lsi split_sentences, count, separator
end

#without_punctuation ⇒ Object

Removes common punctuation symbols, returning a new string. E.g.,

"Hello (greeting's), with {braces} < >...?".without_punctuation
=> "Hello  greetings   with  braces         "


# File 'lib/classifier/extensions/word_hash.rb', line 14

def without_punctuation
  tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', ' ').tr("'\-", '')
end

#word_hash ⇒ Object

Returns a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.



# File 'lib/classifier/extensions/word_hash.rb', line 20

def word_hash
  word_hash = clean_word_hash
  symbol_hash = word_hash_for_symbols(gsub(/\w/, ' ').split)
  word_hash.merge(symbol_hash)
end
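
The way word counts and symbol counts are combined can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper names simple_word_hash and count_tokens are hypothetical, stemming and interning are omitted, keys are plain strings, and any filtering that word_hash_for_words or word_hash_for_symbols may apply is not reproduced.

```ruby
# Hypothetical helper: count how often each token occurs.
def count_tokens(tokens)
  counts = Hash.new(0)
  tokens.each { |token| counts[token] += 1 }
  counts
end

# Hypothetical, simplified stand-in for word_hash (no stemming, no interning).
def simple_word_hash(text)
  # Words: strip punctuation, split on whitespace, normalize case.
  words = text.gsub(/[^\w\s]/, '').split.map(&:downcase)
  # Symbols: blank out every word character, leaving only punctuation runs.
  symbols = text.gsub(/\w/, ' ').split
  # Merge the two frequency tables into one hash.
  count_tokens(words).merge(count_tokens(symbols))
end

p simple_word_hash("Yes, yes, no!")
# prints {"yes"=>2, "no"=>1, ","=>2, "!"=>1}
```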