Class: String
- Defined in:
  - lib/classifier/lsi/summary.rb
  - lib/classifier/extensions/word_hash.rb
Overview
These are extensions to the String class to provide convenience methods for the Classifier package.
Constant Summary
- ABBREVIATIONS = %w[Mr Mrs Ms Dr Prof Jr Sr Inc Ltd Corp Co vs etc al eg ie].freeze
Instance Method Summary
- #clean_word_hash ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #paragraph_summary(count = 1, separator = ' [...] ') ⇒ Object
- #split_paragraphs ⇒ Object
- #split_sentences ⇒ Object
- #summary(count = 10, separator = ' [...] ') ⇒ Object
- #without_punctuation ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash ⇒ Object
  Return a Hash of strings => ints.
Instance Method Details
#clean_word_hash ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words
    # File 'lib/classifier/extensions/word_hash.rb', line 32
    def clean_word_hash
      word_hash_for_words gsub(/[^\w\s]/, '').split
    end
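The tokenization step of #clean_word_hash can be tried in isolation. The sketch below reproduces only the gsub/split part of the listing; `strip_and_split` is a hypothetical name used for illustration, and the stemming and counting performed by word_hash_for_words are not shown.

```ruby
# Hypothetical helper mirroring the gsub/split step of clean_word_hash.
# It does not stem or count words; in the gem that is left to
# word_hash_for_words, whose implementation is not shown on this page.
def strip_and_split(text)
  # Drop every character that is neither a word character nor whitespace,
  # then split on runs of whitespace.
  text.gsub(/[^\w\s]/, '').split
end

p strip_and_split("Hello, world! It's 5 o'clock.")
# => ["Hello", "world", "Its", "5", "oclock"]
```

Because `\w` also matches digits and underscores, tokens such as "5" survive this step.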
#paragraph_summary(count = 1, separator = ' [...] ') ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 12
    def paragraph_summary(count = 1, separator = ' [...] ')
      perform_lsi split_paragraphs, count, separator
    end
#split_paragraphs ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 22
    def split_paragraphs
      split(/\r?\n\r?\n+/)
    end
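To see what the paragraph pattern matches, the following sketch applies the same regular expression outside the String extension. `PARAGRAPH_BREAK` and `paragraphs_of` are hypothetical names introduced here for illustration.

```ruby
# Same pattern as in split_paragraphs: a blank line, with optional
# carriage returns so that Windows line endings are handled too.
PARAGRAPH_BREAK = /\r?\n\r?\n+/

# Hypothetical standalone equivalent of String#split_paragraphs.
def paragraphs_of(text)
  text.split(PARAGRAPH_BREAK)
end

p paragraphs_of("First.\n\nSecond.\r\n\r\nThird.\n\n\nFourth.")
# => ["First.", "Second.", "Third.", "Fourth."]

# A single newline is not a paragraph break.
p paragraphs_of("One line.\nSame paragraph.")
# => ["One line.\nSame paragraph."]
```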
#split_sentences ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 16
    def split_sentences
      return pragmatic_segment if defined?(PragmaticSegmenter)

      split_sentences_regex
    end
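The body of split_sentences_regex is not shown on this page. Purely as an illustration of why a list like ABBREVIATIONS is useful in a fallback splitter, here is a minimal sketch; `naive_split_sentences` is a hypothetical helper, and the way it uses the abbreviation list is an assumption, not the gem's implementation.

```ruby
# Copied from the Constant Summary.
ABBREVIATIONS = %w[Mr Mrs Ms Dr Prof Jr Sr Inc Ltd Corp Co vs etc al eg ie].freeze

# Hypothetical sketch, NOT the gem's split_sentences_regex.
# Ends a sentence at a token finishing in ".", "!" or "?", unless the
# token is a known abbreviation such as "Dr.".
def naive_split_sentences(text)
  sentences = []
  current = +''
  text.split(/\s+/).each do |token|
    current << ' ' unless current.empty?
    current << token
    next unless token.match?(/[.!?]\z/)
    next if ABBREVIATIONS.include?(token.chomp('.'))

    sentences << current
    current = +''
  end
  sentences << current unless current.empty?
  sentences
end

p naive_split_sentences('Dr. Smith arrived. He was late! Why?')
# => ["Dr. Smith arrived.", "He was late!", "Why?"]
```

A splitter like this still breaks on cases the list does not cover (initials, decimals, quoted speech), which is presumably why a dedicated segmenter is preferred when it is available.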
#summary(count = 10, separator = ' [...] ') ⇒ Object
    # File 'lib/classifier/lsi/summary.rb', line 8
    def summary(count = 10, separator = ' [...] ')
      perform_lsi split_sentences, count, separator
    end
#without_punctuation ⇒ Object
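Removes common punctuation symbols, returning a new string. Most symbols are replaced by a space, while apostrophes and hyphens are deleted outright; runs of spaces are not collapsed. E.g.,
    "Hello (greeting's), with {braces} < >...?".without_punctuation
    => "Hello  greetings   with  braces         "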
    # File 'lib/classifier/extensions/word_hash.rb', line 17
    def without_punctuation
      tr(',?.!;:"@#$%^&*()_=+[]{}|<>/`~', ' ').tr("'-", '')
    end
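The two tr calls behave differently: the first maps each symbol to a space, and the second, given an empty replacement, deletes its characters. The sketch below reproduces that chain outside the String extension; `TO_SPACE`, `TO_DELETE` and `strip_punctuation` are hypothetical names, and the final squeeze is an optional extra step that is not part of the method.

```ruby
# Character sets copied from the without_punctuation listing.
TO_SPACE  = ',?.!;:"@#$%^&*()_=+[]{}|<>/`~'
TO_DELETE = "'-"

# Hypothetical standalone equivalent of String#without_punctuation.
def strip_punctuation(text)
  # First pass: each listed symbol becomes one space.
  # Second pass: tr with an empty replacement deletes the characters.
  text.tr(TO_SPACE, ' ').tr(TO_DELETE, '')
end

raw = strip_punctuation("well-known: it's here")
p raw              # => "wellknown  its here"
p raw.squeeze(' ') # => "wellknown its here"
```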
#word_hash ⇒ Object
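Return a Hash of strings => ints. Each word in the string is stemmed, interned, and mapped to its frequency in the document. The result merges the word counts from #clean_word_hash with counts for the non-word symbols found in the string.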
    # File 'lib/classifier/extensions/word_hash.rb', line 24
    def word_hash
      word_hash = clean_word_hash
      symbol_hash = word_hash_for_symbols(gsub(/\w/, ' ').split)
      word_hash.merge(symbol_hash)
    end
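The second pass in #word_hash collects what is left once the word characters are blanked out. The sketch below isolates that step; `symbol_tokens` is a hypothetical name, and the plain tally (Ruby 2.7 or later) is only an assumption about how such tokens might be counted, since word_hash_for_symbols is not shown on this page.

```ruby
# Hypothetical helper: the symbol-extraction step from word_hash, isolated.
def symbol_tokens(text)
  # Blank out every word character, then split what remains on whitespace.
  text.gsub(/\w/, ' ').split
end

p symbol_tokens('Hello, world! (ok) ...')
# => [",", "!", "(", ")", "..."]

# Assumed counting step; the gem's word_hash_for_symbols may filter or
# count these tokens differently.
counts = symbol_tokens('a + b + c!').tally
p counts['+'] # => 2
```

Adjacent symbols with no whitespace or word character between them, such as "...", stay together as a single token.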