Class: String

Inherits:
Object
Defined in:
lib/chomchom/string.rb

Instance Method Details

#is_common? ⇒ Boolean

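Returns true if the string, compared case-insensitively, appears in a dictionary of common words. The dictionary is built from Google's top 300 1-grams, with some entries removed by hand, such as english, god, american, united, states, and john.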

Returns:

  • (Boolean)


# File 'lib/chomchom/string.rb', line 40

def is_common?
  common = " the of and to in a is that was for as with be by it his which on i not he or are from at this have had but were an their they all one been we has you who so more will her him them would its no may other there when than into any only time if some can these my such out two our very up should she me made about upon what most said could also do must then those great same being after man much many now over before well between where like under us through own life men even your work did see good without people part t little day shall each found new every make long mr might three against place both because himself down never used while still too how old case given however use another world know de called right take here last general whole though water country number state large come say form year less few far order does came during small again just back among yet give hand left different having thought always fact end high go per taken often within p things course certain others off cannot means think find above therefore side since am ever known themselves once set thus seen following nothing until whom house four away second itself whose put possible either rather several best took went done d almost subject words become true head necessary young better get common whether half cases brought least nor early five later full thing already together "
  common.include?(" #{self.downcase} ")
end
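
The lookup above is a substring test against a space-delimited string, so a multi-word string such as "the of" also passes, because those two words sit next to each other in the list. A minimal sketch of an exact-match alternative follows. It assumes a hypothetical helper `common_word?` and a hypothetical `COMMON_SAMPLE` constant holding only a few of the entries; neither is part of chomchom.

```ruby
require 'set'

# Hypothetical sample: a handful of entries from the dictionary above,
# not the full list.
COMMON_SAMPLE = Set.new(%w[the of and to in a is that was for]).freeze

# Hypothetical helper (not part of chomchom): exact-match, case-insensitive lookup.
def common_word?(word, dictionary = COMMON_SAMPLE)
  dictionary.include?(word.downcase)
end

common_word?('The')     # => true
common_word?('the of')  # => false, only whole entries match
common_word?('ruby')    # => false
```

A Set keeps the lookup independent of word order and avoids accidental matches across word boundaries.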

#limit(length) ⇒ Object

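Constrains a string to a fixed length or less. A string longer than the limit is first cut to the limit; then the last punctuation mark (one of , : ; ) / \ |) in the remaining text, and everything after it, is discarded. The regexp uses a negative lookahead to make sure no further punctuation follows the match, so only the last mark is selected.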



# File 'lib/chomchom/string.rb', line 34

def limit(length)
  (self.length > length)? self[0...length].gsub(/(?![\s\S]+?[,:;)\/\\\|])([,:;)\/\\\|].*)/,'') : self
end
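
The same idea can be sketched without a lookahead by searching backwards for the last punctuation mark. This is an illustration only, for single-line input; `limit_at_punctuation` and `PUNCTUATION` are hypothetical names, not part of chomchom.

```ruby
# Hypothetical constant: the punctuation marks used by the regexp above.
PUNCTUATION = /[,:;)\/\\|]/

# Hypothetical standalone helper (not part of chomchom).
def limit_at_punctuation(text, max)
  return text if text.length <= max

  head = text[0...max]            # hard cut at the length limit
  cut = head.rindex(PUNCTUATION)  # index of the last punctuation mark, or nil
  cut ? head[0...cut] : head      # drop the mark and everything after it
end

limit_at_punctuation('Hello, world; this is a long sentence', 20)  # => "Hello, world"
limit_at_punctuation('short', 20)                                  # => "short"
limit_at_punctuation('abcdefghij', 5)                              # => "abcde"
```

When the truncated text contains no punctuation at all, the result is just the hard cut at the limit.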

#split_sentences ⇒ Object

Splits text into sentences, taking into account that titles such as Mr. and Ms. do not mark the end of a sentence.



# File 'lib/chomchom/string.rb', line 5

def split_sentences
  #break text first by paragraph then into chunks delimited by a period
  #but these are not quite sentences yet
  chunks = (self.split(/\n+/).map { |p| "#{p}\n".split(/[!?]+|(?:\.+(?:[^\p{Word}]))/) }).flatten.compact

  #if a sentence is split at Mr.|Ms.|Dr.|Mrs. 
  #then recombine it with its remaining part and nil it to delete later
  tmp=''
  sentences = chunks.map { |c|
    ss = (tmp != '')? "#{tmp}. #{c}" : c
    if c.match(/(?:Dr|Mr|Ms|Mrs)$/) 
      #what about John F. Kennedy ([A-Z])
      #I finish at 5 a.m. today.
      #At 5 p.m. I have to go to the bank 
      #rule 1: every sentence starts with a Cap (what about iPhone?)
      #just check if a sentence is too short then combine with the previous or next?
      tmp = ss
      ss=nil
    else
      tmp = ''
    end
    ss
  } 
  sentences.compact #delete nil elements
end
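
A standalone sketch of the two-pass approach above: split on terminators first, then rejoin chunks that end in a title. It is not the library's code. `split_sentences_sketch` and `TITLE_AT_END` are hypothetical names, and the sketch deviates from the method above in two labeled ways: it strips whitespace and drops blank chunks, and it anchors the title pattern at a word boundary so that a chunk ending in a word like "ATMs" is not treated as a title.

```ruby
# Hypothetical constant: a title at the very end of a chunk, on a word boundary.
TITLE_AT_END = /\b(?:Dr|Mr|Ms|Mrs)\z/

# Hypothetical standalone helper (not part of chomchom).
def split_sentences_sketch(text)
  # Pass 1: split each paragraph on !, ? or on periods followed by a non-word character.
  chunks = text.split(/\n+/).flat_map do |paragraph|
    "#{paragraph}\n".split(/[!?]+|\.+[^\p{Word}]/)
  end
  # Deviation from the original: trim whitespace and drop blank chunks.
  chunks = chunks.map(&:strip).reject(&:empty?)

  # Pass 2: a chunk ending in a title is held back and joined to the next chunk.
  pending = nil
  chunks.each_with_object([]) do |chunk, sentences|
    piece = pending ? "#{pending}. #{chunk}" : chunk
    if piece.match?(TITLE_AT_END)
      pending = piece
    else
      sentences << piece
      pending = nil
    end
  end
end

split_sentences_sketch("Dr. Lee met Mr. Kim. They talked!\nIt went well.")
# => ["Dr. Lee met Mr. Kim", "They talked", "It went well"]
```

As in the method above, the sentence terminators themselves are not kept in the output.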