Class: String
- Inherits: Object
  - Object
    - String
- Defined in: lib/chomchom/string.rb
Instance Method Summary
- #is_common? ⇒ Boolean
  Checks (case-insensitively) whether the string appears in a dictionary of common words built from Google's top 300 1-grams, with some entries removed by hand, such as english, god, american, united, states, john.
- #limit(length) ⇒ Object
  Constrains a string to a fixed length or less. After cutting at the length limit, it discards the last punctuation mark before the limit and everything after it. The regexp uses a lookahead to check for any later punctuation.
- #split_sentences ⇒ Object
  Splits text into sentences, taking into account that Mr.|Ms. endings are not the end of a sentence.
Instance Method Details
#is_common? ⇒ Boolean
Checks (case-insensitively) whether the string appears in a dictionary of common words built from Google's top 300 1-grams, with some entries removed by hand, such as english, god, american, united, states, john.
# File 'lib/chomchom/string.rb', line 40
def is_common?
  common = " the of and to in a is that was for as with be by it his which on i not he or are from at this have had but were an their they all one been we has you who so more will her him them would its no may other there when than into any only time if some can these my such out two our very up should she me made about upon what most said could also do must then those great same being after man much many now over before well between where like under us through own life men even your work did see good without people part t little day shall each found new every make long mr might three against place both because himself down never used while still too how old case given however use another world know de called right take here last general whole though water country number state large come say form year less few far order does came during small again just back among yet give hand left different having thought always fact end high go per taken often within p things course certain others off cannot means think find above therefore side since am ever known themselves once set thus seen following nothing until whom house four away second itself whose put possible either rather several best took went done d almost subject words become true head necessary young better get common whether half cases brought least nor early five later full thing already together "
  common.include?(" #{self.downcase} ")
end
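The method above pads both the word list and the receiver with spaces so that `include?` only matches whole entries. As a minimal sketch of the same idea, assuming a much shorter illustrative word list and a hypothetical standalone helper named `common_word?` (neither is part of chomchom), a `Set` lookup could look like this:

```ruby
require 'set'

# Hypothetical, abbreviated word list for illustration only;
# the documented method uses the full list shown above.
COMMON_WORDS = %w[the of and to in a is that was for].to_set.freeze

# Hypothetical standalone helper (an assumption, not the library's API).
# Lowercases the word and checks it against the set.
def common_word?(word)
  COMMON_WORDS.include?(word.downcase)
end

puts common_word?("The")   # prints true
puts common_word?("Ruby")  # prints false
```

One behavioral difference worth noting: the substring check in the original also returns true for multi-word inputs that happen to appear consecutively in the list (for example "the of"), whereas a set lookup matches single entries only.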
#limit(length) ⇒ Object
Constrains a string to a fixed length or less. After cutting at the length limit, it discards the last punctuation mark before the limit and everything after it. The regexp uses a lookahead to check for any later punctuation.
# File 'lib/chomchom/string.rb', line 34
def limit(length)
  (self.length > length)? self[0...length].gsub(/(?![\s\S]+?[,:;)\/\\\|])([,:;)\/\\\|].*)/,'') : self
end
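To make the truncation behavior concrete, here is a small sketch that copies the regexp into a hypothetical standalone function named `limit_text` (an assumption made so the example runs without reopening `String`). The punctuation set is the one in the character class above: `, : ; ) / \ |`.

```ruby
# Same pattern as in the method above: the negative lookahead succeeds only
# where no further punctuation follows, so the match starts at the LAST
# punctuation mark in the (already cut) string.
LAST_PUNCTUATION_TO_END = /(?![\s\S]+?[,:;)\/\\\|])([,:;)\/\\\|].*)/

# Hypothetical standalone equivalent of String#limit.
def limit_text(text, length)
  return text unless text.length > length

  text[0...length].gsub(LAST_PUNCTUATION_TO_END, '')
end

puts limit_text("Hello, world: this is a test", 20)  # prints Hello, world
puts limit_text("abcdefghij", 5)                     # prints abcde
puts limit_text("short", 20)                         # prints short
```

In the first call the hard cut yields "Hello, world: this i", and the trailing fragment starting at the colon is dropped. When the cut text contains none of the listed punctuation marks, as in the second call, the result is simply the hard cut. Periods are not in the character class, so they never act as a cut point.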
#split_sentences ⇒ Object
Splits text into sentences, taking into account that Mr.|Ms.|Dr.|Mrs. endings are not the end of a sentence.
# File 'lib/chomchom/string.rb', line 5
def split_sentences
  #break text first by paragraph then into chunks delimited by a period
  #but these are not quite sentences yet
  chunks = (self.split(/\n+/).map { |p| "#{p}\n".split(/[!?]+|(?:\.+(?:[^\p{Word}]))/) }).flatten.compact
  #if a sentence is split at Mr.|Ms.|Dr.|Mrs.
  #then recombine it with its remaining part and nil it to delete later
  tmp=''
  sentences = chunks.map { |c|
    ss = (tmp != '')? "#{tmp}. #{c}" : c
    if c.match(/(?:Dr|Mr|Ms|Mrs)$/)
      #what about John F. Kennedy ([A-Z])
      #I finish at 5 a.m. today.
      #At 5 p.m. I have to go to the bank
      #rule 1: every sentence starts with a Cap (what about iPhone?)
      #just check if a sentence is too short then combine with the previous or next?
      tmp = ss
      ss=nil
    else
      tmp = ''
    end
    ss
  }
  sentences.compact #delete nil elements
end
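The two-pass approach above (split on terminators, then rejoin chunks that ended in a title) can be tried out with the sketch below. It restates the same logic as a hypothetical standalone function named `split_sentences_sketch`, with renamed local variables; it is an illustration under those assumptions, not the library's code.

```ruby
# Hypothetical standalone restatement of the logic above.
def split_sentences_sketch(text)
  # Pass 1: split each paragraph on !/? runs, or on periods followed by a
  # non-word character. A newline is appended so a final period still splits.
  chunks = text.split(/\n+/).flat_map do |paragraph|
    "#{paragraph}\n".split(/[!?]+|(?:\.+(?:[^\p{Word}]))/)
  end

  # Pass 2: a chunk ending in Dr/Mr/Ms/Mrs is held back and glued onto the
  # next chunk instead of being emitted as its own sentence.
  pending = ''
  sentences = chunks.map do |chunk|
    sentence = pending.empty? ? chunk : "#{pending}. #{chunk}"
    if chunk.match?(/(?:Dr|Mr|Ms|Mrs)$/)
      pending = sentence
      nil
    else
      pending = ''
      sentence
    end
  end
  sentences.compact
end

p split_sentences_sketch("Dr. Jones arrived. Mr. Smith left early.")
# prints ["Dr. Jones arrived", "Mr. Smith left early"]
```

Note that the terminating punctuation is consumed by the split and does not appear in the results. Because the `[!?]+` branch consumes only the punctuation itself, a chunk that follows `!` or `?` keeps its leading space, so callers may want to `strip` each sentence.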