Class: LiterateRandomizer::SourceParser

Inherits:
Object
Defined in:
lib/literate_randomizer/source_parser.rb

Overview

Parses the source material and provides “each_sentence”, an easy way to walk the source material one sentence at a time.

Constructor Details

#initialize(options) ⇒ SourceParser

Options (if neither is given, the default source material included with the gem is used):

  • :source_material => string, OR

  • :source_material_file => filename



# File 'lib/literate_randomizer/source_parser.rb', line 13

def initialize(options)
  @init_options = options
end

Instance Method Details

#default_source_material ⇒ Object

Returns the absolute path of the default source material included with the gem. The file itself is read by #source_material.



# File 'lib/literate_randomizer/source_parser.rb', line 18

def default_source_material
  File.expand_path File.join(File.dirname(__FILE__),"..","..","data","the_lost_world_by_arthur_conan_doyle.txt")
end

#each_sentence ⇒ Object

Yields to a block each sentence as an array of words



# File 'lib/literate_randomizer/source_parser.rb', line 49

def each_sentence
  source_sentences.each do |sentence|
    yield scrub_sentence sentence
  end
end
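
As a rough illustration of how a caller might walk sentences, here is a minimal sketch. SketchParser is a hypothetical stand-in that copies the method bodies documented on this page so the example runs without the gem; it only handles the :source_material option. With this stand-in, fragments that contain no words come through as empty arrays, so the block skips them.

```ruby
# Hypothetical stand-in for LiterateRandomizer::SourceParser (not the gem itself).
class SketchParser
  def initialize(options)
    @init_options = options
  end

  # Yields each sentence fragment as an array of cleaned words.
  def each_sentence
    source_sentences.each { |sentence| yield scrub_sentence(sentence) }
  end

  private

  def source_sentences
    @init_options[:source_material].split(/([.?!"]($|\s)|\n\s*\n)+/)
  end

  def scrub_sentence(sentence)
    sentence.split(/([\s]|--)+/).collect { |a| scrub_word(a) }.select { |a| a.length > 0 }
  end

  def scrub_word(word)
    word &&= word[/[A-Za-z][A-Za-z'-]*/]
    word &&= word[/[A-Za-z'-]*[A-Za-z]/]
    (word && word.strip) || ""
  end
end

parser = SketchParser.new(source_material: "The plateau was vast. Could we climb it? We tried!")

sentences = []
parser.each_sentence do |words|
  next if words.empty? # skip fragments that held only punctuation or spaces
  sentences << words
end

sentences.each { |words| puts words.join(" ") }
```

Skipping empty arrays in the block is a choice made for this sketch; whether a caller needs it depends on how the yielded words are consumed.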

#scrub_sentence(sentence) ⇒ Object

Splits a sentence on whitespace and double hyphens, cleans each piece with #scrub_word, and returns an array of the non-empty words.



# File 'lib/literate_randomizer/source_parser.rb', line 44

def scrub_sentence(sentence)
  sentence.split(/([\s]|--)+/).collect {|a| scrub_word(a)}.select {|a| a.length>0}
end
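
To make the splitting behavior concrete, the sketch below copies the two documented method bodies into top-level methods (an arrangement assumed here for experimentation, not how the gem exposes them) and runs them on a sample sentence. Because the split pattern contains a capture group, String#split also returns the separators; they scrub to empty strings and are dropped by the final select.

```ruby
# Standalone copies of the documented method bodies, for experimentation only.
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]
  (word && word.strip) || ""
end

def scrub_sentence(sentence)
  sentence.split(/([\s]|--)+/).collect { |a| scrub_word(a) }.select { |a| a.length > 0 }
end

words = scrub_sentence("\"Well--it's a long, long way,\" said he")
puts words.inspect
# → ["Well", "it's", "a", "long", "long", "way", "said", "he"]
```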

#scrub_word(word) ⇒ Object

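Extracts the first run of letters, apostrophes, and hyphens that begins and ends with a letter, discarding everything around it. Returns an empty string if the word is nil or contains no letters.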



# File 'lib/literate_randomizer/source_parser.rb', line 37

def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]
  (word && word.strip) || ""
end
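
A few sample inputs show what the two regular expressions keep. This is a sketch that copies the documented method body into a top-level method so it can be run on its own; the sample words are made up for illustration.

```ruby
# Standalone copy of the documented method body, for experimentation only.
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]  # first run that starts with a letter
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]  # drop trailing apostrophes and hyphens
  (word && word.strip) || ""
end

samples = ["\"Hello,\"", "don't", "well-known.", "tis--", "1234", nil]
results = samples.map { |w| scrub_word(w) }
results.each { |r| puts r.inspect }
# → "Hello", "don't", "well-known", "tis", "", ""
```

Interior apostrophes and hyphens survive, while surrounding punctuation, digits, and trailing hyphens are removed.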

#source_material(options = init_options) ⇒ Object

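Returns the source text: the :source_material string if given, otherwise the contents of :source_material_file, falling back to the file named by #default_source_material.

Options:

:source_material => string
:source_material_file => filename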


# File 'lib/literate_randomizer/source_parser.rb', line 26

def source_material(options=init_options)
  options[:source_material] || File.read(options[:source_material_file] || default_source_material)
end
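
The lookup order in that one-liner can be sketched as follows. pick_source is a hypothetical helper that mirrors the documented expression, with the default path passed in explicitly; the temporary files stand in for a user-supplied file and the bundled default text.

```ruby
require "tempfile"

# Hypothetical helper mirroring the documented expression:
# an inline string wins, then an explicit file, then the default file.
def pick_source(options, default_path)
  options[:source_material] || File.read(options[:source_material_file] || default_path)
end

custom = Tempfile.new("custom")
custom.write("Text from a file.")
custom.close # flush to disk; the file stays until unlink

fallback = Tempfile.new("fallback")
fallback.write("Default text.")
fallback.close

inline       = pick_source({ source_material: "Inline text.", source_material_file: custom.path }, fallback.path)
from_file    = pick_source({ source_material_file: custom.path }, fallback.path)
from_default = pick_source({}, fallback.path)

puts inline, from_file, from_default

custom.unlink
fallback.unlink
```

When both options are supplied, the file is never opened, because `||` short-circuits on the inline string.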

#source_sentences ⇒ Object

Reads the source material and splits it into sentences.

NOTE: This re-reads the source material on every call. Usually this only needs to happen once, and keeping the text around afterwards would waste memory.



# File 'lib/literate_randomizer/source_parser.rb', line 32

def source_sentences
  source_material.split(/([.?!"]($|\s)|\n\s*\n)+/)
end
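
The sentence pattern breaks on a period, question mark, exclamation mark, or double quote followed by whitespace or end of line, and on blank lines. The sketch below applies the same pattern to a made-up sample string. Because the pattern contains capture groups, String#split also returns the matched separator text as extra elements, so the example filters out pieces that contain no letters; that filtering step is an assumption of this sketch and not part of the documented method.

```ruby
SENTENCE_BREAK = /([.?!"]($|\s)|\n\s*\n)+/

text = "It was late. Was anyone there?\n\nNobody answered"

pieces = text.split(SENTENCE_BREAK)

# Keep only the pieces that carry words; separator captures have no letters.
sentences = pieces.map(&:strip).select { |piece| piece.match?(/[A-Za-z]/) }

puts sentences.inspect
# → ["It was late", "Was anyone there", "Nobody answered"]
```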