Class: LiterateRandomizer::SourceParser
- Inherits: Object
  - Object
  - LiterateRandomizer::SourceParser
- Defined in: lib/literate_randomizer/source_parser.rb
Overview
Parse the source material and provide “each_sentence” - an easy way to walk the source material.
Instance Method Summary
- #default_source_material ⇒ Object
  Returns the path to the default source material included with the gem.
- #each_sentence ⇒ Object
  Yields each sentence to a block as an array of words.
- #initialize(options) ⇒ SourceParser
  Constructor. Options: :source_material (string) or :source_material_file (filename).
- #scrub_sentence(sentence) ⇒ Object
  Cleans up every word in a string and returns an array of the cleaned words.
- #scrub_word(word) ⇒ Object
  Extracts the first run of letters from a word, dropping surrounding non-alphabetic characters.
- #source_material(options = init_options) ⇒ Object
  Returns the source text. Options: :source_material (string) or :source_material_file (filename).
- #source_sentences ⇒ Object
  Reads the source material and splits it into sentences. NOTE: this re-reads the source material each time.
Constructor Details
#initialize(options) ⇒ SourceParser
Options:
- :source_material => string, OR
- :source_material_file => filename
# File 'lib/literate_randomizer/source_parser.rb', line 13
def initialize(options)
  @init_options = options
end
Instance Method Details
#default_source_material ⇒ Object
Returns the path to the default source material included with the gem.
# File 'lib/literate_randomizer/source_parser.rb', line 18
def default_source_material
  File.expand_path File.join(File.dirname(__FILE__),"..","..","data","the_lost_world_by_arthur_conan_doyle.txt")
end
#each_sentence ⇒ Object
Yields to a block each sentence as an array of words
# File 'lib/literate_randomizer/source_parser.rb', line 49
def each_sentence
  source_sentences.each do |sentence|
    yield scrub_sentence sentence
  end
end
#scrub_sentence(sentence) ⇒ Object
Cleans up every word in a string and returns an array of the cleaned words.
# File 'lib/literate_randomizer/source_parser.rb', line 44
def scrub_sentence(sentence)
  sentence.split(/([\s]|--)+/).collect {|a| scrub_word(a)}.select {|a| a.length>0}
end
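The split pattern contains a capture group, so String#split also returns the separators it matched; that is presumably why the result is filtered for empty strings after scrubbing. A minimal standalone sketch, with the two methods copied from this page so it runs without the gem:

```ruby
# Standalone copies of the two methods documented on this page.
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]
  (word && word.strip) || ""
end

def scrub_sentence(sentence)
  sentence.split(/([\s]|--)+/).collect { |a| scrub_word(a) }.select { |a| a.length > 0 }
end

# The raw split keeps the captured separators alongside the words.
p "one two--three".split(/([\s]|--)+/)  # => ["one", " ", "two", "--", "three"]

# Separators scrub down to "" and are dropped by the select.
p scrub_sentence("one two--three")      # => ["one", "two", "three"]
```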
#scrub_word(word) ⇒ Object
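Extracts the first run of letters from a word, dropping any surrounding non-alphabetic characters. Apostrophes and hyphens inside the word are kept; the result is "" if the word contains no letters.
# File 'lib/literate_randomizer/source_parser.rb', line 37
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]
  (word && word.strip) || ""
end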
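A few edge cases make the two-pass regex easier to follow. The sketch below copies scrub_word as a top-level method so it runs without the gem; the sample words are made up for illustration.

```ruby
# Standalone copy of scrub_word as documented above.
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]   # first pass: start at the first letter
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]   # second pass: end at the last letter
  (word && word.strip) || ""
end

p scrub_word("Hello,")          # => "Hello"
p scrub_word("don't")           # => "don't"
p scrub_word("--well-known--")  # => "well-known"
p scrub_word("1234")            # => ""
p scrub_word(nil)               # => ""
```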
#source_material(options = init_options) ⇒ Object
Options:
- :source_material => string
- :source_material_file => filename
If neither option is given, the default source material file is read.
# File 'lib/literate_randomizer/source_parser.rb', line 26
def source_material(options = init_options)
  options[:source_material] || File.read(options[:source_material_file] || default_source_material)
end
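The lookup order can be tried in isolation. In the sketch below, resolve_source and with_temp_text are hypothetical helpers (not part of the gem), and default_path stands in for whatever default_source_material returns:

```ruby
require "tempfile"

# Hypothetical helper mirroring the lookup order of source_material:
# inline string first, then the named file, then the default file.
def resolve_source(options, default_path)
  options[:source_material] || File.read(options[:source_material_file] || default_path)
end

# Hypothetical helper: writes text to a temp file and yields its path.
def with_temp_text(text)
  file = Tempfile.new("source")
  file.write(text)
  file.flush
  yield file.path
ensure
  file.close! if file
end

with_temp_text("default text") do |default_path|
  with_temp_text("custom text") do |custom_path|
    both = { source_material: "inline text", source_material_file: custom_path }
    p resolve_source(both, default_path)                                 # => "inline text"
    p resolve_source({ source_material_file: custom_path }, default_path) # => "custom text"
    p resolve_source({}, default_path)                                   # => "default text"
  end
end
```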
#source_sentences ⇒ Object
Reads the source material and splits it into sentences. NOTE: this re-reads the source material on every call. Usually this only needs to happen once, and keeping the text around would waste memory.
# File 'lib/literate_randomizer/source_parser.rb', line 32
def source_sentences
  source_material.split(/([.?!"]($|\s)|\n\s*\n)+/)
end
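Because the split pattern contains capture groups, String#split returns the matched delimiters as well, so the raw list mixes sentence text with punctuation-only fragments. A minimal sketch on an inline string (split_fragments, SENTENCE_BREAK and the sample text are illustrative names, not part of the gem):

```ruby
# Same pattern as in source_sentences above.
SENTENCE_BREAK = /([.?!"]($|\s)|\n\s*\n)+/

# Hypothetical helper: applies the documented split to any string.
def split_fragments(text)
  text.split(SENTENCE_BREAK)
end

text = "The jungle was quiet. Was it safe?\n\nWe moved on!"
fragments = split_fragments(text)

# Raw result: sentence text interleaved with captured delimiters.
p fragments

# Keeping only fragments that contain a letter leaves the sentences.
p fragments.select { |f| f =~ /[A-Za-z]/ }
# => ["The jungle was quiet", "Was it safe", "We moved on"]
```

Fragments without any letters yield no words once they pass through scrub_sentence.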