Class: FeldtRuby::Statistics::SAX
- Defined in: lib/feldtruby/statistics/time_series/sax.rb
Overview
A SAX processor transforms a numeric stream of data (often a time series) of arbitrary length n into a string (symbolic stream) of length w, where w < n and typically w << n. The alphabet size a (the number of distinct symbols in the string) is an integer; the constructor accepts 2 <= a <= 20. Unlike the SAX described by Keogh et al., we state the number of data elements, elementsPerWord, that should go into each word, i.e. w = n/elementsPerWord. The resulting symbolic representation allows many powerful data mining algorithms to be applied and sped up.
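To make the relation w = n/elementsPerWord concrete, here is a minimal sketch of the word-count arithmetic. The helper name `sax_word_count` is hypothetical and not part of FeldtRuby. It assumes, following the comment in process_subsequence, that a remainder yields one extra word based on fewer elements.

```ruby
# Hypothetical helper (not part of FeldtRuby): number of symbols produced
# for n data elements when each word covers elements_per_word elements.
def sax_word_count(n, elements_per_word)
  full_words, remainder = n.divmod(elements_per_word)
  # Assumption: a non-empty remainder contributes one extra, shorter word.
  remainder > 0 ? full_words + 1 : full_words
end

sax_word_count(128, 16) # => 8
sax_word_count(130, 16) # => 9 (the last word covers only 2 elements)
```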
Defined Under Namespace
Classes: SymbolMapper
Instance Method Summary
- #initialize(elementsPerWord, alphabetSize = 6) ⇒ SAX (constructor): Create a SAX processor with the given number of elements per word (elementsPerWord) and alphabet size a.
- #process(data, windowSize = data.length, mapper = nil) ⇒ Object
- #process_subsequence(subsequence) ⇒ Object
- #setup_for_processing_data(data, mapper = nil) ⇒ Object
Constructor Details
#initialize(elementsPerWord, alphabetSize = 6) ⇒ SAX
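Create a SAX processor with the given number of elements per word (elementsPerWord) and alphabet size a. Raises ArgumentError unless alphabetSize is between 2 and 20 inclusive.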
# File 'lib/feldtruby/statistics/time_series/sax.rb', line 18
def initialize(elementsPerWord, alphabetSize = 6)
  raise ArgumentError if alphabetSize > 20 || alphabetSize < 2
  @elements_per_word, @alphabet_size = elementsPerWord, alphabetSize
end
Instance Method Details
#process(data, windowSize = data.length, mapper = nil) ⇒ Object
# File 'lib/feldtruby/statistics/time_series/sax.rb', line 89
def process(data, windowSize = data.length, mapper = nil)
  setup_for_processing_data(data, mapper)
  res = (0..(data.length - windowSize)).map do |i|
    process_subsequence(data[i, windowSize])
  end
  res = res.flatten if windowSize == data.length
  res
end
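The windowing in process can be illustrated in isolation. The sketch below reproduces only the index arithmetic, returning the raw windows instead of SAX symbols. The name `sliding_windows` is hypothetical and not part of FeldtRuby.

```ruby
# Hypothetical stand-in for the windowing in process: returns every
# contiguous window of window_size elements, advancing one element at a time.
def sliding_windows(data, window_size = data.length)
  (0..(data.length - window_size)).map { |i| data[i, window_size] }
end

data = [1, 2, 3, 4, 5]
sliding_windows(data, 3) # => [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
sliding_windows(data)    # => [[1, 2, 3, 4, 5]]
sliding_windows(data, 6) # => [] (window longer than the data)
```

When windowSize equals data.length there is exactly one window, and process flattens the result so the caller gets a single list of symbols rather than a list containing one list.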
#process_subsequence(subsequence) ⇒ Object
# File 'lib/feldtruby/statistics/time_series/sax.rb', line 77
def process_subsequence(subsequence)
  normalized_ss = subsequence.z_normalize
  len, rem = normalized_ss.length.divmod @elements_per_word
  # Note that if the lengths are not evenly divisible the last word will be based on fewer elements.
  # This is different than the orig SAX as specified in their paper.
  symbols = (0...len).map do |wordindex|
    @mapper.map_sequence_to_symbol(normalized_ss[wordindex * @elements_per_word, @elements_per_word], @alphabet_size)
  end
  symbols << @mapper.map_sequence_to_symbol(normalized_ss[len, @elements_per_word], @alphabet_size) if rem > 0
  symbols
end
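The steps in process_subsequence (z-normalize, split into words, map each word to a symbol) can be sketched without the library. Everything below is an assumption made for illustration: the helper names are hypothetical, the normalization uses the population standard deviation, the mapper averages each word and bins it with the usual SAX breakpoints for a = 4, and symbols are the integers 1 to 4. The real SymbolMapper may behave differently.

```ruby
# Breakpoints that split a standard normal distribution into 4 roughly
# equiprobable regions (the usual SAX table entry for alphabet size 4).
BREAKPOINTS_A4 = [-0.67, 0.0, 0.67]

# Assumption: z-normalization with the population standard deviation.
def z_normalize(values)
  mean = values.sum.to_f / values.length
  sd = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.length)
  return values.map { 0.0 } if sd.zero? # constant input: avoid dividing by zero
  values.map { |v| (v - mean) / sd }
end

# Hypothetical mapper: average the word, then count the breakpoints below it.
def symbol_for(chunk)
  avg = chunk.sum / chunk.length
  BREAKPOINTS_A4.count { |b| avg > b } + 1
end

# One symbol per word; each_slice leaves a shorter final word on a remainder.
def sax_symbols(subsequence, elements_per_word)
  z_normalize(subsequence).each_slice(elements_per_word).map { |chunk| symbol_for(chunk) }
end

sax_symbols([0, 0, 4, 4, 6, 6, 10, 10], 2) # => [1, 2, 3, 4]
sax_symbols([0, 0, 4, 4, 6, 6, 10, 10, 5], 2).length # => 5
```

The quoted source takes the trailing word from index len, while this sketch starts it at len * elements_per_word (via each_slice), which is what the source comment about a last word "based on fewer elements" appears to describe.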
#setup_for_processing_data(data, mapper = nil) ⇒ Object
# File 'lib/feldtruby/statistics/time_series/sax.rb', line 70
def setup_for_processing_data(data, mapper = nil)
  @mapper ||= SymbolMapper.new(data)
  unless @mapper.supports_alphabet_size?(@alphabet_size)
    raise ArgumentError.new("Mapper does not support the alphabet size (#{@alphabet_size}): #{@mapper}")
  end
end
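The source above calls only two methods on the mapper: supports_alphabet_size? and map_sequence_to_symbol. The sketch below shows a minimal object with that interface and the same support check. `SignMapper` and `check_mapper!` are hypothetical names, not part of FeldtRuby. Note also that, as quoted, the method body never references its mapper argument, so this page does not show how a custom mapper is installed.

```ruby
# Hypothetical two-symbol mapper exposing the two methods the quoted source calls.
class SignMapper
  # This toy mapper only knows how to produce two symbols.
  def supports_alphabet_size?(size)
    size == 2
  end

  # Returns 1 when the mean of the (normalized) sequence is negative, else 2.
  def map_sequence_to_symbol(sequence, alphabet_size)
    mean = sequence.sum.to_f / sequence.length
    mean < 0 ? 1 : 2
  end
end

# Hypothetical helper mirroring the support check in setup_for_processing_data.
def check_mapper!(mapper, alphabet_size)
  return mapper if mapper.supports_alphabet_size?(alphabet_size)
  raise ArgumentError, "Mapper does not support the alphabet size (#{alphabet_size}): #{mapper}"
end

check_mapper!(SignMapper.new, 2) # returns the mapper
# check_mapper!(SignMapper.new, 3) would raise ArgumentError
```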