Class: FeldtRuby::Statistics::SAX

Inherits: Object

Defined in: lib/feldtruby/statistics/time_series/sax.rb

Overview

A SAX (Symbolic Aggregate approXimation) processor transforms a numeric stream of data (often a time series) of arbitrary length n into a string (symbolic stream) of length w, where w < n and typically w << n. The alphabet size a (the number of distinct symbols in the string) is an integer; the constructor accepts 2 <= a <= 20. Unlike the SAX described by Keogh et al., which fixes w directly, this class takes the number of data elements, elementsPerWord, that go into each word, so w = n/elementsPerWord (rounded up when n is not evenly divisible). The resulting symbolic representation allows many powerful data mining algorithms to be applied and sped up.
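
The transformation described above can be sketched in a few lines of plain Ruby. This is a minimal illustration of the general SAX idea, not the FeldtRuby implementation: the helper names (z_normalize, symbol_for, sax_sketch), the use of the population standard deviation, and the fixed Gaussian breakpoints for an alphabet of size 4 are all assumptions made for the example. The library's SymbolMapper is constructed from the data and may choose its breakpoints differently.

```ruby
# Breakpoints that cut the standard normal distribution into 4 equiprobable
# regions (the usual SAX table for alphabet size 4). Assumption for this sketch.
BREAKPOINTS_A4 = [-0.67, 0.0, 0.67].freeze
SYMBOLS = %w[a b c d].freeze

# Rescale values to zero mean and unit (population) standard deviation.
def z_normalize(values)
  mean = values.sum.to_f / values.length
  variance = values.sum { |v| (v - mean)**2 } / values.length
  sd = Math.sqrt(variance)
  return values.map { 0.0 } if sd.zero?
  values.map { |v| (v - mean) / sd }
end

# Map one chunk to a symbol: average it, then count how many breakpoints
# the average exceeds and use that count as the symbol index.
def symbol_for(chunk)
  avg = chunk.sum / chunk.length
  SYMBOLS[BREAKPOINTS_A4.count { |b| avg > b }]
end

# Normalize, cut into chunks of elements_per_word, and map each chunk.
def sax_sketch(data, elements_per_word)
  z_normalize(data).each_slice(elements_per_word).map { |chunk| symbol_for(chunk) }.join
end

series = [1, 2, 3, 4, 5, 6, 7, 8]
word = sax_sketch(series, 2) # n = 8, elements_per_word = 2, so w = 4
# word => "abcd"
```

Here eight values are reduced to four symbols; a steadily rising series maps to symbols in ascending order.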

Defined Under Namespace

Classes: SymbolMapper

Instance Method Summary

#initialize(elementsPerWord, alphabetSize = 6) ⇒ SAX (constructor)

Create a SAX processor with the given number of elements per word and alphabet size a.

#process(data, windowSize = data.length, mapper = nil) ⇒ Object
#process_subsequence(subsequence) ⇒ Object
#setup_for_processing_data(data, mapper = nil) ⇒ Object

Constructor Details

#initialize(elementsPerWord, alphabetSize = 6) ⇒ `SAX`

Create a SAX processor with the given number of elements per word and alphabet size a.

Raises:

(ArgumentError) if alphabetSize is less than 2 or greater than 20

# File 'lib/feldtruby/statistics/time_series/sax.rb', line 18

def initialize(elementsPerWord, alphabetSize = 6)
  raise ArgumentError if alphabetSize > 20 || alphabetSize < 2
  @elements_per_word, @alphabet_size = elementsPerWord, alphabetSize
end

Instance Method Details

#process(data, windowSize = data.length, mapper = nil) ⇒ `Object`

# File 'lib/feldtruby/statistics/time_series/sax.rb', line 89

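def process(data, windowSize = data.length, mapper = nil)
  setup_for_processing_data(data, mapper)
  # Slide a window of windowSize elements over the data, one step at a time.
  res = (0..(data.length - windowSize)).map do |i|
    process_subsequence(data[i, windowSize])
  end
  # A single window covering all the data yields one flat list of symbols.
  res = res.flatten if windowSize == data.length
  res
end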
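
The windowing in #process can be illustrated on its own. The sketch below reuses the same index range as the listing above but skips the symbol mapping, so it only shows which subsequences would be handed to #process_subsequence. The variable names and sample data are made up for the example.

```ruby
data = [10, 20, 30, 40, 50]
window_size = 3

# One start index per window position: 0 up to data.length - window_size.
windows = (0..(data.length - window_size)).map { |i| data[i, window_size] }
# windows => [[10, 20, 30], [20, 30, 40], [30, 40, 50]]

# With the default window (the whole series) there is exactly one window,
# which is why #process flattens the result in that case.
whole = (0..(data.length - data.length)).map { |i| data[i, data.length] }
# whole => [[10, 20, 30, 40, 50]]

# A window longer than the data gives an empty range and therefore no windows.
too_long = (0..(data.length - 6)).map { |i| data[i, 6] }
# too_long => []
```

A series of length n and a window of size m therefore produce n - m + 1 subsequences, each processed independently.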

#process_subsequence(subsequence) ⇒ `Object`

# File 'lib/feldtruby/statistics/time_series/sax.rb', line 77

def process_subsequence(subsequence)
  normalized_ss = subsequence.z_normalize
  len, rem = normalized_ss.length.divmod @elements_per_word
  # Note that if the length is not evenly divisible by @elements_per_word,
  # the last word is based on fewer elements.
  # This differs from the original SAX as specified in the paper.
  symbols = (0...len).map do |wordindex|
    @mapper.map_sequence_to_symbol(normalized_ss[wordindex * @elements_per_word, @elements_per_word], @alphabet_size)
  end
  # The leftover elements start right after the last full word.
  symbols << @mapper.map_sequence_to_symbol(normalized_ss[len * @elements_per_word, rem], @alphabet_size) if rem > 0
  symbols
end
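
The chunking described in the comment (full words followed by a shorter last word) can be checked in isolation. This sketch uses made-up values and local variable names, leaves out normalization and symbol mapping, and compares the divmod-based partition with Array#each_slice, which produces the same grouping.

```ruby
normalized = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
elements_per_word = 3

# Number of full words and number of leftover elements.
len, rem = normalized.length.divmod(elements_per_word) # => [2, 1]

# Full words: consecutive slices of elements_per_word elements.
full_chunks = (0...len).map { |i| normalized[i * elements_per_word, elements_per_word] }

# Last, shorter word: the rem elements that follow the final full word.
tail_chunk = rem > 0 ? normalized[len * elements_per_word, rem] : nil

chunks = full_chunks + [tail_chunk].compact
# chunks => [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7]]

# Array#each_slice yields the same partition in one call.
same = chunks == normalized.each_slice(elements_per_word).to_a
# same => true
```

Seven elements with three per word give three symbols, so the word count is the ceiling of n / elementsPerWord.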

#setup_for_processing_data(data, mapper = nil) ⇒ `Object`

# File 'lib/feldtruby/statistics/time_series/sax.rb', line 70

def setup_for_processing_data(data, mapper = nil)
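  # Prefer an explicitly passed mapper, then a previously set one,
  # and only build a default mapper from the data as a last resort.
  @mapper = mapper || @mapper || SymbolMapper.new(data)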
  unless @mapper.supports_alphabet_size?(@alphabet_size)
    raise ArgumentError.new("Mapper does not support the alphabet size (#{@alphabet_size}): #{@mapper}")
  end
end
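
The only two calls this class makes on @mapper are supports_alphabet_size? and map_sequence_to_symbol, so any object answering both satisfies that interface. The sketch below defines a hypothetical SignMapper (not part of FeldtRuby) to show the shape of such an object and the same kind of alphabet-size check. The integer symbols it returns are an assumption for illustration; the symbols returned by the library's SymbolMapper may differ.

```ruby
# Hypothetical mapper for illustration only: it supports a two-symbol
# alphabet and maps a chunk to 1 or 2 depending on the sign of its mean.
class SignMapper
  def supports_alphabet_size?(size)
    size == 2
  end

  def map_sequence_to_symbol(sequence, _alphabet_size)
    mean = sequence.sum.to_f / sequence.length
    mean < 0 ? 1 : 2
  end
end

mapper = SignMapper.new
low  = mapper.map_sequence_to_symbol([-1.0, -0.5], 2) # => 1
high = mapper.map_sequence_to_symbol([0.5, 1.5], 2)   # => 2

# The same style of guard as in the listing above, applied to an
# unsupported alphabet size.
alphabet_size = 3
error_message = nil
begin
  unless mapper.supports_alphabet_size?(alphabet_size)
    raise ArgumentError, "Mapper does not support the alphabet size (#{alphabet_size})"
  end
rescue ArgumentError => e
  error_message = e.message
end
```

Checking the alphabet size before any data is processed means a mismatch is reported once, up front, rather than on the first chunk.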