Module: FeldtRuby::Statistics

Included in: FeldtRuby

Defined in: lib/feldtruby/statistics/time_series/sax.rb,
lib/feldtruby/statistics.rb,
lib/feldtruby/statistics/design_of_experiments.rb,
lib/feldtruby/statistics/distance/string_distance.rb

Overview

Implements the basic SAX (Symbolic Aggregate approXimation) from the paper:

Jessica Lin, Eamonn Keogh, Stefano Lonardi, Bill Chiu, 
"A Symbolic Representation of Time Series, with Implications for Streaming Algorithms", IDMKD 2003.

available from: www.cs.ucr.edu/~eamonn/SAX.pdf
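As a rough illustration of the technique the paper describes, the sketch below z-normalizes a series, reduces it with Piecewise Aggregate Approximation (PAA), and maps each segment mean to a letter using Gaussian breakpoints. The method name `sax_sketch`, the `BREAKPOINTS` table, and the requirement that the series length be divisible by the segment count are assumptions made for this example; they are not the API of the `SAX` class in this module.

```ruby
# Breakpoints that split a standard normal distribution into
# equiprobable regions, for alphabet sizes 3 and 4 only.
BREAKPOINTS = { 3 => [-0.43, 0.43], 4 => [-0.67, 0.0, 0.67] }.freeze

# Hypothetical helper, not part of FeldtRuby.
def sax_sketch(series, num_segments, alphabet_size = 4)
  unless (series.length % num_segments).zero?
    raise ArgumentError, "series length must be divisible by num_segments"
  end
  cuts = BREAKPOINTS.fetch(alphabet_size)

  # 1. Z-normalize (population standard deviation is an assumption here).
  n = series.length.to_f
  mean = series.sum / n
  sd = Math.sqrt(series.sum { |v| (v - mean)**2 } / n)
  z = series.map { |v| sd.zero? ? 0.0 : (v - mean) / sd }

  # 2. PAA: replace each equal-length segment by its mean.
  seg_len = series.length / num_segments
  paa = z.each_slice(seg_len).map { |seg| seg.sum / seg.length }

  # 3. Map each mean to a letter by counting the breakpoints at or below it.
  paa.map { |m| ("a".ord + cuts.count { |c| m >= c }).chr }.join
end

puts sax_sketch([1, 1, 2, 2, 8, 8, 9, 9], 4)
```

Low values map to letters early in the alphabet and high values to later ones, so the resulting word preserves the coarse shape of the series.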

Defined Under Namespace

Modules: DesignOfExperiments, Plotting

Classes: CompressionBasedDissimilarityMeasure, DiffusionKDE, NormalizedCompressionDistance, SAX, StringDistance

Instance Method Summary

#cdm(string1, string2) ⇒ Object
#chi_squared_test(aryOrHashOfCounts) ⇒ Object
#correlation(ary1, ary2) ⇒ Object
#density_estimation(values, n = 2**9, min = nil, max = nil) ⇒ Object

Performs a kernel density estimation on the sampled values, using n bins (rounded up to the nearest power of 2) and optional min and max values.
#ncd(string1, string2) ⇒ Object
#probability_of_same_proportions(aryOrHashOfCounts) ⇒ Object

Calculates the probability that the unique values in an array (or a hash of counts of those values) have statistically equal proportions.

Instance Method Details

#cdm(string1, string2) ⇒ `Object`



# File 'lib/feldtruby/statistics/distance/string_distance.rb', line 45

def cdm(string1, string2)
  (@cdm ||= CompressionBasedDissimilarityMeasure.new).distance(string1, string2)
end
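For orientation, the compression-based dissimilarity measure is commonly defined as CDM(x, y) = C(xy) / (C(x) + C(y)), where C is the size of the compressed input. The sketch below uses that definition with Zlib as the compressor. Both the compressor choice and the names `compressed_size` and `cdm_sketch` are assumptions for this example; the `CompressionBasedDissimilarityMeasure` class may be implemented differently.

```ruby
require "zlib"

# Hypothetical helper: size in bytes of the deflate-compressed string.
def compressed_size(str)
  Zlib::Deflate.deflate(str).bytesize
end

# CDM(x, y) = C(xy) / (C(x) + C(y)).
# Values near 0.5 suggest very similar strings, values near 1.0 unrelated ones.
def cdm_sketch(s1, s2)
  compressed_size(s1 + s2).to_f / (compressed_size(s1) + compressed_size(s2))
end

repetitive = "abc" * 200
rng = Random.new(1)
noisy = Array.new(600) { (97 + rng.rand(26)).chr }.join

puts cdm_sketch(repetitive, repetitive)
puts cdm_sketch(repetitive, noisy)
```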

#chi_squared_test(aryOrHashOfCounts) ⇒ `Object`

# File 'lib/feldtruby/statistics.rb', line 181

def chi_squared_test(aryOrHashOfCounts)
  puts "aryOrHashOfCounts = #{aryOrHashOfCounts}"
  counts = (Hash === aryOrHashOfCounts) ? aryOrHashOfCounts : aryOrHashOfCounts.counts
  vs = counts.values
  res = RC.call("chisq.test", vs)
  res.p_value
end
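The method delegates to R's `chisq.test`, which, when given a single vector of counts, tests them against equal expected frequencies. A minimal pure-Ruby sketch of the statistic behind that test follows. It computes only the chi-squared statistic, not the p-value, and the name `chi_squared_statistic` is hypothetical. Unlike the method above, it expects counts rather than raw values, since `Array#counts` is a FeldtRuby extension.

```ruby
# Hypothetical helper: chi-squared goodness-of-fit statistic against
# a uniform expectation. Accepts an array of counts or a hash of counts.
def chi_squared_statistic(counts)
  observed = counts.is_a?(Hash) ? counts.values : counts
  expected = observed.sum.to_f / observed.length
  observed.sum { |o| (o - expected)**2 / expected }
end

puts chi_squared_statistic([10, 10, 10])           # perfectly even counts
puts chi_squared_statistic({ "a" => 20, "b" => 10 })
```

A larger statistic corresponds to a smaller p-value, that is, stronger evidence that the counts are not evenly distributed.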

#correlation(ary1, ary2) ⇒ `Object`



# File 'lib/feldtruby/statistics.rb', line 189

def correlation(ary1, ary2)
  RC.call("cor", ary1, ary2)
end
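R's `cor` computes the Pearson correlation coefficient by default. For readers without an R bridge at hand, the following pure-Ruby sketch computes the same quantity. The name `pearson_sketch` is hypothetical and not part of FeldtRuby.

```ruby
# Hypothetical helper: Pearson correlation of two equal-length numeric arrays.
def pearson_sketch(xs, ys)
  raise ArgumentError, "arrays must have equal length" unless xs.length == ys.length

  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of products of deviations, divided by the product of the deviation norms.
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  norm_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  norm_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })
  covariance / (norm_x * norm_y)
end

puts pearson_sketch([1, 2, 3], [2, 4, 6])
puts pearson_sketch([1, 2, 3], [3, 2, 1])
```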

#density_estimation(values, n = 2**9, min = nil, max = nil) ⇒ `Object`

Performs a kernel density estimation on the sampled values, using n bins (rounded up to the nearest power of 2) and optional min and max values. The min and max values are passed on only when both are given.

# File 'lib/feldtruby/statistics.rb', line 221

def density_estimation(values, n = 2**9, min = nil, max = nil)
  # Ensure we have loaded the diffusion.kde code
  RC.load_feldtruby_r_script("diffusion_kde.R")
  args = [values, n]
  if min && max
    args << min
    args << max
  end
  DiffusionKDE.new RC.call("diffusion.kde", *args)
end
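To make the rounding of n concrete, the sketch below shows what "rounded up to the nearest power of 2" means for a few inputs. The rounding itself happens outside the Ruby method shown above; `next_power_of_two` is a hypothetical helper written only for illustration.

```ruby
# Hypothetical helper: smallest power of two that is >= n.
def next_power_of_two(n)
  raise ArgumentError, "n must be positive" unless n.positive?

  # (n - 1).bit_length is the number of bits needed to represent n - 1,
  # so shifting 1 by that amount yields the next power of two at or above n.
  1 << (n - 1).bit_length
end

[500, 512, 513].each { |n| puts "#{n} -> #{next_power_of_two(n)}" }
```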

#ncd(string1, string2) ⇒ `Object`



# File 'lib/feldtruby/statistics/distance/string_distance.rb', line 30

def ncd(string1, string2)
  (@ncd ||= NormalizedCompressionDistance.new).distance(string1, string2)
end
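The normalized compression distance is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below applies that definition with Zlib. The compressor and the names `deflated_size` and `ncd_sketch` are assumptions for this example; the `NormalizedCompressionDistance` class may use a different compressor or formula variant.

```ruby
require "zlib"

# Hypothetical helper: size in bytes of the deflate-compressed string.
def deflated_size(str)
  Zlib::Deflate.deflate(str).bytesize
end

# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
# Smaller values suggest the strings share more structure.
def ncd_sketch(s1, s2)
  c1 = deflated_size(s1)
  c2 = deflated_size(s2)
  c12 = deflated_size(s1 + s2)
  (c12 - [c1, c2].min).to_f / [c1, c2].max
end

repetitive = "abc" * 200
rng = Random.new(1)
noisy = Array.new(600) { (97 + rng.rand(26)).chr }.join

puts ncd_sketch(repetitive, repetitive)
puts ncd_sketch(repetitive, noisy)
```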

#probability_of_same_proportions(aryOrHashOfCounts) ⇒ `Object`

Calculates the probability that the unique values in an array (or a hash of counts of those values) have statistically equal proportions.

# File 'lib/feldtruby/statistics.rb', line 174

def probability_of_same_proportions(aryOrHashOfCounts)
  counts = (Hash === aryOrHashOfCounts) ? aryOrHashOfCounts : aryOrHashOfCounts.counts
  vs = counts.values
  res = RC.call("prop.test", vs, ([vs.sum] * vs.length))
  res.p_value
end
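The call to `prop.test` above passes each value's count as the number of successes and the total count as the number of trials for every group. The sketch below shows how those two argument vectors come out for a small input, without calling R. It uses Ruby's built-in `Array#tally` (Ruby 2.7+) as a stand-in for FeldtRuby's `Array#counts`, and `prop_test_arguments` is a hypothetical name.

```ruby
# Hypothetical helper: builds the (successes, trials) vectors that the
# method above would hand to R's prop.test.
def prop_test_arguments(values_or_counts)
  counts = values_or_counts.is_a?(Hash) ? values_or_counts : values_or_counts.tally
  successes = counts.values
  trials = [successes.sum] * successes.length
  [successes, trials]
end

successes, trials = prop_test_arguments(%w[a b a c a])
proportions = successes.zip(trials).map { |x, n| x.to_f / n }

puts successes.inspect
puts trials.inspect
puts proportions.inspect
```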
