Class: MoreMath::Sequence

Inherits: Object

Includes: Enumerable, MovingAverage

Defined in: lib/more_math/sequence.rb, lib/more_math/sequence/moving_average.rb

Overview

This class holds a collection of elements and computes various statistical values for them.

Defined Under Namespace

Modules: MovingAverage, Refinement

Instance Attribute Summary

#elements ⇒ Object readonly

Returns the array of elements.

Instance Method Summary

#autocorrelation ⇒ Object

Returns the array of autocorrelation values c_k / c_0 (of length size - 1).
#autovariance ⇒ Object

Returns the array of autovariances (of length size - 1).
#common_standard_deviation(other) ⇒ Object

Returns an estimation of the common standard deviation of the elements of this and other.
#common_variance(other) ⇒ Object

Returns an estimation of the common variance of the elements of this and other.
#compute_student_df(other) ⇒ Object

Compute the number of degrees of freedom for Student’s t-test.
#compute_welch_df(other) ⇒ Object

Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.
#confidence_interval(alpha = 0.05) ⇒ Object

Returns the confidence interval for the arithmetic mean of this Sequence instance's elements at alpha level alpha, as a Range object.
#cover?(other, alpha = 0.05) ⇒ Boolean

Returns true if this Sequence instance covers the other, that is, if their arithmetic means are most likely equal at the alpha error level.
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object

This method tries to detect autocorrelation with the Ljung-Box statistic.
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object

Returns a result hash with the number of :very_low, :low, :high, and :very_high outliers, as determined by the box plot algorithm.
#durbin_watson_statistic ⇒ Object

Returns the d-value for the Durbin-Watson statistic.
#each(&block) ⇒ Object

Calls the block for every element of this Sequence.
#empty? ⇒ Boolean

Returns true if this sequence is empty, otherwise false.
#histogram(bins) ⇒ Object

Returns a Histogram instance with bins as the number of bins for this analysis’ elements.
#initialize(elements) ⇒ Sequence constructor

A new instance of Sequence.
#ljung_box_statistic(lags = 20) ⇒ Object

Returns the q value of the Ljung-Box statistic for the number of lags lags.
#percentile(p = 50) ⇒ Object (also: #median)

Returns the p-percentile of the elements.
#push(element) ⇒ Object (also: #<<)

Push element on this Sequence and return a new Sequence instance with element as its last element.
#reset ⇒ Object

Reset all memoized values of this sequence.
#size ⇒ Object

Returns the number of elements on which the analysis is based.
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object

Computes a sample size that is more likely to reveal a difference between the mean of this instance’s elements and the mean of other's.
#t_student(other) ⇒ Object

Returns the t value of the Student’s t-test between this Sequence instance and the other.
#t_welch(other) ⇒ Object

Returns the t value of the Welch’s t-test between this Sequence instance and the other.
#to_ary ⇒ Object (also: #to_a)

Methods included from MovingAverage

#simple_moving_average

Constructor Details

#initialize(elements) ⇒ `Sequence`

Returns a new instance of Sequence.



# File 'lib/more_math/sequence.rb', line 10

def initialize(elements)
  @elements = elements.dup.freeze
end

Instance Attribute Details

#elements ⇒ `Object` (readonly)

Returns the array of elements.



15
16
17

# File 'lib/more_math/sequence.rb', line 15

def elements
  @elements
end

Instance Method Details

#autocorrelation ⇒ `Object`

Returns the array of autocorrelation values c_k / c_0 (of length size - 1).

# File 'lib/more_math/sequence.rb', line 290

def autocorrelation
  c = autovariance
  Array.new(c.size) { |k| c[k] / c[0] }
end

#autovariance ⇒ `Object`

Returns the array of autovariances (of length size - 1).

# File 'lib/more_math/sequence.rb', line 278

def autovariance
  Array.new(size - 1) do |k|
    s = 0.0
    0.upto(size - k - 1) do |i|
      s += (@elements[i] - arithmetic_mean) * (@elements[i + k] - arithmetic_mean)
    end
    s / size
  end
end
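
The two listings above can be illustrated with a standalone sketch on a plain array. It does not use MoreMath; the method names autocovariances and autocorrelations are hypothetical and chosen only for this example. Like the listing, each lag's sum is divided by the full element count.

```ruby
# Standalone sketch, not the library's code: names are illustrative only.
def autocovariances(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  # One value per lag k = 0 .. n - 2, each divided by n.
  Array.new(n - 1) do |k|
    (0...(n - k)).sum { |i| (xs[i] - mean) * (xs[i + k] - mean) } / n
  end
end

def autocorrelations(xs)
  c = autocovariances(xs)
  # Normalize by the lag-0 value so the first entry is always 1.0.
  c.map { |c_k| c_k / c[0] }
end

xs = [1.0, 2.0, 3.0, 4.0]
p autocovariances(xs)   # lag 0 is the (biased) variance, 1.25
p autocorrelations(xs)  # first entry is 1.0
```

Note that the result has size - 1 entries, so a sequence of four elements yields values for lags 0, 1 and 2.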

#common_standard_deviation(other) ⇒ `Object`

Returns an estimation of the common standard deviation of the elements of this and other.



# File 'lib/more_math/sequence.rb', line 219

def common_standard_deviation(other)
  Math.sqrt(common_variance(other))
end

#common_variance(other) ⇒ `Object`

Returns an estimation of the common variance of the elements of this and other.

# File 'lib/more_math/sequence.rb', line 225

def common_variance(other)
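  # Pooled variance: weight each sample variance by its degrees of freedom,
  # then divide the whole sum by the combined degrees of freedom.
  ((size - 1) * sample_variance + (other.size - 1) * other.sample_variance) /
    (size + other.size - 2)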
end
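
For reference, the textbook pooled variance weights each sample variance by its degrees of freedom and divides the whole sum by the combined degrees of freedom. The following standalone sketch works on plain arrays; the helper names are hypothetical, and it assumes the sample variance uses the n - 1 denominator.

```ruby
# Standalone sketch with illustrative names; not MoreMath's implementation.
def sample_variance(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean) ** 2 } / (xs.size - 1)  # assumes n - 1 denominator
end

def pooled_variance(a, b)
  ((a.size - 1) * sample_variance(a) + (b.size - 1) * sample_variance(b)) /
    (a.size + b.size - 2)
end

a = [1.0, 2.0, 3.0]       # sample variance 1.0
b = [2.0, 4.0, 6.0, 8.0]  # sample variance 20/3
p pooled_variance(a, b)   # (2 * 1.0 + 3 * 20/3) / 5, about 4.4
```

The parentheses around the numerator matter: without them, only the second term would be divided by the combined degrees of freedom.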

#compute_student_df(other) ⇒ `Object`

Compute the number of degrees of freedom for Student’s t-test.



# File 'lib/more_math/sequence.rb', line 231

def compute_student_df(other)
  size + other.size - 2
end

#compute_welch_df(other) ⇒ `Object`

Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.

# File 'lib/more_math/sequence.rb', line 200

def compute_welch_df(other)
  (sample_variance / size + other.sample_variance / other.size) ** 2 / (
    (sample_variance ** 2 / (size ** 2 * (size - 1))) +
    (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
end
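
The same formula can be tried out on raw numbers. The sketch below takes variances and sizes directly; welch_df is a hypothetical helper name, not part of MoreMath. A handy sanity check is that two samples with equal variance and equal size n give 2(n - 1) degrees of freedom, the same as Student's t-test.

```ruby
# Standalone sketch of the Welch-Satterthwaite approximation.
# welch_df is an illustrative name; variances are expected as Floats.
def welch_df(var_a, n_a, var_b, n_b)
  numerator = (var_a / n_a + var_b / n_b) ** 2
  denominator = var_a ** 2 / (n_a ** 2 * (n_a - 1)) +
                var_b ** 2 / (n_b ** 2 * (n_b - 1))
  numerator / denominator
end

p welch_df(4.0, 10, 4.0, 10)  # equal variances and sizes: about 18
p welch_df(1.0, 5, 9.0, 20)   # unequal case: a non-integer value
```

The result is generally not an integer, which is why it is passed to the t distribution as a real-valued parameter.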

#confidence_interval(alpha = 0.05) ⇒ `Object`

Returns the confidence interval for the arithmetic mean of this Sequence instance's elements at alpha level alpha, as a Range object.

# File 'lib/more_math/sequence.rb', line 270

def confidence_interval(alpha = 0.05)
  td = TDistribution.new(size - 1)
  t = td.inverse_probability(alpha / 2).abs
  delta = t * sample_standard_deviation / Math.sqrt(size)
  (arithmetic_mean - delta)..(arithmetic_mean + delta)
end

#cover?(other, alpha = 0.05) ⇒ `Boolean`

Returns true if this Sequence instance covers the other, that is, if Welch’s t-test does not reject the hypothesis that their arithmetic means are equal at the alpha error level.

Returns:

(Boolean)

# File 'lib/more_math/sequence.rb', line 262

def cover?(other, alpha = 0.05)
  t = t_welch(other)
  td = TDistribution.new(compute_welch_df(other))
  t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
end

#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ `Object`

This method tries to detect autocorrelation with the Ljung-Box statistic. If enough lags can be considered, it returns a hash with results; otherwise nil is returned. The keys are:

:lags: the number of lags,
:alpha_level: the alpha level for the test,
:q: the value of the ljung_box_statistic,
:p: the chi-square probability computed for q with lags degrees of freedom; per the listing below, a correlation is reported when p >= 1 - alpha_level,
:detected: true if a correlation was found.

# File 'lib/more_math/sequence.rb', line 323

def detect_autocorrelation(lags = 20, alpha_level = 0.05)
  if q = ljung_box_statistic(lags)
    p = ChiSquareDistribution.new(lags).probability(q)
    return {
      :lags         => lags,
      :alpha_level  => alpha_level,
      :q            => q,
      :p            => p,
      :detected     => p >= 1 - alpha_level,
    }
  end
end

#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ `Object`

Returns a result hash with the number of :very_low, :low, :high, and :very_high outliers, as determined by the box plot algorithm. The hash also contains the :median, :iqr, and :factor values that were used. If no outliers were found or the IQR is less than epsilon, nil is returned.

# File 'lib/more_math/sequence.rb', line 340

def detect_outliers(factor = 3.0, epsilon = 1E-5)
  half_factor = factor / 2.0
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  iqr = quartile3 - quartile1
  iqr < epsilon and return
  result = @elements.inject(Hash.new(0)) do |h, t|
    extreme =
      case t
      when -Infinity..(quartile1 - factor * iqr)
        :very_low
      when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
        :low
      when (quartile3 + half_factor * iqr)..(quartile3 + factor * iqr)
        :high
      when (quartile3 + factor * iqr)..Infinity
        :very_high
      end and h[extreme] += 1
    h
  end
  unless result.empty?
    result[:median] = median
    result[:iqr] = iqr
    result[:factor] = factor
    result
  end
end
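
As an illustration of the box plot rule, here is a standalone sketch that classifies a single value against symmetric fences: inner fences at factor / 2 times the IQR and outer fences at factor times the IQR, measured from the first and third quartile. The symmetry is an assumption about the intent, not a transcription of the listing, and classify is a hypothetical helper name.

```ruby
# Standalone sketch of conventional box plot fences; illustrative only.
# Returns :very_low, :low, :high, :very_high, or nil for a non-outlier.
def classify(value, q1, q3, factor = 3.0)
  iqr = q3 - q1
  half = factor / 2.0
  if    value <= q1 - factor * iqr then :very_low
  elsif value <= q1 - half * iqr   then :low
  elsif value >= q3 + factor * iqr then :very_high
  elsif value >= q3 + half * iqr   then :high
  end
end

# With q1 = 10 and q3 = 20 the IQR is 10, so the inner fences sit at
# -5 and 35 and the outer fences at -20 and 50.
p classify(15, 10.0, 20.0)  # => nil
p classify(36, 10.0, 20.0)  # => :high
p classify(50, 10.0, 20.0)  # => :very_high
```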

#durbin_watson_statistic ⇒ `Object`

Returns the d-value for the Durbin-Watson statistic. The value is well below 2 for positive autocorrelation, well above 2 for negative autocorrelation, and close to 2 when there is no autocorrelation.

# File 'lib/more_math/sequence.rb', line 297

def durbin_watson_statistic
  e = linear_regression.residuals
  e.size <= 1 and return 2.0
  (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
    e.inject(0.0) { |s, x| s + x ** 2 }
end
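
To see how the d-value behaves, the following standalone sketch computes it from a plain array of residuals. It skips the linear regression step and takes the residuals as given; durbin_watson is a hypothetical helper name.

```ruby
# Standalone sketch: d = sum of squared successive differences of the
# residuals divided by the sum of squared residuals. Illustrative only.
def durbin_watson(residuals)
  return 2.0 if residuals.size <= 1
  numerator = residuals.each_cons(2).sum { |a, b| (b - a) ** 2 }
  denominator = residuals.sum { |e| e ** 2 }
  numerator / denominator
end

p durbin_watson([1.0, -1.0, 1.0, -1.0])  # => 3.0, alternating signs
p durbin_watson([1.0, 1.0, -1.0, -1.0])  # => 1.0, runs of equal sign
```

Residuals that flip sign at every step push d above 2 (negative autocorrelation), while runs of same-signed residuals pull it below 2 (positive autocorrelation).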

#each(&block) ⇒ `Object`

Calls the block for every element of this Sequence.



# File 'lib/more_math/sequence.rb', line 18

def each(&block)
  @elements.each(&block)
end

#empty? ⇒ `Boolean`

Returns true if this sequence is empty, otherwise false.

Returns:

(Boolean)



# File 'lib/more_math/sequence.rb', line 24

def empty?
  @elements.empty?
end

#histogram(bins) ⇒ `Object`

Returns a Histogram instance with bins as the number of bins for this analysis’ elements.



# File 'lib/more_math/sequence.rb', line 377

def histogram(bins)
  Histogram.new(self, bins)
end

#ljung_box_statistic(lags = 20) ⇒ `Object`

Returns the q value of the Ljung-Box statistic for the given number of lags. A higher value might indicate autocorrelation in the elements of this Sequence instance. This method returns nil if there are not enough autocorrelation values for the requested lags, that is, if lags is greater than or equal to the number of values returned by autocorrelation.

# File 'lib/more_math/sequence.rb', line 308

def ljung_box_statistic(lags = 20)
  r = autocorrelation
  lags >= r.size and return
  n = size
  n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
end
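
The statistic is Q = n (n + 2) times the sum over k = 1..lags of r_k^2 / (n - k), where r_k is the autocorrelation at lag k. A standalone sketch that takes the autocorrelation values as a plain array follows; ljung_box_q is a hypothetical helper name, and r[0] is assumed to be the lag-0 value, as in the listing.

```ruby
# Standalone sketch of the Ljung-Box Q statistic; illustrative only.
# r: autocorrelations indexed by lag (r[0] is lag 0), n: sample size.
def ljung_box_q(r, n, lags)
  return nil if lags >= r.size  # not enough lags available
  n * (n + 2) * (1..lags).sum { |k| r[k] ** 2 / (n - k) }
end

r = [1.0, 0.5, 0.25]
p ljung_box_q(r, 10, 2)  # 120 * (0.25 / 9 + 0.0625 / 8), about 4.27
p ljung_box_q(r, 10, 3)  # => nil, only lags 1 and 2 are available
```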

#percentile(p = 50) ⇒ `Object` Also known as: median

Returns the p-percentile of the elements. There are many methods to compute a percentile; this method uses the weighted average at x_(n + 1)p, which allows p to be in 0...100 (excluding 100).

# File 'lib/more_math/sequence.rb', line 177

def percentile(p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  p /= 100.0
  sorted_elements = sorted
  r = p * (sorted_elements.size + 1)
  r_i = r.to_i
  r_f = r - r_i
  if r_i >= 1
    result = sorted_elements[r_i - 1]
    if r_i < sorted_elements.size
      result += r_f * (sorted_elements[r_i] - sorted_elements[r_i - 1])
    end
  else
    result = sorted_elements[0]
  end
  result
end
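
A worked example may help with the weighted average. The standalone sketch below mirrors the steps of the listing on a plain array; it is a free function with illustrative variable names, not the library method itself.

```ruby
# Standalone sketch of the weighted average at x_(n + 1)p; illustrative only.
def percentile(xs, p = 50)
  raise ArgumentError, "p = #{p}, but has to be in (0...100)" unless (0...100).include?(p)
  sorted = xs.sort
  rank = p / 100.0 * (sorted.size + 1)  # 1-based, possibly fractional rank
  whole = rank.to_i
  frac = rank - whole
  return sorted.first if whole < 1      # rank falls before the first element
  result = sorted[whole - 1]
  # Interpolate towards the next element unless we are at the last one.
  result += frac * (sorted[whole] - sorted[whole - 1]) if whole < sorted.size
  result
end

xs = [40, 10, 30, 20]
p percentile(xs, 50)  # rank 2.5: halfway between 20 and 30 => 25.0
p percentile(xs, 25)  # rank 1.25: a quarter of the way from 10 to 20 => 12.5
p percentile(xs, 90)  # rank 4.5: clamps to the largest element => 40
```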

#push(element) ⇒ `Object` Also known as: <<

Push element on this Sequence and return a new Sequence instance with element as its last element.



# File 'lib/more_math/sequence.rb', line 47

def push(element)
  Sequence.new(@elements.dup.push(element))
end

#reset ⇒ `Object`

Reset all memoized values of this sequence.

# File 'lib/more_math/sequence.rb', line 34

def reset
  self.class.mize_cache_clear
  self
end

#size ⇒ `Object`

Returns the number of elements on which the analysis is based.



# File 'lib/more_math/sequence.rb', line 29

def size
  @elements.size
end

#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ `Object`

Computes a sample size that is more likely to reveal a difference between the mean of this instance’s elements and the mean of other's. alpha and beta are used as the levels for the type I and type II errors.

# File 'lib/more_math/sequence.rb', line 249

def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
  alpha, beta = alpha.abs, beta.abs
  signal = arithmetic_mean - other.arithmetic_mean
  df = size + other.size - 2
  pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
  td = TDistribution.new df
  (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
    Math.sqrt(pooled_variance_estimate)) / signal) ** 2
end

#t_student(other) ⇒ `Object`

Returns the t value of the Student’s t-test between this Sequence instance and the other.

# File 'lib/more_math/sequence.rb', line 237

def t_student(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = common_standard_deviation(other) *
    Math.sqrt(size ** -1 + other.size ** -1)
  signal / noise
rescue Errno::EDOM
  0.0
end

#t_welch(other) ⇒ `Object`

Returns the t value of the Welch’s t-test between this Sequence instance and the other.

# File 'lib/more_math/sequence.rb', line 208

def t_welch(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = Math.sqrt(sample_variance / size +
    other.sample_variance / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
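
As a worked example of the signal-to-noise form used above, here is a standalone sketch of Welch's t value on plain arrays. The helper names are hypothetical, and the sample variance is assumed to use the n - 1 denominator.

```ruby
# Standalone sketch of Welch's t value; names are illustrative only.
def mean(xs)
  xs.sum.to_f / xs.size
end

def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m) ** 2 } / (xs.size - 1)  # assumes n - 1 denominator
end

def welch_t(a, b)
  signal = mean(a) - mean(b)
  # Each sample contributes its own variance, scaled by its own size.
  noise = Math.sqrt(sample_variance(a) / a.size + sample_variance(b) / b.size)
  signal / noise
end

a = [1.0, 2.0, 3.0]       # mean 2, sample variance 1
b = [2.0, 4.0, 6.0, 8.0]  # mean 5, sample variance 20/3
p welch_t(a, b)           # -3 / sqrt(1/3 + 5/3), about -2.12
```

Unlike Student's t-test, no pooled variance is formed, so the two samples may have different variances and sizes.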

#to_ary ⇒ `Object` Also known as: to_a

Returns a copy of the array of elements.

# File 'lib/more_math/sequence.rb', line 39

def to_ary
  @elements.dup
end
