Class: MoreMath::Sequence
- Includes: Enumerable, MovingAverage
- Defined in: lib/more_math/sequence.rb, lib/more_math/sequence/moving_average.rb
Overview
This class is used to contain elements and compute various statistical values for them.
Defined Under Namespace
Modules: MovingAverage, Refinement
Instance Attribute Summary
-
#elements ⇒ Object
readonly
Returns the array of elements.
Instance Method Summary
-
#autocorrelation ⇒ Object
Returns the array of autocorrelation values c_k / c_0 (of length size - 1).
-
#autovariance ⇒ Object
Returns the array of autovariances (of length size - 1).
-
#common_standard_deviation(other) ⇒ Object
Returns an estimate of the common standard deviation of the elements of this Sequence and other.
-
#common_variance(other) ⇒ Object
Returns an estimate of the common variance of the elements of this Sequence and other.
-
#compute_student_df(other) ⇒ Object
Computes the number of degrees of freedom for Student’s t-test.
-
#compute_welch_df(other) ⇒ Object
Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.
-
#confidence_interval(alpha = 0.05) ⇒ Object
Returns the confidence interval for the arithmetic mean of the elements of this Sequence instance, at alpha level alpha, as a Range object.
-
#cover?(other, alpha = 0.05) ⇒ Boolean
Returns true if this Sequence instance covers other, that is, if their arithmetic means are most likely equal at the alpha error level.
-
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object
This method tries to detect autocorrelation with the Ljung-Box statistic.
-
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object
Return a result hash with the number of :very_low, :low, :high, and :very_high outliers, determined by the box plotting algorithm run with :median and :iqr parameters.
-
#durbin_watson_statistic ⇒ Object
Returns the d-value for the Durbin-Watson statistic.
-
#each(&block) ⇒ Object
Calls the block for every element of this Sequence.
-
#empty? ⇒ Boolean
Returns true if this sequence is empty, otherwise false.
-
#histogram(bins) ⇒ Object
Returns a Histogram instance with bins as the number of bins for this analysis’ elements.
-
#initialize(elements) ⇒ Sequence
constructor
A new instance of Sequence.
-
#ljung_box_statistic(lags = 20) ⇒ Object
Returns the q value of the Ljung-Box statistic for the number of lags lags.
-
#percentile(p = 50) ⇒ Object
(also: #median)
Returns the p-percentile of the elements.
-
#push(element) ⇒ Object
(also: #<<)
Returns a new Sequence instance with element appended as its last element; this Sequence itself is left unchanged.
-
#reset ⇒ Object
Reset all memoized values of this sequence.
-
#size ⇒ Object
Returns the number of elements on which the analysis is based.
-
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object
Computes a sample size that is more likely to reveal a difference between the mean of this instance’s elements and the mean of those of other.
-
#t_student(other) ⇒ Object
Returns the t value of Student’s t-test between this Sequence instance and other.
-
#t_welch(other) ⇒ Object
Returns the t value of Welch’s t-test between this Sequence instance and other.
-
#to_ary ⇒ Object
(also: #to_a)
Methods included from MovingAverage
Constructor Details
#initialize(elements) ⇒ Sequence
Returns a new instance of Sequence.
    # File 'lib/more_math/sequence.rb', line 10
    def initialize(elements)
      @elements = elements.dup.freeze
    end
Instance Attribute Details
#elements ⇒ Object (readonly)
Returns the array of elements.
    # File 'lib/more_math/sequence.rb', line 15
    def elements
      @elements
    end
Instance Method Details
#autocorrelation ⇒ Object
Returns the array of autocorrelation values c_k / c_0 (of length size - 1).
    # File 'lib/more_math/sequence.rb', line 290
    def autocorrelation
      c = autovariance
      Array.new(c.size) { |k| c[k] / c[0] }
    end
#autovariance ⇒ Object
Returns the array of autovariances (of length size - 1).
    # File 'lib/more_math/sequence.rb', line 278
    def autovariance
      Array.new(size - 1) do |k|
        s = 0.0
        0.upto(size - k - 1) do |i|
          s += (@elements[i] - arithmetic_mean) * (@elements[i + k] - arithmetic_mean)
        end
        s / size
      end
    end
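The two listings above can be tried out on a plain Array without the gem. The following is a minimal sketch under that assumption; the helper names `autocovariances` and `autocorrelations` are illustrative and not part of MoreMath.

```ruby
# Standalone sketch on a plain Array; the helper names are illustrative
# and not part of MoreMath.
def autocovariances(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  Array.new(n - 1) do |k|
    # c_k: lagged products of deviations from the mean, divided by n.
    (0...(n - k)).sum { |i| (xs[i] - mean) * (xs[i + k] - mean) } / n
  end
end

def autocorrelations(xs)
  c = autocovariances(xs)
  # Normalise by c_0 so that the lag-0 entry is 1.0.
  c.map { |c_k| c_k / c[0] }
end

r = autocorrelations([1, 2, 3, 4, 5])
puts r.inspect
```

As in the listings, the result has size - 1 entries and its first entry (lag 0) is always 1.0.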
#common_standard_deviation(other) ⇒ Object
Returns an estimate of the common standard deviation of the elements of this Sequence and other.
    # File 'lib/more_math/sequence.rb', line 219
    def common_standard_deviation(other)
      Math.sqrt(common_variance(other))
    end
#common_variance(other) ⇒ Object
Returns an estimate of the common variance of the elements of this Sequence and other.
    # File 'lib/more_math/sequence.rb', line 225
    def common_variance(other)
      (size - 1) * sample_variance + (other.size - 1) * other.sample_variance /
        (size + other.size - 2)
    end
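For comparison, the textbook pooled variance divides the whole weighted sum by the combined degrees of freedom. In the listing above, Ruby evaluates `/` before `+`, so as written the division applies only to the second term. The sketch below shows the textbook form on plain Arrays; `sample_variance` and `pooled_variance` are hypothetical helpers, and the n - 1 denominator for the sample variance is an assumption about what the library computes.

```ruby
# Standalone sketch; both helpers are illustrative, not MoreMath API.
def sample_variance(xs)
  mean = xs.sum.to_f / xs.size
  # Unbiased estimator (n - 1 denominator), assumed here.
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
end

def pooled_variance(xs, ys)
  # Parentheses make the division cover the whole weighted sum.
  ((xs.size - 1) * sample_variance(xs) + (ys.size - 1) * sample_variance(ys)) /
    (xs.size + ys.size - 2)
end

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
puts pooled_variance(a, b)
```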
#compute_student_df(other) ⇒ Object
Computes the number of degrees of freedom for Student’s t-test.
    # File 'lib/more_math/sequence.rb', line 231
    def compute_student_df(other)
      size + other.size - 2
    end
#compute_welch_df(other) ⇒ Object
Use an approximation of the Welch-Satterthwaite equation to compute the degrees of freedom for Welch’s t-test.
    # File 'lib/more_math/sequence.rb', line 200
    def compute_welch_df(other)
      (sample_variance / size + other.sample_variance / other.size) ** 2 / (
        (sample_variance ** 2 / (size ** 2 * (size - 1))) +
        (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
    end
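The same expression can be written in terms of the per-sample quantities s²/n, which makes the structure of the approximation easier to see. This is a sketch with a hypothetical helper `welch_df` that takes the variances and sizes directly; it is not a MoreMath method.

```ruby
# Standalone sketch; welch_df is an illustrative helper, not MoreMath API.
def welch_df(var1, n1, var2, n2)
  a = var1 / n1.to_f  # s1^2 / n1
  b = var2 / n2.to_f  # s2^2 / n2
  (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
end

puts welch_df(2.5, 5, 10.0, 5)
puts welch_df(1.0, 5, 1.0, 5)
```

With equal variances and equal sizes the result coincides with the Student value n1 + n2 - 2; with unequal variances it is smaller.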
#confidence_interval(alpha = 0.05) ⇒ Object
Returns the confidence interval for the arithmetic mean of the elements of this Sequence instance, at alpha level alpha, as a Range object.
    # File 'lib/more_math/sequence.rb', line 270
    def confidence_interval(alpha = 0.05)
      td = TDistribution.new(size - 1)
      t = td.inverse_probability(alpha / 2).abs
      delta = t * sample_standard_deviation / Math.sqrt(size)
      (arithmetic_mean - delta)..(arithmetic_mean + delta)
    end
#cover?(other, alpha = 0.05) ⇒ Boolean
Returns true if this Sequence instance covers other, that is, if their arithmetic means are most likely equal at the alpha error level.
    # File 'lib/more_math/sequence.rb', line 262
    def cover?(other, alpha = 0.05)
      t = t_welch(other)
      td = TDistribution.new(compute_welch_df(other))
      t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
    end
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Object
This method tries to detect autocorrelation with the Ljung-Box statistic. If enough lags can be considered, it returns a hash with results; otherwise nil is returned. The keys are:
- :lags is the number of lags,
- :alpha_level is the alpha level for the test,
- :q is the value of the ljung_box_statistic,
- :p is the probability that ChiSquareDistribution#probability returns for q,
- :detected is true if a correlation was found, that is, if p >= 1 - alpha_level.
    # File 'lib/more_math/sequence.rb', line 323
    def detect_autocorrelation(lags = 20, alpha_level = 0.05)
      if q = ljung_box_statistic(lags)
        p = ChiSquareDistribution.new(lags).probability(q)
        return {
          :lags => lags,
          :alpha_level => alpha_level,
          :q => q,
          :p => p,
          :detected => p >= 1 - alpha_level,
        }
      end
    end
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Object
Return a result hash with the number of :very_low, :low, :high, and :very_high outliers, determined by the box plotting algorithm run with :median and :iqr parameters. If no outliers were found or the iqr is less than epsilon, nil is returned.
    # File 'lib/more_math/sequence.rb', line 340
    def detect_outliers(factor = 3.0, epsilon = 1E-5)
      half_factor = factor / 2.0
      quartile1 = percentile(25)
      quartile3 = percentile(75)
      iqr = quartile3 - quartile1
      iqr < epsilon and return
      result = @elements.inject(Hash.new(0)) do |h, t|
        extreme =
          case t
          when -Infinity..(quartile1 - factor * iqr)
            :very_low
          when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
            :low
          when (quartile1 + half_factor * iqr)..(quartile3 + factor * iqr)
            :high
          when (quartile3 + factor * iqr)..Infinity
            :very_high
          end and h[extreme] += 1
        h
      end
      unless result.empty?
        result[:median] = median
        result[:iqr] = iqr
        result[:factor] = factor
        result
      end
    end
#durbin_watson_statistic ⇒ Object
Returns the d-value for the Durbin-Watson statistic. The value is well below 2 (d << 2) for positive autocorrelation, well above 2 (d >> 2) for negative autocorrelation, and around 2 when there is no autocorrelation.
    # File 'lib/more_math/sequence.rb', line 297
    def durbin_watson_statistic
      e = linear_regression.residuals
      e.size <= 1 and return 2.0
      (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
        e.inject(0.0) { |s, x| s + x ** 2 }
    end
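To see how d moves away from 2, the ratio can be evaluated on hand-picked residuals. This is a minimal sketch that takes a residual array directly instead of running a linear regression; `durbin_watson` is a hypothetical helper, not the library method.

```ruby
# Standalone sketch; takes residuals directly, unlike the library method.
def durbin_watson(residuals)
  return 2.0 if residuals.size <= 1
  # Sum of squared successive differences over the sum of squares.
  num = residuals.each_cons(2).sum { |a, b| (b - a)**2 }
  den = residuals.sum { |e| e**2 }
  num.to_f / den
end

alternating = durbin_watson([1, -1, 1, -1])  # sign flips at every step
clustered   = durbin_watson([1, 1, -1, -1])  # neighbours tend to agree
puts alternating
puts clustered
```

Residuals that alternate in sign (negative autocorrelation) push d above 2, while runs of same-signed residuals (positive autocorrelation) pull it below 2.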
#each(&block) ⇒ Object
Calls the block for every element of this Sequence.
    # File 'lib/more_math/sequence.rb', line 18
    def each(&block)
      @elements.each(&block)
    end
#empty? ⇒ Boolean
Returns true if this sequence is empty, otherwise false.
    # File 'lib/more_math/sequence.rb', line 24
    def empty?
      @elements.empty?
    end
#histogram(bins) ⇒ Object
Returns a Histogram instance with bins as the number of bins for this analysis’ elements.
    # File 'lib/more_math/sequence.rb', line 377
    def histogram(bins)
      Histogram.new(self, bins)
    end
#ljung_box_statistic(lags = 20) ⇒ Object
Returns the q value of the Ljung-Box statistic for the number of lags lags. A higher value might indicate autocorrelation in the elements of this Sequence instance. This method returns nil if there aren’t enough lags available, that is, unless the autocorrelation array has more than lags entries.
    # File 'lib/more_math/sequence.rb', line 308
    def ljung_box_statistic(lags = 20)
      r = autocorrelation
      lags >= r.size and return
      n = size
      n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
    end
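The statistic computed above is Q = n(n + 2) Σ r_k² / (n - k) for k = 1..lags. A small sketch, assuming the autocorrelation values are already available as an array with r[0] at lag 0; `ljung_box_q` is a hypothetical helper name.

```ruby
# Standalone sketch; r holds autocorrelations with r[0] at lag 0,
# n is the number of observations. Not a MoreMath method.
def ljung_box_q(r, n, lags)
  # Mirror the listing's guard: too few autocorrelation values => nil.
  return nil if lags >= r.size
  n * (n + 2) * (1..lags).sum { |k| r[k]**2 / (n - k).to_f }
end

r = [1.0, 0.4, -0.1, -0.4]
q = ljung_box_q(r, 5, 2)
puts q
puts ljung_box_q(r, 5, 4).inspect
```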
#percentile(p = 50) ⇒ Object Also known as: median
Returns the p-percentile of the elements. There are many methods to compute the percentile; this method uses the weighted average at x_(n + 1)p, which allows p to be in 0...100 (excluding 100).
    # File 'lib/more_math/sequence.rb', line 177
    def percentile(p = 50)
      (0...100).include?(p) or
        raise ArgumentError, "p = #{p}, but has to be in (0...100)"
      p /= 100.0
      sorted_elements = sorted
      r = p * (sorted_elements.size + 1)
      r_i = r.to_i
      r_f = r - r_i
      if r_i >= 1
        result = sorted_elements[r_i - 1]
        if r_i < sorted_elements.size
          result += r_f * (sorted_elements[r_i] - sorted_elements[r_i - 1])
        end
      else
        result = sorted_elements[0]
      end
      result
    end
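A worked example may help with the rank arithmetic: the rank r = p(n + 1) is split into an integer part, which selects an element, and a fractional part, which interpolates towards the next one. The sketch below restates the listing as a free function on a plain Array; `weighted_percentile` is a hypothetical name.

```ruby
# Standalone sketch of the weighted-average-at-(n + 1)p rule.
def weighted_percentile(xs, p = 50)
  unless (0...100).include?(p)
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  end
  sorted = xs.sort
  r = p / 100.0 * (sorted.size + 1)  # fractional rank, 1-based
  i = r.to_i
  f = r - i
  return sorted[0] if i < 1          # rank falls before the first element
  result = sorted[i - 1].to_f
  # Interpolate towards the next element unless already at the last one.
  result += f * (sorted[i] - sorted[i - 1]) if i < sorted.size
  result
end

data = [4, 1, 3, 2]
puts weighted_percentile(data, 25)
puts weighted_percentile(data, 50)
puts weighted_percentile(data, 90)
```

For four elements and p = 50 the rank is 2.5, so the result lies halfway between the second and third sorted elements.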
#push(element) ⇒ Object Also known as: <<
Returns a new Sequence instance with element appended as its last element; this Sequence itself is left unchanged.
    # File 'lib/more_math/sequence.rb', line 47
    def push(element)
      Sequence.new(@elements.dup.push(element))
    end
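Because the constructor freezes a copy of the elements and push builds a new instance, the receiver is never mutated. The pattern can be sketched with a stand-in class; `TinySequence` is hypothetical and only mirrors the constructor and push listings shown in this document.

```ruby
# Hypothetical stand-in mirroring the constructor and push listings.
class TinySequence
  attr_reader :elements

  def initialize(elements)
    @elements = elements.dup.freeze  # private, frozen copy
  end

  def push(element)
    # dup yields an unfrozen copy, so the receiver's array is untouched.
    self.class.new(@elements.dup.push(element))
  end
  alias << push
end

a = TinySequence.new([1, 2])
b = a << 3
puts a.elements.inspect
puts b.elements.inspect
```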
#reset ⇒ Object
Reset all memoized values of this sequence.
    # File 'lib/more_math/sequence.rb', line 34
    def reset
      self.class.mize_cache_clear
      self
    end
#size ⇒ Object
Returns the number of elements on which the analysis is based.
    # File 'lib/more_math/sequence.rb', line 29
    def size
      @elements.size
    end
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Object
Computes a sample size that is more likely to reveal a difference between the mean of this instance’s elements and the mean of those of other. alpha and beta are used as the levels for type I and type II errors.
    # File 'lib/more_math/sequence.rb', line 249
    def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
      alpha, beta = alpha.abs, beta.abs
      signal = arithmetic_mean - other.arithmetic_mean
      df = size + other.size - 2
      pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
      td = TDistribution.new df
      (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
        Math.sqrt(pooled_variance_estimate)) / signal) ** 2
    end
#t_student(other) ⇒ Object
Returns the t value of Student’s t-test between this Sequence instance and other.
    # File 'lib/more_math/sequence.rb', line 237
    def t_student(other)
      signal = arithmetic_mean - other.arithmetic_mean
      noise = common_standard_deviation(other) *
        Math.sqrt(size ** -1 + size ** -1)
      signal / noise
    rescue Errno::EDOM
      0.0
    end
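The textbook Student statistic scales the pooled standard deviation by sqrt(1/n1 + 1/n2), using both sample sizes, whereas the listing above uses size for both terms; the two agree only when the samples have the same size. The sketch below follows the textbook form on plain Arrays with unequal sizes. All helper names are hypothetical, and the n - 1 variance denominator is an assumption.

```ruby
# Standalone sketch of the textbook two-sample Student t statistic.
def mean(xs)
  xs.sum.to_f / xs.size
end

def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

def student_t(xs, ys)
  n1, n2 = xs.size, ys.size
  pooled = ((n1 - 1) * sample_variance(xs) + (n2 - 1) * sample_variance(ys)) /
           (n1 + n2 - 2)
  signal = mean(xs) - mean(ys)
  noise = Math.sqrt(pooled) * Math.sqrt(1.0 / n1 + 1.0 / n2)
  signal / noise
end

t = student_t([1, 2, 3, 4, 5], [2, 4, 6])
puts t
```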
#t_welch(other) ⇒ Object
Returns the t value of Welch’s t-test between this Sequence instance and other.
    # File 'lib/more_math/sequence.rb', line 208
    def t_welch(other)
      signal = arithmetic_mean - other.arithmetic_mean
      noise = Math.sqrt(sample_variance / size +
        other.sample_variance / other.size)
      signal / noise
    rescue Errno::EDOM
      0.0
    end
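Welch’s statistic keeps the two variances separate instead of pooling them. A minimal sketch on plain Arrays, assuming the n - 1 sample variance; `welch_t` and its helpers are illustrative names rather than MoreMath methods.

```ruby
# Standalone sketch of Welch's t statistic.
def mean(xs)
  xs.sum.to_f / xs.size
end

def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

def welch_t(xs, ys)
  signal = mean(xs) - mean(ys)
  # Each sample contributes its own variance of the mean, s^2 / n.
  noise = Math.sqrt(sample_variance(xs) / xs.size +
                    sample_variance(ys) / ys.size)
  signal / noise
end

t = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
puts t
```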
#to_ary ⇒ Object Also known as: to_a
    # File 'lib/more_math/sequence.rb', line 39
    def to_ary
      @elements.dup
    end