Module: Digiproc::Probability
- Defined in: lib/probability/probability.rb
Overview
Module supplying probability functions
Defined Under Namespace
Classes: GaussianDistribution, RandomBitGenerator, TheoreticalBinomialDistribution, TheoreticalGaussianDistribution
Class Method Summary
-
.corr_coeff(d1, d2) ⇒ Object
Alias for #correlation_coefficient.
-
.correlation_coefficient(data1, data2) ⇒ Object
Calculates Pearson’s correlation coefficient of two datasets.
-
.cov(d1, d2) ⇒ Object
Alias for #covariance.
-
.covariance(data1, data2) ⇒ Object
Returns [Float] the covariance of two datasets.
-
.erf(x) ⇒ Object
Returns [Float] the error function of the input; mirrors Math.erf(x).
-
.erfc(x) ⇒ Object
Returns [Float] the complementary error function of the input, per Math.erfc.
-
.mean(data) ⇒ Object
Returns [Float] the mean of the inputted data.
-
.normal_cdf(x, mean = 0, stddev = 1) ⇒ Object
Returns [Float] the probability that a random variable from a normal distribution with the given mean and standard deviation will be less than a value x.
-
.normal_q(x, mean = 0, stddev = 1) ⇒ Object
Returns [Float] the probability that a random variable from a normal distribution with the given mean and standard deviation will be greater than a value x.
-
.normal_random_generator(mean = 0, stddev = 1) ⇒ Object
Returns a new instance of a normal random number generator, whose instance method #rand returns a single random number from the normal distribution.
-
.nrand(size = 1) ⇒ Object
Returns a single number from the distribution when `size` is 1 (the default), or an array of `size` numbers when `size` > 1.
-
.pdf(data) ⇒ Object
Returns [Hash] the discrete probability distribution function of an inputted array of values.
-
.stationary_covariance(data1, data2) ⇒ Object
Returns [Float] the covariance (if process is stationary) of two datasets.
-
.stationary_variance(data) ⇒ Object
Returns [Float] the variance of inputted data for a stationary process (mean, variance, and autocorrelation do not change with time).
-
.stddev(data) ⇒ Object
Returns [Float] the standard deviation of the inputted data.
-
.var(d) ⇒ Object
Alias for #variance.
-
.variance(data) ⇒ Object
Returns [Float] the variance of the inputted data.
Class Method Details
.corr_coeff(d1, d2) ⇒ Object
Alias for #correlation_coefficient
    # File 'lib/probability/probability.rb', line 164
    def self.corr_coeff(d1, d2)
      correlation_coefficient(d1, d2)
    end
.correlation_coefficient(data1, data2) ⇒ Object
Calculates Pearson’s correlation coefficient of two datasets.
Arguments
- data1 (Array): first dataset
- data2 (Array): second dataset

    dat = [4,12,19,4,8,9]
    dat2 = [19, 20, 35, 15, 13, 17]
    Digiproc::Probability.correlation_coefficient(dat, dat2) # => 0.84450000297225
    cdat = [1,2,3,4,5]
    ddat = cdat.map{ |v| v * -29 }
    Digiproc::Probability.correlation_coefficient(cdat, ddat) # => -1.0

    # File 'lib/probability/probability.rb', line 143
    def self.correlation_coefficient(data1, data2)
      covar = covariance(data1, data2)
      var1 = variance(data1)
      var2 = variance(data2)
      return covar.to_f / ((var1 ** 0.5) * (var2 ** 0.5))
    end
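Because the covariance and both variances share the same (n - 1) denominator, that factor cancels in the ratio. The following is a minimal standalone sketch of that observation in plain Ruby; `pearson_r` is an illustrative name and is not part of Digiproc.

```ruby
# Illustrative helper (not a Digiproc method): Pearson's r computed directly
# from centered deviations, so the (n - 1) factors never appear.
def pearson_r(xs, ys)
  raise ArgumentError, "Datasets must be of equal length" if xs.length != ys.length

  mu_x = xs.sum / xs.length.to_f
  mu_y = ys.sum / ys.length.to_f
  dx = xs.map { |v| v - mu_x }
  dy = ys.map { |v| v - mu_y }

  numerator = dx.zip(dy).sum { |a, b| a * b }
  denominator = Math.sqrt(dx.sum { |a| a * a } * dy.sum { |b| b * b })
  numerator / denominator
end

puts pearson_r([4, 12, 19, 4, 8, 9], [19, 20, 35, 15, 13, 17]).round(4) # prints 0.8445
```

A dataset with zero variance makes the denominator zero, so the result is NaN in that case; the sketch does not guard against it.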
.cov(d1, d2) ⇒ Object
Alias for #covariance
    # File 'lib/probability/probability.rb', line 158
    def self.cov(d1, d2)
      covariance(d1,d2)
    end
.covariance(data1, data2) ⇒ Object
Returns [Float] the covariance of two datasets.
Arguments
- data1 (Array): first dataset
- data2 (Array): second dataset

    dat = [4,12,19,4,8,9]
    dat2 = [19, 20, 35, 15, 13, 17]
    Digiproc::Probability.covariance(dat, dat2) # => 37.46666666666667

    # File 'lib/probability/probability.rb', line 121
    def self.covariance(data1, data2)
      raise ArgumentError.new("Datasets must be of equal length") if data1.length != data2.length
      mu1 = mean(data1)
      mu2 = mean(data2)
      summation = 0
      for i in 0...data1.length do
        summation += ((data1[i] - mu1) * (data2[i] - mu2))
      end
      summation.to_f / (data1.length - 1)
    end
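The listing above computes the sample covariance (dividing by n - 1) and rejects datasets of unequal length. The same calculation can be sketched with `zip` instead of an index loop; `cov_sketch` is a hypothetical name used only for illustration.

```ruby
# Illustrative helper (not a Digiproc method): sample covariance using zip.
def cov_sketch(xs, ys)
  raise ArgumentError, "Datasets must be of equal length" if xs.length != ys.length

  mu_x = xs.sum / xs.length.to_f
  mu_y = ys.sum / ys.length.to_f
  # Sum of products of paired deviations, divided by (n - 1).
  xs.zip(ys).sum { |x, y| (x - mu_x) * (y - mu_y) } / (xs.length - 1)
end

puts cov_sketch([4, 12, 19, 4, 8, 9], [19, 20, 35, 15, 13, 17]).round(4) # prints 37.4667

begin
  cov_sketch([1, 2, 3], [1, 2])
rescue ArgumentError => e
  puts e.message # prints Datasets must be of equal length
end
```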
.erf(x) ⇒ Object
Returns [Float] the error function of the input. For an input x and a normal distribution with mean 0 and variance 0.5, this is the probability that a random variable falls between -x and x. Mirrors Math.erf(x).
Arguments
- x (Float)

    Digiproc::Probability.erf(0.3) # => 0.32862675945912734

    # File 'lib/probability/probability.rb', line 175
    def self.erf(x)
      Math.erf(x)
    end
.erfc(x) ⇒ Object
Returns [Float] the complementary error function of the input, per Math.erfc. For an input x and a normal distribution with mean 0 and variance 0.5, this is the probability that a random variable does not fall between -x and x.
Arguments
- x (Float)

    Digiproc::Probability.erfc(0.3) # => 0.6713732405408727

    # File 'lib/probability/probability.rb', line 185
    def self.erfc(x)
      Math.erfc(x)
    end
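Since both methods delegate to Ruby's `Math` module, their relationship can be checked with the standard library alone. The sketch below rescales the argument by sqrt(2) to move from the variance-0.5 distribution described above to a standard normal; `prob_within_sigmas` is an illustrative name, not a Digiproc method.

```ruby
# Illustrative helper (not a Digiproc method): probability that a standard
# normal variable lies within k standard deviations of its mean.
def prob_within_sigmas(k)
  Math.erf(k / Math.sqrt(2))
end

puts prob_within_sigmas(1).round(4) # prints 0.6827
puts prob_within_sigmas(2).round(4) # prints 0.9545

# erf and erfc are complements: they sum to 1 up to floating-point rounding.
puts (Math.erf(0.3) + Math.erfc(0.3)).round(12)
```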
.mean(data) ⇒ Object
Returns [Float] the mean of the inputted data.
Arguments
- data (Array): data to be evaluated

    dat = [4,12,19,4,8,9]
    Digiproc::Probability.mean(dat) # => 9.333333333333334

    # File 'lib/probability/probability.rb', line 51
    def self.mean(data)
      data.sum / data.length.to_f
    end
.normal_cdf(x, mean = 0, stddev = 1) ⇒ Object
Returns [Float] the probability that a random variable from a normal distribution with the given mean and standard deviation will be less than a value x.
Arguments
- x (Float): the value below which the random variable's probability is evaluated
- mean (Float): the mean of the normal distribution (defaults to 0)
- stddev (Float): the standard deviation of the normal distribution (defaults to 1)

    Digiproc::Probability.normal_cdf(0.5) # => 0.691462461274013
    Digiproc::Probability.normal_cdf(140, 100, 15) # => 0.9961696194324102

    # File 'lib/probability/probability.rb', line 197
    def self.normal_cdf(x, mean = 0, stddev = 1)
      1 - normal_q(x, mean, stddev)
    end
.normal_q(x, mean = 0, stddev = 1) ⇒ Object
Returns [Float] the probability that a random variable from a normal distribution with the given mean and standard deviation will be greater than a value x.
Arguments
- x (Float): the value above which the random variable's probability is evaluated
- mean (Float): the mean of the normal distribution (defaults to 0)
- stddev (Float): the standard deviation of the normal distribution (defaults to 1)

    Digiproc::Probability.normal_q(0.5) # => 0.30853753872598694
    Digiproc::Probability.normal_q(140, 100, 15) # => 0.0038303805675897395

    # File 'lib/probability/probability.rb', line 209
    def self.normal_q(x, mean = 0, stddev = 1)
      xformed_x = (x - mean) / stddev.to_f
      0.5 * erfc(xformed_x / (2 ** 0.5))
    end
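One common use of a tail probability is finding the chance of landing in an interval, since P(a < X < b) = Q(a) - Q(b). The following is a minimal sketch that assumes only Ruby's `Math.erfc`; `q_sketch` and `prob_between` are illustrative names, not Digiproc methods.

```ruby
# Illustrative helper (not a Digiproc method): upper-tail probability of a
# normal distribution, written the same way as the listing above.
def q_sketch(x, mean = 0, stddev = 1)
  z = (x - mean) / stddev.to_f
  0.5 * Math.erfc(z / Math.sqrt(2))
end

# Probability that the variable lies between a and b (a < b).
def prob_between(a, b, mean = 0, stddev = 1)
  q_sketch(a, mean, stddev) - q_sketch(b, mean, stddev)
end

# Share of scores within one standard deviation of a mean of 100 (stddev 15).
puts prob_between(85, 115, 100, 15).round(4) # prints 0.6827
```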
.normal_random_generator(mean = 0, stddev = 1) ⇒ Object
Returns a new instance of a normal random number generator, which has an instance method #rand that returns a single random number from the normal distribution.
Arguments
- mean (Float): mean of the distribution (defaults to 0)
- stddev (Float): standard deviation of the distribution (defaults to 1)

    gen = Digiproc::Probability.normal_random_generator(100, 15) # generator of IQ-like scores
    scores = []
    10.times do
      scores << gen.rand
    end
    scores # => e.g. [98.22222989456891, 102.78474153867536, 130.32345913801984, 90.68720464516447, 94.45065665514275, 100.46571157090098, 105.40955807686252, 85.9616084359681, 103.05219341559669, 96.62475435904693]

    # File 'lib/probability/probability.rb', line 25
    def self.normal_random_generator(mean = 0, stddev = 1)
      @gaussian_generator.class.new(mean, stddev)
    end
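The generator class itself is not shown on this page, so the following is only a sketch of one well-known way such an object could work: the Box-Muller transform. It is not presented as the library's implementation. `BoxMullerSketch` and its optional `rng` parameter are hypothetical names introduced for illustration.

```ruby
# Hypothetical class (not Digiproc's GaussianDistribution): draws normally
# distributed numbers with the Box-Muller transform.
class BoxMullerSketch
  def initialize(mean = 0, stddev = 1, rng = Random.new)
    @mean = mean
    @stddev = stddev
    @rng = rng
  end

  # Returns one sample from the normal distribution N(mean, stddev^2).
  def rand
    u1 = 1.0 - @rng.rand # shift [0, 1) to (0, 1] so Math.log never sees 0
    u2 = @rng.rand
    z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
    @mean + @stddev * z
  end
end

gen = BoxMullerSketch.new(100, 15, Random.new(42)) # seeded for repeatability
scores = Array.new(10) { gen.rand }
puts scores.length # prints 10
```

Passing a seeded `Random` makes runs repeatable, which is convenient in tests; omitting it gives a fresh, unseeded sequence.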
.nrand(size = 1) ⇒ Object
Arguments
- size (Integer): number of returns expected, defaults to 1
If `size` is 1, returns a single number from the distribution. For `size` > 1, returns an array of numbers from the distribution.

    Digiproc::Probability.nrand(5) # => [-0.4870987490684469, 0.48974360810927925, -0.3722088483576355, -0.2829898781247938, 0.12540113064787742]
    Digiproc::Probability.nrand # => 0.7978655621417761

    # File 'lib/probability/probability.rb', line 36
    def self.nrand(size = 1)
      return @gaussian_generator.rand if size == 1
      rand_nums = []
      size.times do
        rand_nums << @gaussian_generator.rand
      end
      return rand_nums
    end
.pdf(data) ⇒ Object
Returns [Hash] the discrete probability distribution function of an inputted array of values, expressed as the number of occurrences of each value (values are rounded to 2 decimal places before counting).
Arguments
- data (Array): discrete values in a dataset

    dat = [3,3,4,5,4,5,6,5,6,5,7,6,5,4]
    Digiproc::Probability.pdf(dat) # => {3=>2, 4=>3, 5=>5, 6=>3, 7=>1}

    # File 'lib/probability/probability.rb', line 220
    def self.pdf(data)
      pdf = {}
      data.each do |datapoint|
        pt = datapoint.round(2)
        if pdf[pt].nil?
          pdf[pt] = 1
        else
          pdf[pt] += 1
        end
      end
      pdf
    end
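The listing above returns raw counts per value. If relative frequencies (values summing to 1) are wanted instead, the counts can be divided by the dataset size. The following is a sketch that assumes Ruby 2.7 or later for `Enumerable#tally`; `relative_frequencies` is an illustrative name, not a Digiproc method.

```ruby
# Illustrative helper (not a Digiproc method): counts each value after
# rounding to 2 decimal places, then normalizes counts to frequencies.
def relative_frequencies(data)
  counts = data.map { |v| v.round(2) }.tally
  total = data.length.to_f
  counts.transform_values { |count| count / total }
end

dat = [3, 3, 4, 5, 4, 5, 6, 5, 6, 5, 7, 6, 5, 4]
freqs = relative_frequencies(dat)
puts freqs[5].round(3) # prints 0.357 (5 of the 14 values are 5)
```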
.stationary_covariance(data1, data2) ⇒ Object
Returns [Float] the covariance (if process is stationary) of two datasets.
Arguments
- data1 (Array): first dataset
- data2 (Array): second dataset

    dat = [4,12,19,4,8,9]
    dat2 = [19, 20, 35, 15, 13, 17]
    Digiproc::Probability.stationary_covariance(dat, dat2) # => 31.22222222222223

    # File 'lib/probability/probability.rb', line 105
    def self.stationary_covariance(data1, data2)
      raise ArgumentError.new("Datasets must be of equal length") if data1.length != data2.length
      xcs = data1.dot data2
      mu1 = mean(data1)
      mu2 = mean(data2)
      (xcs / (data1.length.to_f)) - (mu1 * mu2)
    end
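The listing calls `data1.dot data2`, and `Array#dot` is not a core Ruby method, so it presumably comes from an extension defined elsewhere in the library (an assumption, since that code is not shown here). The sketch below reproduces the same formula, mean of products minus product of means, in plain Ruby; `dot_sketch` and `stationary_cov_sketch` are illustrative names.

```ruby
# Illustrative helper (not a Digiproc method): plain-Ruby dot product.
def dot_sketch(xs, ys)
  xs.zip(ys).sum { |x, y| x * y }
end

# Illustrative helper: E[xy] - E[x] * E[y], dividing by n rather than n - 1.
def stationary_cov_sketch(xs, ys)
  raise ArgumentError, "Datasets must be of equal length" if xs.length != ys.length

  n = xs.length.to_f
  dot_sketch(xs, ys) / n - (xs.sum / n) * (ys.sum / n)
end

puts stationary_cov_sketch([4, 12, 19, 4, 8, 9], [19, 20, 35, 15, 13, 17]).round(4) # prints 31.2222
```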
.stationary_variance(data) ⇒ Object
Returns [Float] the variance of inputted data for a stationary process (stationary process = mean, variance, and autocorrelation do not change with time). Runs faster over large datasets because each value in data does not undergo a subtraction with mu. If the process is not stationary, an incorrect result will be given.
Arguments
- data (Array): data to be evaluated

    stat_var = ( sum_over_datavals( val^2 ) / number_of_vals ) - mean^2

    dat = [4,12,19,4,8,9]
    Digiproc::Probability.stationary_variance(dat) # => 26.555555555555543

    # File 'lib/probability/probability.rb', line 78
    def self.stationary_variance(data)
      mu = mean(data)
      summation = data.map{ |val| val ** 2 }.sum
      (summation.to_f / data.length) - (mu ** 2)
    end
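For the example dataset, this result (26.56) differs from the value documented for `.variance` (31.87) because this formula divides by n while `.variance` divides by n - 1. The two are related by a factor of n / (n - 1), which the following sketch checks in plain Ruby; both helper names are illustrative and not part of Digiproc.

```ruby
# Illustrative helper (not a Digiproc method): mean of squares minus squared
# mean, i.e. the variance with an n denominator.
def population_variance_sketch(data)
  n = data.length.to_f
  data.sum { |v| v * v } / n - (data.sum / n)**2
end

# Illustrative helper: rescales to the (n - 1) denominator.
def sample_variance_sketch(data)
  n = data.length
  population_variance_sketch(data) * n / (n - 1)
end

dat = [4, 12, 19, 4, 8, 9]
puts population_variance_sketch(dat).round(4) # prints 26.5556
puts sample_variance_sketch(dat).round(4)     # prints 31.8667
```

Subtracting two large, nearly equal quantities can lose floating-point precision, so the mean-of-squares form may be less accurate than the centered form when the mean is large relative to the spread.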
.stddev(data) ⇒ Object
Returns [Float] the standard deviation of the inputted data.
Arguments
- data (Array): data to be evaluated

    stddev = sqrt(variance)

    dat = [4,12,19,4,8,9]
    Digiproc::Probability.stddev(dat) # => 5.645056834671079

    # File 'lib/probability/probability.rb', line 91
    def self.stddev(data)
      variance(data) ** (0.5)
    end
.var(d) ⇒ Object
Alias for #variance
    # File 'lib/probability/probability.rb', line 152
    def self.var(d)
      variance(d)
    end
.variance(data) ⇒ Object
Returns [Float] the variance of the inputted data.
Arguments
- data (Array): data to be evaluated

    variance = sum_over_datavals( (dataval - mean) ^ 2 ) / (number_of_datavals - 1)

    dat = [4,12,19,4,8,9]
    Digiproc::Probability.variance(dat) # => 31.866666666666664

    # File 'lib/probability/probability.rb', line 62
    def self.variance(data)
      mu = mean(data)
      summation = data.map{ |val| (val - mu) ** 2 }.sum
      summation.to_f / (data.length - 1)
    end