Class: Kalibera::Data

Inherits: Object

Extended by: Memoist

Defined in: lib/kalibera/data.rb

Instance Method Summary

#[](*indicies) ⇒ Object
#bootstrap_confidence_interval(iterations = 10000, confidence = "0.95") ⇒ Object

Compute a confidence interval via bootstrap method.
#bootstrap_means(iterations = 1000) ⇒ Object

Compute a list of simulated means from bootstrap resampling.
#bootstrap_quotient(other, iterations = 10000, confidence = '0.95') ⇒ Object
#bootstrap_sample ⇒ Object
#confidence95 ⇒ Object

Compute the 95% confidence interval.
#index_iterator(start = 0, stop = nil) ⇒ Object

Computes a list of all possible index combinations for the levels at positions start <= position < stop of reps.
#initialize(data, reps) ⇒ Data constructor

Instances of this class store measurements (corresponding to the Y_… in the papers).
#mean(indicies = []) ⇒ Object

Compute the mean across a number of values.
#n ⇒ Object

The number of levels in the experiment.
#optimalreps(i, costs) ⇒ Object

Computes the optimal number of repetitions for a given level.
#r(i) ⇒ Object

The number of repetitions for level i.
#random_measurement_sample(index = []) ⇒ Object
#Si2(i) ⇒ Object

Biased estimator S_i^2.
#Ti2(i) ⇒ Object

Compute the unbiased T_i^2 variance estimator.

Constructor Details

#initialize(data, reps) ⇒ `Data`

Instances of this class store measurements (corresponding to the Y_… in the papers).

Arguments: data – Hash mapping arrays of all but the last index to arrays of values. reps – Array of repetition counts for each level, high to low.

# File 'lib/kalibera/data.rb', line 131

def initialize(data, reps)
  @data = data
  @reps = reps

  # check that all data is there

  array = reps.map { |i| (0...i).to_a }
  array[0].product(*array.drop(1)).each do |index|
    self[*index] # does not crash
  end
end
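As a rough illustration of the layout described above, the following sketch builds a hypothetical two-level data set (2 outer repetitions of 3 measurements each; the numbers are invented) from plain Ruby objects, without loading the gem. The helper name `missing_indices` is an assumption made for this example; it applies a slightly stricter variant of the constructor's completeness check, since it also flags a present row with an absent value.

```ruby
# Hypothetical helper, not part of Kalibera: list every full index
# that has no measurement behind it.
def missing_indices(data, reps)
  # Enumerate every full index, the same way the constructor does.
  ranges = reps.map { |count| (0...count).to_a }
  all_indices = ranges[0].product(*ranges.drop(1))

  all_indices.reject do |index|
    # Keys hold every index except the last one.
    row = data[index[0...-1]]
    !row.nil? && !row[index[-1]].nil?
  end
end

reps = [2, 3]
data = {
  [0] => [1.0, 2.0, 3.0],
  [1] => [2.0, 3.0, 4.0]
}

puts "missing: #{missing_indices(data, reps).inspect}"
```

With the gem loaded, the same two objects would be passed as `Kalibera::Data.new(data, reps)`.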

Instance Method Details

#[](*indicies) ⇒ `Object`

# File 'lib/kalibera/data.rb', line 143

def [](*indicies)
  # One index per level is required.
  raise unless indicies.size == @reps.size
  # All but the last index select the list of innermost measurements.
  x = @data[indicies[0...-1]]
  raise if x.nil?
  x[indicies[-1]]
end

#bootstrap_confidence_interval(iterations = 10000, confidence = "0.95") ⇒ `Object`

Compute a confidence interval via bootstrap method.

Arguments: iterations – Number of resamples on which to base the result. Default is 10000. confidence – The required confidence level, given as a string. Default is “0.95” (95%).

# File 'lib/kalibera/data.rb', line 306

def bootstrap_confidence_interval(iterations=10000, confidence="0.95")
  means = bootstrap_means(iterations)
  Kalibera.confidence_slice(means, confidence)
end

#bootstrap_means(iterations = 1000) ⇒ `Object`

Compute a list of simulated means from bootstrap resampling.

Note that resampling occurs with replacement.

Arguments: iterations – Number of resamples (and thus means) to generate. Default is 1000.

# File 'lib/kalibera/data.rb', line 291

def bootstrap_means(iterations=1000)
  means = []
  for i in 0...iterations
    values = bootstrap_sample()
    means.push(Kalibera.mean(values))
  end
  means.sort! # sort in place; a bare sort() would discard its result
  means
end
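The resampling idea can be sketched without the gem. The example below is a standalone approximation, assuming the same hash-and-reps layout as the constructor: at each level it draws as many indices as that level has repetitions, with replacement, and then averages the resulting measurements. The names `resample` and `bootstrap_means_sketch` are hypothetical, and an explicit `Random` instance is passed in so that runs are repeatable.

```ruby
# Hypothetical helper: draw one hierarchical resample (with replacement).
def resample(data, reps, rng, prefix = [])
  level = prefix.size
  # A full index identifies exactly one measurement.
  return [data.fetch(prefix[0...-1])[prefix[-1]]] if level == reps.size

  # Pick reps[level] random positions at this level, repeats allowed.
  picks = Array.new(reps[level]) { rng.rand(reps[level]) }
  picks.flat_map { |j| resample(data, reps, rng, prefix + [j]) }
end

# Hypothetical helper: sorted list of means, one per resample.
def bootstrap_means_sketch(data, reps, iterations, rng)
  means = Array.new(iterations) do
    sample = resample(data, reps, rng)
    sample.sum / sample.size
  end
  means.sort
end

reps = [2, 3]
data = { [0] => [1.0, 2.0, 3.0], [1] => [2.0, 3.0, 4.0] }
means = bootstrap_means_sketch(data, reps, 1000, Random.new(42))
puts "smallest mean: #{means.first}, largest mean: #{means.last}"
```

Every simulated mean necessarily lies between the smallest and largest measurement, which gives a cheap sanity check on the output.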

#bootstrap_quotient(other, iterations = 10000, confidence = '0.95') ⇒ `Object`

# File 'lib/kalibera/data.rb', line 331

def bootstrap_quotient(other, iterations=10000, confidence='0.95')
  ratios = []
  for _ in 0...iterations
    ra = bootstrap_sample()
    rb = other.bootstrap_sample()
    mean_ra = Kalibera.mean(ra)
    mean_rb = Kalibera.mean(rb)

    if mean_rb == 0 # protect against divide by zero
      ratios.push(Float::INFINITY)
    else
      ratios.push(mean_ra / mean_rb)
    end
  end
  ratios.sort!
  Kalibera.confidence_slice(ratios, confidence).values
end

#bootstrap_sample ⇒ `Object`



# File 'lib/kalibera/data.rb', line 327

def bootstrap_sample
  random_measurement_sample
end

#confidence95 ⇒ `Object`

Compute the 95% confidence interval.

# File 'lib/kalibera/data.rb', line 279

def confidence95
  degfreedom = @reps[0] - 1
  student_t_quantile95(degfreedom) *
    (Si2(n) / @reps[0]) ** 0.5
end
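Read directly from the code above, the returned value is a single half-width rather than a pair of bounds. Writing $t_{r_n - 1}$ for whatever quantile `student_t_quantile95` returns at $r_n - 1$ degrees of freedom (the helper is not shown on this page, so its exact definition is an assumption), and noting that `@reps[0]` is $r_n$:

```latex
h = t_{r_n - 1} \, \sqrt{\frac{S_n^2}{r_n}}
```

The interval itself would then presumably be the overall mean plus or minus $h$.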

#index_iterator(start = 0, stop = nil) ⇒ `Object`

Computes a list of all possible index combinations for the levels at positions start <= position < stop of reps.

# File 'lib/kalibera/data.rb', line 152

def index_iterator(start=0, stop=nil)
  if stop.nil?
    stop = n
  end

  maximum_indicies = @reps[start...stop]
  remaining_indicies = maximum_indicies.map { |maximum| (0...maximum).to_a }
  return [[]] if remaining_indicies.empty?
  remaining_indicies[0].product(*remaining_indicies.drop(1))
end
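A standalone copy of this logic makes the behaviour easier to see. The sketch below takes `reps` as a parameter instead of reading `@reps`; the function name `index_iterator_sketch` and the repetition counts are made up for illustration.

```ruby
# Standalone copy of the index_iterator logic, with reps passed in.
def index_iterator_sketch(reps, start = 0, stop = nil)
  stop = reps.size if stop.nil?

  # One range of valid positions per level in start...stop.
  ranges = reps[start...stop].map { |maximum| (0...maximum).to_a }
  return [[]] if ranges.empty?

  # Cross product of all the ranges.
  ranges[0].product(*ranges.drop(1))
end

reps = [2, 3]
p index_iterator_sketch(reps)       # => [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
p index_iterator_sketch(reps, 1)    # => [[0], [1], [2]]
p index_iterator_sketch(reps, 2)    # => [[]]
```

The last call shows the edge case: when no levels remain, the result is a list holding one empty index, so callers can still iterate once.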

#mean(indicies = []) ⇒ `Object`

Compute the mean across a number of values.

Arguments: indicies – array of fixed indices over which to compute the mean, given from left to right. The remaining indices are variable.

# File 'lib/kalibera/data.rb', line 184

def mean(indicies=[])
  # Pass the start position positionally; Ruby has no start= keyword here.
  remaining_indicies_cross_product =
      index_iterator(indicies.size)
  alldata = remaining_indicies_cross_product.map { |remaining| self[*(indicies + remaining)] }
  Kalibera.mean(alldata)
end
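To make "fixed" and "variable" indices concrete, here is a standalone sketch using the same hash-and-reps layout as the constructor. The function name `mean_sketch` and the data values are invented for the example.

```ruby
# Hypothetical helper: mean over every measurement whose leading
# indices equal `fixed`; the remaining indices range over all values.
def mean_sketch(data, reps, fixed = [])
  ranges = reps.drop(fixed.size).map { |count| (0...count).to_a }
  rest = ranges.empty? ? [[]] : ranges[0].product(*ranges.drop(1))

  values = rest.map do |remaining|
    index = fixed + remaining
    data.fetch(index[0...-1])[index[-1]]
  end
  values.sum / values.size
end

reps = [2, 3]
data = { [0] => [1.0, 2.0, 3.0], [1] => [2.0, 3.0, 4.0] }

p mean_sketch(data, reps)          # => 2.5 (all six measurements)
p mean_sketch(data, reps, [0])     # => 2.0 (first outer repetition only)
p mean_sketch(data, reps, [1, 2])  # => 4.0 (a single measurement)
```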

#n ⇒ `Object`

The number of levels in the experiment.



# File 'lib/kalibera/data.rb', line 164

def n
  @reps.size
end

#optimalreps(i, costs) ⇒ `Object`

Computes the optimal number of repetitions for a given level.

Note that the resulting number of reps is not rounded.

Arguments: i – the mathematical level for which to compute the optimal reps. costs – A list of costs for each level, high to low.

# File 'lib/kalibera/data.rb', line 266

def optimalreps(i, costs)
  # NOTE: Does not round
  costs = costs.map { |x| Float(x) }
  raise unless 1 <= i
  raise unless i < n
  index = n - i
  return (costs[index - 1] / costs[index] *
      Ti2(i) / Ti2(i + 1)) ** 0.5
end
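The arithmetic is easier to follow with the variance estimates supplied directly. In the sketch below, `optimal_reps_sketch` is a hypothetical stand-in that takes the T_i^2 values as a hash keyed by level instead of computing them from data; the costs and variances are invented numbers.

```ruby
# Hypothetical stand-in for optimalreps with T_i^2 values passed in.
# costs is ordered high to low, so level i sits at position n - i.
def optimal_reps_sketch(i, costs, ti2)
  n = costs.size
  raise ArgumentError, "level out of range" unless i >= 1 && i < n

  index = n - i
  # sqrt( (cost of level i+1 / cost of level i) * (T_i^2 / T_{i+1}^2) )
  Math.sqrt(costs[index - 1].to_f / costs[index] * ti2.fetch(i) / ti2.fetch(i + 1))
end

costs = [100.0, 1.0]          # level 2 costs 100 units, level 1 costs 1 unit
ti2 = { 1 => 4.0, 2 => 1.0 }  # made-up variance estimates per level

p optimal_reps_sketch(1, costs, ti2)  # => 20.0
```

As in the method itself, the result is not rounded, so a caller would still need to choose a whole number of repetitions.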

#r(i) ⇒ `Object`

The number of repetitions for level i.

Arguments: i – mathematical index.

# File 'lib/kalibera/data.rb', line 172

def r(i)
  raise unless 1 <= i
  raise unless i <= n
  index = n - i
  @reps[index]
end
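Because reps is stored high to low while mathematical levels count up from 1, level i maps to array position n - i. A small sketch with made-up counts (the helper name `reps_for_level` is hypothetical):

```ruby
# Hypothetical helper mirroring r(i): look up the repetition count
# for mathematical level i in a high-to-low reps array.
def reps_for_level(reps, i)
  n = reps.size
  raise ArgumentError, "level out of range" unless i >= 1 && i <= n

  reps[n - i]
end

reps = [10, 5, 3]  # highest level first

p reps_for_level(reps, 1)  # => 3  (lowest level, last array element)
p reps_for_level(reps, 3)  # => 10 (highest level, first array element)
```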

#random_measurement_sample(index = []) ⇒ `Object`

# File 'lib/kalibera/data.rb', line 311

def random_measurement_sample(index=[])
  results = []
  if index.size == n
    results.push self[*index]
  else
    indicies = (0...@reps[index.size]).map { |i| rand(@reps[index.size]) }
    for single_index in indicies
      newindex = index + [single_index]
      for value in random_measurement_sample(newindex)
        results.push value
      end
    end
  end
  results
end

#Si2(i) ⇒ `Object`

Biased estimator S_i^2.

Arguments: i – the mathematical index of the level from which to compute S_i^2

# File 'lib/kalibera/data.rb', line 197

def Si2(i)
  raise unless 1 <= i
  raise unless i <= n
  # @reps is indexed from the left to right
  index = n - i
  factor = 1.0

  # We compute this iteratively leveraging the fact that
  # 1 / (a * b) = (1 / a) / b
  for rep in @reps[0, index]
    factor /= rep
  end
  # Then at this point we have:
  # factor * (1 / (r_i - 1)) = factor / (r_i - 1)
  factor /=  @reps[index] - 1

  # Second line of the above definition, the lines are multiplied.
  indicies = index_iterator(0, index+1)
  sum = 0.0
  for index in indicies
    a = mean(index)
    b = mean(index[0,index.size-1])
    sum += (a - b) ** 2
  end
  factor * sum
end
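The comments in the listing refer to a definition that is not reproduced on this page. Reading the code, the quantity being computed appears to be the following, where a bullet marks an index that has been averaged over (this is a reconstruction from the code, not a quotation from the papers):

```latex
S_i^2 = \frac{1}{\prod_{k=i+1}^{n} r_k} \cdot \frac{1}{r_i - 1}
        \sum_{j_n=1}^{r_n} \cdots \sum_{j_i=1}^{r_i}
        \left( \bar{Y}_{j_n \ldots j_i \bullet \ldots \bullet}
             - \bar{Y}_{j_n \ldots j_{i+1} \bullet \ldots \bullet} \right)^2
```

The first fraction corresponds to the loop over `@reps[0, index]`, the second to the division by `@reps[index] - 1`, and the summand to `(a - b) ** 2`.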

#Ti2(i) ⇒ `Object`

Compute the unbiased T_i^2 variance estimator.

Arguments: i – the mathematical index from which to compute T_i^2.

# File 'lib/kalibera/data.rb', line 230

def Ti2(i)
  # This is the broken implementation of T_i^2 shown in the published
  # version of "Rigorous benchmarking in reasonable time". Tomas has
  # since fixed this in local versions of the paper.
  #@memoize
  #def broken_Ti2(self, i)
  #  """ Compute the unbiased T_i^2 variance estimator.
  #
  #  Arguments:
  #  i -- the mathematical index from which to compute T_i^2.
  #  """
  #
  #  raise unless 1 <= i <= n
  #  if i == 1:
  #    return self.Si2(1)
  #  return self.Si2(i) - self.Ti2(i - 1) / self.r(i - 1)

  # This is the correct definition of T_i^2

  raise unless 1 <= i
  raise unless i <= n
  if i == 1
    return Si2(1)
  end
  Si2(i) - Si2(i - 1) / r(i - 1)
end
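The corrected recurrence in the listing is T_1^2 = S_1^2 and T_i^2 = S_i^2 - S_{i-1}^2 / r_{i-1} for i > 1. The sketch below works through it on a tiny invented data set. `MiniData` is a hypothetical stand-in written for this example, not the gem's class; it omits memoization and argument checks.

```ruby
# Stand-in for illustration only; not the gem's Kalibera::Data class.
class MiniData
  def initialize(data, reps)
    @data = data
    @reps = reps
  end

  def n
    @reps.size
  end

  # Repetitions at mathematical level i (level 1 is the innermost).
  def r(i)
    @reps[n - i]
  end

  def [](*indices)
    @data.fetch(indices[0...-1])[indices[-1]]
  end

  def index_iterator(start = 0, stop = n)
    ranges = @reps[start...stop].map { |max| (0...max).to_a }
    return [[]] if ranges.empty?
    ranges[0].product(*ranges.drop(1))
  end

  def mean(indices = [])
    values = index_iterator(indices.size).map { |rest| self[*(indices + rest)] }
    values.sum / values.size
  end

  # Same arithmetic as Si2, written as a single division.
  def si2(i)
    index = n - i
    factor = 1.0 / (@reps[0, index].reduce(1, :*) * (@reps[index] - 1))
    total = index_iterator(0, index + 1).sum { |idx| (mean(idx) - mean(idx[0...-1]))**2 }
    factor * total
  end

  # Corrected recurrence from the listing above.
  def ti2(i)
    return si2(1) if i == 1
    si2(i) - si2(i - 1) / r(i - 1)
  end
end

d = MiniData.new({ [0] => [1.0, 2.0, 3.0], [1] => [2.0, 3.0, 4.0] }, [2, 3])
puts "S_1^2 = #{d.si2(1)}, S_2^2 = #{d.si2(2)}"
puts "T_1^2 = #{d.ti2(1)}, T_2^2 = #{d.ti2(2)}"
```

For this data S_1^2 is 1.0 and S_2^2 is 0.5, so T_2^2 works out to 0.5 - 1.0 / 3, roughly 0.167.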
