Class: Kalibera::Data

Inherits:
Object
Extended by:
Memoist
Defined in:
lib/kalibera/data.rb


Constructor Details

#initialize(data, reps) ⇒ Data

Instances of this class store measurements (corresponding to the Y_… in the papers).

Arguments: data – Hash mapping each array of all but the last index to an array of values. reps – Array of repetition counts for each level, high to low.
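A minimal sketch of the expected data layout, assuming a hypothetical two-level experiment (2 runs, each with 3 timed iterations) with made-up values. The helper `lookup` is hypothetical; it mirrors what `Data#[]` does with such a hash.

```ruby
# Hypothetical experiment: reps = [runs, iterations per run], listed high to low.
reps = [2, 3]

# Keys are arrays of all but the last index; values are the innermost lists.
data = {
  [0] => [1.0, 2.0, 3.0], # iterations of run 0 (illustrative numbers)
  [1] => [4.0, 5.0, 6.0]  # iterations of run 1
}

# Hypothetical helper mirroring Data#[]: the leading indices pick a list,
# the last index picks a value within it.
def lookup(data, reps, *indices)
  unless indices.size == reps.size
    raise ArgumentError, "expected #{reps.size} indices, got #{indices.size}"
  end
  data.fetch(indices[0...-1]).fetch(indices[-1])
end

puts lookup(data, reps, 1, 2) # prints 6.0
```

The same `data` and `reps` are the two arguments `Kalibera::Data.new(data, reps)` takes; because the constructor looks up every index combination, a hash without the `[1]` key would be rejected.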



# File 'lib/kalibera/data.rb', line 131

def initialize(data, reps)
  @data = data
  @reps = reps

  # Check that all data is there: look up every index combination.
  array = reps.map { |i| (0...i).to_a }
  array[0].product(*array.drop(1)).each do |index|
    self[*index] # raises if the lookup fails
  end
end

Instance Method Details

#[](*indicies) ⇒ Object



# File 'lib/kalibera/data.rb', line 143

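def [](*indicies)
  unless indicies.size == @reps.size
    raise ArgumentError, "expected #{@reps.size} indices, got #{indicies.size}"
  end
  x = @data[indicies[0...-1]]
  raise KeyError, "no data for index #{indicies[0...-1].inspect}" if x.nil?
  x.fetch(indicies[-1]) # raises IndexError instead of silently returning nil
end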

#bootstrap_confidence_interval(iterations = 10000, confidence = "0.95") ⇒ Object

Compute a confidence interval via bootstrap method.

Optional arguments: iterations – Number of resamples to base the result on. Default is 10000. confidence – The required confidence level, given as a string. Default is “0.95” (95%).
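This page does not show `Kalibera.confidence_slice`. Assuming it keeps the central `confidence` fraction of the sorted bootstrap means (the usual percentile bootstrap), a stand-alone sketch might look like this. `percentile_interval` is a hypothetical name, and parsing the string with `Rational` is one way to avoid binary floating-point rounding of values such as "0.95".

```ruby
# Hypothetical percentile-bootstrap interval over an already sorted array.
# confidence is a string such as "0.95", parsed exactly as a Rational.
def percentile_interval(sorted, confidence = "0.95")
  raise ArgumentError, "need at least one value" if sorted.empty?
  conf = Rational(confidence)
  tail = (1 - conf) / 2                    # fraction cut from each end
  lo = (tail * sorted.size).floor          # first index kept
  hi = ((1 - tail) * sorted.size).ceil - 1 # last index kept
  [sorted[lo], sorted[hi]]
end

means = (1..1000).map(&:to_f) # stand-in for 1000 sorted bootstrap means
p percentile_interval(means)  # prints [26.0, 975.0]
```

The input must already be sorted, which is why the means passed to the slice need to come back in sorted order.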



# File 'lib/kalibera/data.rb', line 306

def bootstrap_confidence_interval(iterations=10000, confidence="0.95")
  means = bootstrap_means(iterations)
  Kalibera.confidence_slice(means, confidence)
end

#bootstrap_means(iterations = 1000) ⇒ Object

Compute a list of simulated means from bootstrap resampling.

Note that resampling occurs with replacement.

Optional arguments: iterations – Number of resamples (and thus means) to generate. Default is 1000.
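For illustration, a self-contained sketch of the same idea, modeled on `random_measurement_sample` as listed further down this page: at every level it draws as many child indices as that level has repetitions, uniformly with replacement, then averages the leaf values. The names `resample` and `sketch_bootstrap_means` and the data are hypothetical, and the seeded `Random` only makes runs repeatable.

```ruby
# Hypothetical two-level data set in the Data layout (reps listed high to low).
reps = [2, 3]
data = { [0] => [1.0, 2.0, 3.0], [1] => [4.0, 5.0, 6.0] }

# Recursively resample with replacement at every level; return leaf values.
def resample(data, reps, rng, prefix = [])
  if prefix.size == reps.size
    return [data.fetch(prefix[0...-1]).fetch(prefix[-1])]
  end
  count = reps[prefix.size]
  picks = Array.new(count) { rng.rand(count) } # indices drawn with replacement
  picks.flat_map { |j| resample(data, reps, rng, prefix + [j]) }
end

# One mean per resample, returned sorted (Array#sort returns a new array).
def sketch_bootstrap_means(data, reps, iterations = 1000, rng = Random.new(1))
  means = Array.new(iterations) {
    sample = resample(data, reps, rng)
    sample.sum / sample.size
  }
  means.sort
end

means = sketch_bootstrap_means(data, reps, 200)
puts means.first <= means.last # prints true
```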



# File 'lib/kalibera/data.rb', line 291

def bootstrap_means(iterations=1000)
  means = Array.new(iterations) { Kalibera.mean(bootstrap_sample) }
  # Array#sort returns a new array; return it rather than discarding it
  means.sort
end

#bootstrap_quotient(other, iterations = 10000, confidence = '0.95') ⇒ Object



# File 'lib/kalibera/data.rb', line 331

def bootstrap_quotient(other, iterations=10000, confidence='0.95')
  ratios = []
  for _ in 0...iterations
    ra = bootstrap_sample()
    rb = other.bootstrap_sample()
    mean_ra = Kalibera.mean(ra)
    mean_rb = Kalibera.mean(rb)

    if mean_rb == 0 # protect against divide by zero
      ratios.push(Float::INFINITY)
    else
      ratios.push(mean_ra / mean_rb)
    end
  end
  ratios.sort!
  Kalibera.confidence_slice(ratios, confidence).values
end

#bootstrap_sample ⇒ Object



# File 'lib/kalibera/data.rb', line 327

def bootstrap_sample
  random_measurement_sample
end

#confidence95 ⇒ Object

Compute the half-width of the 95% confidence interval for the overall mean; the interval itself is mean ± confidence95.
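The listing returns t · sqrt(S_n^2 / r_n), where r_n is `@reps[0]` and the Student t quantile has r_n - 1 degrees of freedom. A numeric sketch with a hypothetical `half_width` helper and illustrative inputs; the quantile 12.706 (two-sided 95%, 1 degree of freedom) is hard-coded in place of `student_t_quantile95`, whose definition this page does not show.

```ruby
# Hypothetical helper mirroring confidence95: half-width of the 95% interval
# for the overall mean, given S_n^2 (top level), r_n and a t quantile.
def half_width(s2_top, r_top, t_quantile)
  t_quantile * (s2_top / r_top) ** 0.5
end

# Illustrative inputs: S_n^2 = 4.5 with r_n = 2 runs, so 1 degree of freedom.
t95_df1 = 12.706 # two-sided 95% Student t quantile for 1 degree of freedom
h = half_width(4.5, 2, t95_df1)
puts h.round(3) # prints 19.059
```

With only two top-level repetitions the t quantile is large, which is why the interval comes out wide even for a modest variance.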



# File 'lib/kalibera/data.rb', line 279

def confidence95
  degfreedom = @reps[0] - 1
  student_t_quantile95(degfreedom) *
    (Si2(n) / @reps[0]) ** 0.5
end

#index_iterator(start = 0, stop = nil) ⇒ Object

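Computes a list of all possible index combinations for the levels from start (inclusive) to stop (exclusive), i.e. the indices left free once the first start indices are fixed. stop defaults to n; if no levels remain, the result is [[]].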
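A stand-alone sketch of the same enumeration using `Array#product`; `index_combinations` is a hypothetical name and the reps are illustrative.

```ruby
# Hypothetical stand-alone version of index_iterator. reps lists repetition
# counts high to low; levels start...stop (stop exclusive) are enumerated.
def index_combinations(reps, start = 0, stop = reps.size)
  ranges = reps[start...stop].map { |max| (0...max).to_a }
  return [[]] if ranges.empty? # nothing left to vary: one empty suffix
  ranges[0].product(*ranges.drop(1))
end

reps = [2, 3]
p index_combinations(reps)    # prints [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
p index_combinations(reps, 1) # prints [[0], [1], [2]]
p index_combinations(reps, 2) # prints [[]]
```

Returning `[[]]` rather than `[]` for the empty case keeps callers such as `mean` working when every index is already fixed: there is exactly one (empty) completion.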



# File 'lib/kalibera/data.rb', line 152

def index_iterator(start=0, stop=nil)
  if stop.nil?
    stop = n
  end

  maximum_indicies = @reps[start...stop]
  remaining_indicies = maximum_indicies.map { |maximum| (0...maximum).to_a }
  return [[]] if remaining_indicies.empty?
  remaining_indicies[0].product(*remaining_indicies.drop(1))
end

#mean(indicies = []) ⇒ Object

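Compute the mean of the measurements selected by a (possibly empty) prefix of fixed indices.

Optional arguments: indicies – array of fixed indices over which to compute the mean, given from left to right (highest level first). The remaining indices vary over all their values. Default is [], which averages every measurement.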
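A small stand-alone sketch of how a fixed prefix selects the values that get averaged; `mean_over` and the two-level data are hypothetical.

```ruby
# Hypothetical two-level data (reps high to low) in the Data layout.
reps = [2, 3]
data = { [0] => [1.0, 2.0, 3.0], [1] => [4.0, 5.0, 6.0] }

# Mirrors Data#mean: fix the leading indices, vary the rest, average.
def mean_over(data, reps, fixed = [])
  ranges = reps[fixed.size...reps.size].map { |max| (0...max).to_a }
  tails = ranges.empty? ? [[]] : ranges[0].product(*ranges.drop(1))
  values = tails.map do |tail|
    full = fixed + tail
    data.fetch(full[0...-1]).fetch(full[-1])
  end
  values.sum / values.size
end

puts mean_over(data, reps)         # prints 3.5 (all six values)
puts mean_over(data, reps, [1])    # prints 5.0 (run 1 only)
puts mean_over(data, reps, [0, 2]) # prints 3.0 (a single value)
```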



# File 'lib/kalibera/data.rb', line 184

def mean(indicies=[])
  # Enumerate every completion of the fixed prefix and average those values
  remaining_indicies_cross_product = index_iterator(indicies.size)
  alldata = remaining_indicies_cross_product.map { |remaining| self[*(indicies + remaining)] }
  Kalibera.mean(alldata)
end

#n ⇒ Object

The number of levels in the experiment.



# File 'lib/kalibera/data.rb', line 164

def n
  @reps.size
end

#optimalreps(i, costs) ⇒ Object

Computes the optimal number of repetitions for a given level.

Note that the resulting number of reps is not rounded.

Arguments: i – the mathematical level for which to compute the optimal reps. costs – A list of costs for each level, high to low.
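The listing computes r_i = sqrt((c_{i+1} / c_i) · (T_i^2 / T_{i+1}^2)), where costs are listed high to low, so c_i is `costs[n - i]` and c_{i+1} is `costs[n - i - 1]`. A stand-alone sketch, assuming one cost per level; `optimal_reps` is hypothetical and the costs and T^2 values are made up.

```ruby
# Hypothetical sketch of optimalreps. costs: one cost per level, high to low.
# ti2: any callable returning T_i^2 for mathematical level i (1 = lowest).
def optimal_reps(i, costs, ti2)
  n = costs.size
  raise ArgumentError, "need 1 <= i < #{n}" unless i >= 1 && i < n
  costs = costs.map { |c| Float(c) }
  index = n - i
  # Like the documented method, the result is not rounded.
  (costs[index - 1] / costs[index] * ti2.call(i) / ti2.call(i + 1)) ** 0.5
end

# Illustrative numbers: cost 9 at level 2, cost 1 at level 1;
# T_1^2 = 4.0 and T_2^2 = 1.0.
t2 = { 1 => 4.0, 2 => 1.0 }
puts optimal_reps(1, [9, 1], ->(level) { t2.fetch(level) }) # prints 6.0
```

Rounding the result up to a whole number of repetitions is left to the caller, as the note above says.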



# File 'lib/kalibera/data.rb', line 266

def optimalreps(i, costs)
  # NOTE: Does not round
  costs = costs.map { |x| Float(x) }
  raise unless 1 <= i
  raise unless i < n
  index = n - i
  return (costs[index - 1] / costs[index] *
      Ti2(i) / Ti2(i + 1)) ** 0.5
end

#r(i) ⇒ Object

The number of repetitions for level i.

Arguments: i – mathematical index of the level (1 is the lowest level).
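Because `reps` is stored high to low while mathematical levels count up from the lowest, level i lives at array index n - i. A sketch with a hypothetical `reps_for_level` helper and illustrative counts:

```ruby
# Hypothetical helper mirroring r(i): reps is ordered high to low, while
# mathematical level 1 is the lowest level, so level i sits at index n - i.
def reps_for_level(reps, i)
  n = reps.size
  raise ArgumentError, "i must be in 1..#{n}" unless (1..n).cover?(i)
  reps[n - i]
end

reps = [5, 10, 20] # illustrative: 5 builds, 10 runs per build, 20 iterations per run
puts reps_for_level(reps, 1) # prints 20
puts reps_for_level(reps, 3) # prints 5
```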



# File 'lib/kalibera/data.rb', line 172

def r(i)
  raise unless 1 <= i
  raise unless i <= n
  index = n - i
  @reps[index]
end

#random_measurement_sample(index = []) ⇒ Object



# File 'lib/kalibera/data.rb', line 311

def random_measurement_sample(index=[])
  # A full index identifies a single measurement
  return [self[*index]] if index.size == n

  # Draw as many child indices as this level has reps, with replacement,
  # and concatenate the samples taken below each of them
  count = @reps[index.size]
  picks = Array.new(count) { rand(count) }
  picks.flat_map { |single_index| random_measurement_sample(index + [single_index]) }
end

#Si2(i) ⇒ Object

Biased estimator S_i^2.

Arguments: i – the mathematical index of the level from which to compute S_i^2.
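A stand-alone sketch of the same computation on a hypothetical two-level data set; `mean_of`, `values_under` and `si2` are hypothetical helpers. For this balanced data, i = 1 gives the average within-run sample variance and i = n = 2 gives the sample variance of the run means.

```ruby
# Hypothetical two-level data (reps high to low): 2 runs x 3 iterations.
reps = [2, 3]
data = { [0] => [1.0, 2.0, 3.0], [1] => [4.0, 5.0, 6.0] }

def mean_of(values)
  values.sum / values.size
end

# All measurements whose index starts with prefix.
def values_under(data, reps, prefix)
  ranges = reps[prefix.size...reps.size].map { |max| (0...max).to_a }
  tails = ranges.empty? ? [[]] : ranges[0].product(*ranges.drop(1))
  tails.map do |tail|
    full = prefix + tail
    data.fetch(full[0...-1]).fetch(full[-1])
  end
end

# Biased S_i^2 as in the listing: a scale factor times the sum of squared
# deviations of every level-i mean from its parent mean.
def si2(data, reps, i)
  n = reps.size
  raise ArgumentError, "i must be in 1..#{n}" unless (1..n).cover?(i)
  index = n - i
  factor = 1.0 / reps[0, index].reduce(1, :*) / (reps[index] - 1)
  levels = reps[0..index].map { |max| (0...max).to_a }
  tuples = levels[0].product(*levels.drop(1))
  sum = tuples.sum { |idx|
    (mean_of(values_under(data, reps, idx)) -
     mean_of(values_under(data, reps, idx[0...-1]))) ** 2
  }
  factor * sum
end

puts si2(data, reps, 1) # prints 1.0 (mean within-run sample variance)
puts si2(data, reps, 2) # prints 4.5 (sample variance of the run means)
```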



# File 'lib/kalibera/data.rb', line 197

def Si2(i)
  raise unless 1 <= i
  raise unless i <= n
  # @reps is indexed from left to right (highest level first)
  index = n - i
  factor = 1.0

  # Compute 1 / (r_n * ... * r_{i+1}) iteratively, leveraging the fact that
  # 1 / (a * b) = (1 / a) / b
  for rep in @reps[0, index]
    factor /= rep
  end
  # Then multiply by 1 / (r_i - 1)
  factor /= @reps[index] - 1

  # Sum the squared deviations of every level-i mean from its parent mean.
  # The loop variable is named idx so it does not overwrite `index` above.
  indicies = index_iterator(0, index + 1)
  sum = 0.0
  for idx in indicies
    a = mean(idx)
    b = mean(idx[0, idx.size - 1])
    sum += (a - b) ** 2
  end
  factor * sum
end

#Ti2(i) ⇒ Object

Compute the unbiased T_i^2 variance estimator.

Arguments: i – the mathematical index from which to compute T_i^2.
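A sketch of the corrected definition used in the listing: T_1^2 = S_1^2 and T_i^2 = S_i^2 - S_{i-1}^2 / r_{i-1} for i > 1. The `ti2` helper is hypothetical, takes the S_i^2 values and repetition counts as callables, and the numbers are illustrative.

```ruby
# Hypothetical sketch of Ti2: si2 and r are callables indexed by
# mathematical level (1 = lowest); n is the number of levels.
def ti2(si2, r, n, i)
  raise ArgumentError, "i must be in 1..#{n}" unless (1..n).cover?(i)
  return si2.call(1) if i == 1
  si2.call(i) - si2.call(i - 1) / r.call(i - 1)
end

reps = [2, 3]               # high to low: 2 runs, 3 iterations each
s2 = { 1 => 1.0, 2 => 4.5 } # illustrative S_1^2 and S_2^2
r = ->(level) { reps[reps.size - level] }
puts ti2(s2.method(:fetch), r, reps.size, 2).round(4) # prints 4.1667
```

Subtracting S_{i-1}^2 / r_{i-1} removes the part of the level-i spread that is explained by noise from the level below, which is what makes T_i^2 unbiased where S_i^2 is not.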



# File 'lib/kalibera/data.rb', line 230

def Ti2(i)
  # The published version of "Rigorous benchmarking in reasonable time"
  # gives a broken recursive definition, T_i^2 = S_i^2 - T_{i-1}^2 / r_{i-1}
  # (with T_1^2 = S_1^2). Tomas has since fixed this in local versions of
  # the paper.
  # This is the correct definition of T_i^2

  raise unless 1 <= i
  raise unless i <= n
  if i == 1
    return Si2(1)
  end
  Si2(i) - Si2(i - 1) / r(i - 1)
end