Class: MiniHistogram

Inherits:
Object
  • Object
Defined in:
lib/mini_histogram.rb,
lib/mini_histogram/version.rb

Overview

A class for building histogram info.

Given an array, this class calculates the “edges” of a histogram. These edges mark the boundaries for “bins”:

array = [1,1,1, 5, 5, 5, 5, 10, 10, 10]
histogram = MiniHistogram.new(array)
histogram.edges
# => [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

It also finds the weights (aka count of values) that would go in each bin:

histogram.weights
# => [3, 0, 4, 0, 0, 3]

This means that the `array` here had three items between 0.0 and 2.0.
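
To make the relationship between edges and weights concrete, the sketch below lines up each pair of adjacent edges with its count. It uses the literal arrays from the example above rather than calling the gem, so it runs on its own; the variable name `bins` and the output format are illustrative only.

```ruby
# Values copied from the example output above (not computed here).
edges   = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
weights = [3, 0, 4, 0, 0, 3]

# each_cons(2) yields adjacent edge pairs: [0.0, 2.0], [2.0, 4.0], ...
bins = edges.each_cons(2).zip(weights)

bins.each do |(lo, hi), count|
  # Ruby's exclusive-range notation: lower edge included, upper edge excluded.
  puts "#{lo}...#{hi}: #{'*' * count} (#{count})"
end
```

The exclusive upper edge reflects how the `#weights` listing further down assigns a value to a bin with `floor`, so a value sitting exactly on an edge is counted in the bin that starts there.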

Defined Under Namespace

Classes: Error

Constant Summary

VERSION =
"0.1.0"

Constructor Details

#initialize(array, left_p: false, edges: nil) ⇒ MiniHistogram

Returns a new instance of MiniHistogram.


# File 'lib/mini_histogram.rb', line 25

def initialize(array, left_p: false, edges: nil)
  @array = array
  @left_p = left_p
  @edges = edges
  @weights = nil
end

Instance Attribute Details

#array ⇒ Object (readonly)

Returns the value of attribute array.


# File 'lib/mini_histogram.rb', line 23

def array
  @array
end

#left_p ⇒ Object (readonly)

Returns the value of attribute left_p.


# File 'lib/mini_histogram.rb', line 23

def left_p
  @left_p
end

Class Method Details

.set_average_edges!(*array_of_histograms) ⇒ Object

Given a list of histograms, this method calculates an average edge size (bin width) along with the minimum and maximum edge values. It then replaces the edges on every input histogram with the resulting shared set.

The main purpose of this method is to make it possible to chart multiple distributions against a common axis.

For more context, see: github.com/schneems/derailed_benchmarks/pull/169


# File 'lib/mini_histogram.rb', line 192

def self.set_average_edges!(*array_of_histograms)
  array_of_histograms.each { |x| raise "Input expected to be a histogram but is #{x.inspect}" unless x.is_a?(MiniHistogram) }
  steps = array_of_histograms.map(&:bin_size)
  avg_step_size = steps.sum.to_f / steps.length

  max_edge = array_of_histograms.map(&:edges_max).max
  min_edge = array_of_histograms.map(&:edges_min).min

  average_edges = [min_edge]
  while average_edges.last < max_edge
    average_edges << average_edges.last + avg_step_size
  end

  array_of_histograms.each {|h| h.set_edges(average_edges) }

  return array_of_histograms
end
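
As a rough illustration of the averaging in the listing above, the following sketch applies the same steps to plain arrays of edges. `average_edges` is a hypothetical standalone helper, not part of the gem, and it assumes each input has at least two evenly spaced edges.

```ruby
# Hypothetical helper mirroring the steps in the listing above; it takes
# plain arrays of edges instead of MiniHistogram objects.
def average_edges(edge_sets)
  steps    = edge_sets.map { |e| e[1] - e[0] } # bin size of each input
  avg_step = steps.sum.to_f / steps.length
  min_edge = edge_sets.map(&:min).min
  max_edge = edge_sets.map(&:max).max

  # Walk from the smallest edge in avg_step increments until the largest
  # edge is reached or passed.
  edges = [min_edge]
  edges << edges.last + avg_step while edges.last < max_edge
  edges
end

a = [0.0, 2.0, 4.0, 6.0]   # bin size 2.0
b = [1.0, 5.0, 9.0, 13.0]  # bin size 4.0
p average_edges([a, b])
# => [0.0, 3.0, 6.0, 9.0, 12.0, 15.0]
```

Note that the last shared edge can overshoot the largest input edge (15.0 versus 13.0 here), because the loop only stops once the maximum has been reached or passed.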

Instance Method Details

#bin_size ⇒ Object


# File 'lib/mini_histogram.rb', line 47

def bin_size
  edges[1] - edges[0]
end

#edges ⇒ Object

Finds the “edges” of a given histogram that will mark the boundaries for the histogram's “bins”.

Example:

a = [1,1,1, 5, 5, 5, 5, 10, 10, 10]
MiniHistogram.new(a).edges
# => [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

There are multiple ways to find edges; this approach was taken from
https://github.com/mrkn/enumerable-statistics/issues/24

Another good set of implementations is in NumPy:
https://github.com/numpy/numpy/blob/d9b1e32cb8ef90d6b4a47853241db2a28146a57d/numpy/lib/histograms.py#L222

# File 'lib/mini_histogram.rb', line 109

def edges
  return @edges if @edges

  hi = array.max
  lo = array.min

  nbins = sturges * 1.0

  if hi == lo
    start = hi
    step = 1.0
    divisor = 1.0
    len = 1.0
  else
    bw = (hi - lo) / nbins
    lbw = Math.log10(bw)
    if lbw >= 0
      step = 10 ** lbw.floor * 1.0
      r = bw/step

      if r <= 1.1
        # do nothing
      elsif r <= 2.2
        step *= 2.0
      elsif r <= 5.5
        step *= 5.0
      else
        step *= 10
      end
      divisor = 1.0
      start = step * (lo/step).floor
      len = ((hi - start)/step).ceil
    else
      divisor = 10 ** - lbw.floor
      r = bw * divisor
      if r <= 1.1
        # do nothing
      elsif r <= 2.2
        divisor /= 2.0
      elsif r <= 5.5
        divisor /= 5.0
      else
        divisor /= 10.0
      end
      step = 1.0
      start = (lo * divisor).floor
      len = (hi * divisor - start).ceil
    end

    if left_p
      while (lo < start/divisor)
        start -= step
      end

      while (start + (len - 1)*step)/divisor <= hi
        len += 1
      end
    else
      while lo <= start/divisor
        start -= step
      end
      while (start + (len - 1)*step)/divisor < hi
        len += 1
      end
    end

    @edges = []
    len.next.times.each do
      @edges << start/divisor
      start += step
    end
    return @edges
  end
end
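
The core idea in the `lbw >= 0` branch above is to snap a raw bin width to a "nice" step of 1, 2, 5, or 10 times a power of ten. The sketch below isolates that step for widths of at least 1; `nice_step` is a hypothetical helper written for illustration and is not a method of the class.

```ruby
# Hypothetical helper: snaps a raw bin width (>= 1) to 1, 2, 5 or 10 times
# a power of ten, using the same thresholds as the `lbw >= 0` branch above.
def nice_step(bw)
  step = 10.0**Math.log10(bw).floor # largest power of ten <= bw
  r = bw / step                     # ratio of raw width to that power of ten

  if r <= 1.1
    step
  elsif r <= 2.2
    step * 2.0
  elsif r <= 5.5
    step * 5.0
  else
    step * 10.0
  end
end

p nice_step(1.8)  # raw width from the example: (10 - 1) / 5 bins
# => 2.0
p nice_step(37.0)
# => 50.0
```

For the example array, the raw width of 1.8 snaps to a step of 2.0, which is why the edges advance in increments of 2.0.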

#edges_max ⇒ Object


# File 'lib/mini_histogram.rb', line 36

def edges_max
  edges.max
end

#edges_min ⇒ Object


# File 'lib/mini_histogram.rb', line 32

def edges_min
  edges.min
end

#set_edges(value) ⇒ Object

Sets the edges to a new value and clears any previously calculated (memoized) weights.


# File 'lib/mini_histogram.rb', line 42

def set_edges(value)
  @edges = value
  @weights = nil # clear memoized value
end
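
The reason for resetting `@weights` is that `#weights` memoizes its result, so stale counts would otherwise survive a change of edges. The sketch below shows that memoize-then-invalidate pattern in isolation; `MemoDemo` is a hypothetical stand-in class, and its placeholder "counting" is not the gem's logic.

```ruby
# Hypothetical stand-in class (not the gem's code) showing the
# memoize-then-invalidate pattern that #set_edges relies on.
class MemoDemo
  attr_reader :computations

  def initialize(edges)
    @edges = edges
    @weights = nil
    @computations = 0
  end

  def weights
    return @weights if @weights # reuse the cached result when present

    @computations += 1
    @weights = Array.new(@edges.length - 1, 0) # placeholder for real counting
  end

  def set_edges(value)
    @edges = value
    @weights = nil # force the next #weights call to recompute
  end
end

demo = MemoDemo.new([0, 1, 2])
demo.weights
demo.weights                  # served from the cache, no recomputation
demo.set_edges([0, 1, 2, 3])
p demo.weights.length         # => 3
p demo.computations           # => 2
```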

#sturges ⇒ Object

Weird name, right? There are multiple ways to calculate the number of “bins” a histogram should have; one of the most common is the “Sturges” method.

Here are some alternatives from NumPy: github.com/numpy/numpy/blob/d9b1e32cb8ef90d6b4a47853241db2a28146a57d/numpy/lib/histograms.py#L489-L521


# File 'lib/mini_histogram.rb', line 57

def sturges
  len = array.length
  return 1.0 if len == 0

  # return (long)(ceil(Math.log2(n)) + 1);
  return Math.log2(len).ceil + 1
end
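
To give a feel for how the bin count grows, here is a standalone restatement of the rule in the listing above, ceil(log2(n)) + 1, applied to a few input sizes. `sturges_bins` is a hypothetical free function that takes the number of values directly instead of reading `array.length`.

```ruby
# Standalone restatement of the rule in the listing: ceil(log2(n)) + 1 bins,
# falling back to 1.0 for empty input as the original does.
def sturges_bins(n)
  return 1.0 if n.zero?

  Math.log2(n).ceil + 1
end

[10, 100, 1000].each do |n|
  puts "#{n} values -> #{sturges_bins(n)} bins"
end
# 10 values -> 5 bins
# 100 values -> 8 bins
# 1000 values -> 11 bins
```

The count grows logarithmically, so a hundredfold increase in data only adds a handful of bins.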

#weights ⇒ Object

Given an array of edges and an array of values to build a histogram from, returns the count of values in each “bin”.

Example:

a = [1,1,1, 5, 5, 5, 5, 10, 10, 10]
edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

MiniHistogram.new(a, edges: edges).weights
# => [3, 0, 4, 0, 0, 3]

This means that the `a` array has three values between 0.0 and 2.0,
four values between 4.0 and 6.0, and three values between 10.0 and 12.0.

# File 'lib/mini_histogram.rb', line 78

def weights
  return @weights if @weights

  lo = edges.first
  step = edges[1] - edges[0]

  max_index = ((array.max  - lo) / step).floor
  @weights = Array.new(max_index + 1, 0)

  array.each do |x|
    index = ((x - lo) / step).floor
    @weights[index] += 1
  end

  return @weights
end
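
The counting step above can be tried without the gem using the sketch below. `bin_counts` is a hypothetical standalone function that follows the same arithmetic as the listing; it assumes evenly spaced edges and values that are all greater than or equal to the first edge.

```ruby
# Hypothetical standalone version of the counting step; assumes evenly
# spaced edges and values that are all >= edges.first.
def bin_counts(values, edges)
  lo   = edges.first
  step = edges[1] - edges[0]

  # As in the listing, the result is sized by the largest value, so empty
  # bins beyond it are not included.
  counts = Array.new(((values.max - lo) / step).floor + 1, 0)
  values.each { |x| counts[((x - lo) / step).floor] += 1 }
  counts
end

a = [1, 1, 1, 5, 5, 5, 5, 10, 10, 10]
edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
p bin_counts(a, edges)
# => [3, 0, 4, 0, 0, 3]
```

Because the bin index comes from `floor`, a value that lands exactly on an edge (such as 10 here) is counted in the bin that starts at that edge.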