Class: MiniHistogram

Inherits:
Object
Defined in:
lib/mini_histogram.rb,
lib/mini_histogram/plot.rb,
lib/mini_histogram/version.rb

Overview

Plots the histogram in Unicode characters.

Thanks to github.com/red-data-tools/unicode_plot.rb. It could not be used directly because its dependency, enumerable-statistics, has a hard lock on a specific version of Ruby, and this library needs to support older Rubies.

Example:

require 'mini_histogram/plot'
array = 50.times.map { rand(11.2..11.6) }
histogram = MiniHistogram.new(array)
puts histogram.plot # => Generates a plot

Defined Under Namespace

Classes: Error

Constant Summary

INT64_MIN =
-9223372036854775808
INT64_MAX =
9223372036854775807
VERSION =
"0.3.1"

Constructor Details

#initialize(array, left_p: true, edges: nil) ⇒ MiniHistogram

Returns a new instance of MiniHistogram.


# File 'lib/mini_histogram.rb', line 24

def initialize(array, left_p: true, edges: nil)
  @array = array
  @left_p = left_p
  @edges = edges
  @weights = nil

  @min, @max = array.minmax
end

Instance Attribute Details

#array ⇒ Object (readonly)

Returns the value of attribute array


# File 'lib/mini_histogram.rb', line 22

def array
  @array
end

#left_p ⇒ Object (readonly)

Returns the value of attribute left_p


# File 'lib/mini_histogram.rb', line 22

def left_p
  @left_p
end

#max ⇒ Object (readonly)

Returns the value of attribute max


# File 'lib/mini_histogram.rb', line 22

def max
  @max
end

Class Method Details

.dual_plot {|a, b| ... } ⇒ Object

Yields:

  • (a, b)

# File 'lib/mini_histogram/plot.rb', line 66

def self.dual_plot
  a = PlotValue.new
  b = PlotValue.new

  yield a, b

  if b.options[:ylabel] == a.options[:ylabel]
    b.options[:ylabel] = nil
  end

  MiniHistogram.set_average_edges!(a.histogram, b.histogram)
  PlotValue.dual_plot(a.plot, b.plot)
end

.set_average_edges!(*array_of_histograms) ⇒ Object

Given an array of histograms, this method calculates an average edge size along with the minimum and maximum edge values. It then updates the edges on all inputs.

The main purpose of this method is to chart multiple distributions against a common axis.

For more context, see: github.com/schneems/derailed_benchmarks/pull/169


# File 'lib/mini_histogram.rb', line 213

def self.set_average_edges!(*array_of_histograms)
  array_of_histograms.each { |x| raise "Input expected to be a histogram but is #{x.inspect}" unless x.is_a?(MiniHistogram) }
  steps = array_of_histograms.map(&:bin_size)
  avg_step_size = steps.inject(&:+).to_f / steps.length

  max_value = array_of_histograms.map(&:max).max

  max_edge = array_of_histograms.map(&:edges_max).max
  min_edge = array_of_histograms.map(&:edges_min).min

  average_edges = [min_edge]
  while average_edges.last < max_edge
    average_edges << average_edges.last + avg_step_size
  end

  array_of_histograms.each {|h| h.update_values(edges: average_edges, max: max_value) }

  return array_of_histograms
end
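
The edge-averaging step above can be sketched without the rest of the class. This is a minimal illustration, assuming each input is a plain array of evenly spaced edges; `average_edges` is a hypothetical helper name, not part of MiniHistogram, and the guard against a zero step is an addition of the sketch.

```ruby
# Hypothetical helper (not part of MiniHistogram): builds one shared set of
# edges from several arrays of evenly spaced edges.
def average_edges(edge_sets)
  # Bin width of each input, using the same rule as #bin_size.
  steps = edge_sets.map { |e| e.length <= 1 ? 0 : e[1] - e[0] }
  avg_step = steps.inject(:+).to_f / steps.length

  # Assumption of this sketch: refuse a zero step so the loop below ends.
  raise ArgumentError, "average step must be positive" unless avg_step > 0

  min_edge = edge_sets.map(&:min).min
  max_edge = edge_sets.map(&:max).max

  # Walk from the smallest edge in steps of the average width until the
  # largest edge is covered.
  result = [min_edge]
  while result.last < max_edge
    result << result.last + avg_step
  end
  result
end

p average_edges([[0.0, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0]])
# => [0.0, 1.5, 3.0, 4.5, 6.0]
```

The bin widths 2.0 and 1.0 average to 1.5, and the last edge (6.0) overshoots the largest input edge (5.0) so that every value still falls inside a bin.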

Instance Method Details

#bin_size ⇒ Object


# File 'lib/mini_histogram.rb', line 57

def bin_size
  return 0 if edges.length <= 1

  edges[1] - edges[0]
end

#closed ⇒ Object


# File 'lib/mini_histogram.rb', line 45

def closed
  @left_p ? :left : :right
end

#edges ⇒ Object Also known as: edge

Finds the “edges” of a given histogram that will mark the boundaries for the histogram's “bins”.

Example:

a = [1,1,1, 5, 5, 5, 5, 10, 10, 10]
MiniHistogram.new(a).edges
# => [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

There are multiple ways to find edges; this one was taken from
https://github.com/mrkn/enumerable-statistics/issues/24

Another good set of implementations is in NumPy:
https://github.com/numpy/numpy/blob/d9b1e32cb8ef90d6b4a47853241db2a28146a57d/numpy/lib/histograms.py#L222

# File 'lib/mini_histogram.rb', line 122

def edges
  return @edges if @edges

  return @edges = [0.0] if array.empty?

  lo = @min
  hi = @max

  nbins = sturges.to_f

  if hi == lo
    start = lo
    step = 1.0
    divisor = 1.0
    len = 1
  else
    bw = (hi - lo) / nbins
    lbw = Math.log10(bw)
    if lbw >= 0
      step = 10 ** lbw.floor * 1.0
      r = bw/step

      if r <= 1.1
        # do nothing
      elsif r <= 2.2
        step *= 2.0
      elsif r <= 5.5
        step *= 5.0
      else
        step *= 10
      end
      divisor = 1.0
      start = step * (lo/step).floor
      len = ((hi - start)/step).ceil
    else
      divisor = 10 ** - lbw.floor
      r = bw * divisor
      if r <= 1.1
        # do nothing
      elsif r <= 2.2
        divisor /= 2.0
      elsif r <= 5.5
        divisor /= 5.0
      else
        divisor /= 10.0
      end
      step = 1.0
      start = (lo * divisor).floor
      len = (hi * divisor - start).ceil
    end
  end

  if left_p
    while (lo < start/divisor)
      start -= step
    end

    while (start + (len - 1)*step)/divisor <= hi
      len += 1
    end
  else
    while lo <= start/divisor
      start -= step
    end
    while (start + (len - 1)*step)/divisor < hi
      len += 1
    end
  end

  @edges = []
  len.times.each do
    @edges << start/divisor
    start += step
  end

  return @edges
end
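
The `lbw >= 0` branch above rounds the raw bin width to a "nice" step. As a rough sketch of that one branch only, the hypothetical helper `nice_step` below (not part of MiniHistogram) takes a raw width of at least 1 and returns 1, 2, 5, or 10 times a power of ten, using the same 1.1 / 2.2 / 5.5 thresholds.

```ruby
# Hypothetical helper (not part of MiniHistogram): rounds a raw bin width
# of at least 1 to a step of 1, 2, 5, or 10 times a power of ten,
# mirroring the `lbw >= 0` branch of #edges.
def nice_step(raw_width)
  raise ArgumentError, "sketch only covers widths >= 1" if raw_width < 1

  # Largest power of ten that does not exceed the raw width.
  step = 10**Math.log10(raw_width).floor * 1.0
  ratio = raw_width / step

  if ratio <= 1.1
    step
  elsif ratio <= 2.2
    step * 2.0
  elsif ratio <= 5.5
    step * 5.0
  else
    step * 10.0
  end
end

# For [1, 1, 1, 5, 5, 5, 5, 10, 10, 10] the raw width is (10 - 1) / 5 = 1.8
p nice_step(1.8)  # => 2.0
p nice_step(30.0) # => 50.0
```

A step of 2.0 is what produces the `[0.0, 2.0, 4.0, ...]` edges in the example above. The `lbw < 0` branch applies the same thresholds to a divisor instead of a step and is not covered by this sketch.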

#edges_max ⇒ Object


# File 'lib/mini_histogram.rb', line 37

def edges_max
  edges.max
end

#edges_min ⇒ Object


# File 'lib/mini_histogram.rb', line 33

def edges_min
  edges.min
end

#histogram(*_) ⇒ Object


# File 'lib/mini_histogram.rb', line 41

def histogram(*_)
  self
end

#plot(nbins: nil, closed: :left, symbol: "▇", **kw) ⇒ Object


# File 'lib/mini_histogram.rb', line 201

def plot
  raise "You must `require 'mini_histogram/plot'` to get this feature"
end

#sturges ⇒ Object

Weird name, right? There are multiple ways to calculate the number of “bins” a histogram should have; one of the most common is the “Sturges” method.

Here are some alternatives from NumPy: github.com/numpy/numpy/blob/d9b1e32cb8ef90d6b4a47853241db2a28146a57d/numpy/lib/histograms.py#L489-L521


# File 'lib/mini_histogram.rb', line 69

def sturges
  len = array.length
  return 1.0 if len == 0

  # return (long)(ceil(Math.log2(n)) + 1);
  return Math.log2(len).ceil + 1
end
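
To make the rule concrete, here is a standalone sketch of the same formula, ceil(log2(n)) + 1, applied to a few sample counts. `sturges_bins` is a hypothetical name used only for this illustration.

```ruby
# Hypothetical standalone version of the rule used by #sturges:
# ceil(log2(n)) + 1 bins for n samples, with 1 bin for an empty input.
def sturges_bins(sample_count)
  return 1 if sample_count.zero?

  Math.log2(sample_count).ceil + 1
end

p sturges_bins(10)   # => 5
p sturges_bins(50)   # => 7
p sturges_bins(1000) # => 11
```

The bin count grows with the logarithm of the sample size, so going from 10 to 1000 samples only adds six bins.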

#update_values(edges:, max:) ⇒ Object

Sets the edges and max to new values and clears any previously memoized values.


# File 'lib/mini_histogram.rb', line 51

def update_values(edges:, max: )
  @edges = edges
  @max = max
  @weights = nil # clear memoized value
end

#weights ⇒ Object

Given an array of edges and an array to generate a histogram from, returns the counts for each “bin”.

Example:

a = [1,1,1, 5, 5, 5, 5, 10, 10, 10]
edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

MiniHistogram.new(a).weights
# => [3, 0, 4, 0, 0, 3]

This means that the `a` array has 3 values between 0.0 and 2.0,
4 values between 4.0 and 6.0, and 3 values between 10.0 and 12.0.

# File 'lib/mini_histogram.rb', line 90

def weights
  return @weights if @weights
  return @weights = [] if array.empty?

  lo = edges.first
  step = edges[1] - edges[0]

  max_index = ((@max  - lo) / step).floor
  @weights = Array.new(max_index + 1, 0)

  array.each do |x|
    index = ((x - lo) / step).floor
    @weights[index] += 1
  end

  return @weights
end
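
The counting step can be tried on its own. The sketch below reproduces the example from the description, assuming evenly spaced edges; `bin_counts` is a hypothetical helper, not part of MiniHistogram, and it takes the edges as an explicit argument instead of computing them.

```ruby
# Hypothetical helper (not part of MiniHistogram): counts how many values
# fall into each equal-width bin described by `edges`.
def bin_counts(values, edges)
  return [] if values.empty?

  lo = edges.first
  step = edges[1] - edges[0]

  # One slot per bin, up to and including the bin that holds the maximum.
  counts = Array.new(((values.max - lo) / step).floor + 1, 0)

  # A value's bin index is its distance from the first edge, in whole steps.
  values.each { |x| counts[((x - lo) / step).floor] += 1 }
  counts
end

a = [1, 1, 1, 5, 5, 5, 5, 10, 10, 10]
edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
p bin_counts(a, edges) # => [3, 0, 4, 0, 0, 3]
```

Because the index is computed with `floor`, a value that sits exactly on an edge is counted in the bin that starts at that edge, which is why 10 lands in the 10.0 to 12.0 bin.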