Class: HugeEnumerable

Inherits:
Object
  • Object
show all
Includes:
Enumerable
Defined in:
lib/huge_enumerable.rb,
lib/huge_enumerable/version.rb

Overview

HugeEnumerable is a base class that allows for enumerations over very large (potentially infinite) data sets without requiring them to be in memory. In addition to enumerable, abilities it also allows for shuffling, sampling, shifting, and popping as if it were an array. These actions also do not require for the entire data set to be in memory. Nor do they alter the original data set in any fashion.

To use HugeEnumerable, inherit it via a subclass and provide the methods collection_size and fetch. collection_size should return the size of the full data set. fetch should return the value at the given index. It is guaranteed that fetch will always be called with values in the range of (0…collection_size) It will never be called with a negative index or with an index >= collection_size

Direct Known Subclasses

HugeCollection, HugeProduct

Constant Summary collapse

DEFAULT_MAX_ARRAY_SIZE =

Currently 100,000 elements

100000
VERSION =

“0.0.1”

"0.0.2"

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable

Create a new HugeEnumerable

Options

  • :max_array_size - The default size of arrays when #to_a is called.

  • :rng - The random number generator to use.



40
41
42
43
44
45
46
# File 'lib/huge_enumerable.rb', line 40

def initialize(max_array_size = nil, rng = nil)
  @max_array_size = max_array_size ? max_array_size.to_i : nil
  @rng = rng || self.method(:rand)
  @collection_increment = 1
  @start_of_sequence = 0
  @shuffle_head = 0
end

Instance Attribute Details

#max_array_sizeObject

:nodoc:



29
30
31
# File 'lib/huge_enumerable.rb', line 29

def max_array_size
  @max_array_size
end

#rngObject

The random number generator to use for shuffles and samples. Defaults to self#rand.



32
33
34
# File 'lib/huge_enumerable.rb', line 32

def rng
  @rng
end

Instance Method Details

#[](index_or_range, length = nil) ⇒ Object

Element Reference — Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by range of indices. Negative indices count backward from the end of the collection (-1 is the last element). For start and range cases the starting index is just before an element. Additionally, an empty array is returned when the starting index for an element range is at the end of the collection. Returns nil if the index (or starting index) are out of range.

Attributes

  • index_or_range - Either an integer for single element selection or length selection, or a range.

Options

  • :length - The number of elements to return if index_or_range is not a range.



60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
# File 'lib/huge_enumerable.rb', line 60

def [](index_or_range, length=nil)
  # TODO: Consider changing this to return HugeCollection
  if index_or_range.is_a?(Range)
    range = index_or_range
    index = nil
  else
    index = index_or_range.to_i
    range = nil
  end

  if range
    index = range.first
    index += size if index < 0

    length = range.last - index + 1
    length += size if range.last < 0
    length = size - index if index + length > size

    if index < 0 || index > size
      nil
    elsif length < 0
      []
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  elsif length
    index += size if index < 0
    length = size - index if index + length > size
    if index < 0 || length < 0
      nil
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  else
    _fetch(index)
  end

end

#collection_each(&block) ⇒ Object

Calls the given block once for each element remaining in the collection, passing that element as a parameter.



100
101
102
103
# File 'lib/huge_enumerable.rb', line 100

def collection_each(&block) # :yields: element
  # TODO: Return an Enumerator if no block is given
  size.times { |i| yield _fetch(i) }
end

#combination(n) ⇒ Object

When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself. If no block is given, an HugeCombination is returned instead.

Caveat

max_array_size is currently inherited by the generated HugeCombination. This may change in the future.



109
110
111
112
113
114
115
116
117
118
# File 'lib/huge_enumerable.rb', line 109

def combination(n) # :yields: element
  random_number_generator = rng != self.method(:rand) ? rng : nil
  combo = HugeCombination.new(self.dup.reset!, n, max_array_size, random_number_generator)
  if block_given?
    combo.each { |x| yield x }
    self
  else
    combo
  end
end

#eachObject

Calls the given block once for each element in the next array of the collection, passing that element as a parameter.



121
122
123
124
# File 'lib/huge_enumerable.rb', line 121

def each # :yields: element
  # TODO: Return an Enumerator if no block is given
  remaining_or(max_array_size).times { |i| yield _fetch(i) }
end

#empty?Boolean

Returns true of the collection contains no more elements.

Returns:

  • (Boolean)


137
138
139
# File 'lib/huge_enumerable.rb', line 137

def empty?
  @start_of_sequence == @end_of_sequence
end

#next_arrayObject

Shifts max_array_size elements and returns the following array from to_a.



131
132
133
134
# File 'lib/huge_enumerable.rb', line 131

def next_array
  shift(max_array_size)
  to_a
end

#permutation(n) ⇒ Object

When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself. If no block is given, a HugePermutation is returned instead.

Caveat

max_array_size is currently inherited by the generated HugePermutation. This may change in the future.



145
146
147
148
149
150
151
152
153
154
# File 'lib/huge_enumerable.rb', line 145

def permutation(n) # :yields: element
  random_number_generator = rng != self.method(:rand) ? rng : nil
  perm = HugePermutation.new(self.dup.reset!, n, max_array_size, random_number_generator)
  if block_given?
    perm.each { |x| yield x }
    self
  else
    perm
  end
end

#pop(n = nil) ⇒ Object

Removes the last element from the collection and returns it, or nil if the collection is empty. If a number n is given, returns an array of the last n elements (or less).



158
159
160
161
# File 'lib/huge_enumerable.rb', line 158

def pop(n = nil)
  result = element_or_array(n) { pop1 }
  n  ? result.reverse : result
end

#product(other_enumerable) ⇒ Object

When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself. If no block is given, a HugeProduct is returned instead.

Caveat

max_array_size is currently inherited by the generated HugeProduct. This may change in the future. other_enumerable is duped and reset if it is a HugeEnumerable. This may change in the future.



168
169
170
171
172
173
174
175
176
177
178
# File 'lib/huge_enumerable.rb', line 168

def product(other_enumerable) # :yields: element
  other_enumerable = other_enumerable.dup.reset! if other_enumerable.is_a?(HugeEnumerable)
  random_number_generator = rng != self.method(:rand) ? rng : nil
  prod = HugeProduct.new(self.dup.reset!, other_enumerable, max_array_size, random_number_generator)
  if block_given?
    prod.each { |x| yield x }
    self
  else
    prod
  end
end

#sample(*args) ⇒ Object

Choose a random element or n random elements from the collection. The elements are chosen by using random and unique indices into the array in order to ensure that an element does not repeat itself unless the collection already contained duplicate elements. If the collection is empty the first form returns nil and the second form returns an empty array. The optional rng argument will be used as the random number generator.



185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
# File 'lib/huge_enumerable.rb', line 185

def sample(*args)
  if args.size > 2
    raise ArgumentError, "wrong number of arguments (#{args.size} for 2)"
  elsif args.size == 2
    n = args.first
    rng = args.last
  elsif args.size == 1
    arg = args.first
    if arg.is_a?(Proc) || arg.is_a?(Method)
      n = 1
      rng = arg
    else
      n = arg
      rng = method(:rand)
    end
  else
    n = nil
    rng = method(:rand)
  end

  element_or_array(n) { sample1(rng) }
end

#shift(n = nil) ⇒ Object

Removes the first element of the collection and returns it (shifting all other elements down by one). Returns nil if the collection is empty. If a number n is given, returns an array of the first n elements (or less). With collection containing only the remainder elements, not including what was shifted to returned array.

Options

  • rng - The random number generator to use. Defaults to self#rng.



214
215
216
# File 'lib/huge_enumerable.rb', line 214

def shift(n = nil)
  element_or_array(n) { shift1 }
end

#shuffle(rng = nil) ⇒ Object

Returns a new HugeEnumerable with the order of the elements of the new collection randomized.

Options

  • rng - The random number generator to use. Defaults to self#rng.

Side Effects

The new collection is reset to the current collection’s original size and elements before shuffling.



223
224
225
# File 'lib/huge_enumerable.rb', line 223

def shuffle(rng=nil)
  self.dup.shuffle!(rng)
end

#shuffle!(rng = nil) ⇒ Object

Randomly reorders the elements of the collection.

Options

  • rng - The random number generator to use. Defaults to self#rng.

Side Effects

The collection is reset to its original size and elements before shuffling



232
233
234
235
236
237
238
# File 'lib/huge_enumerable.rb', line 232

def shuffle!(rng=nil)
  rng ||= self.rng
  reset!
  @shuffle_head = rng.call(collection_size)
  @collection_increment = full_cycle_increment(collection_size)
  self
end

#sizeObject

Returns the current size of the collection. Unlike collection_size, this tracks size changes caused by push, pop, shift, and next_array.



242
243
244
# File 'lib/huge_enumerable.rb', line 242

def size
  end_of_sequence - start_of_sequence
end