Class: HugeEnumerable

Inherits:
Object
  • Object
show all
Includes:
Enumerable
Defined in:
lib/huge_enumerable.rb,
lib/huge_enumerable/version.rb

Overview

HugeEnumerable is a base class that allows for enumerations over very large (potentially infinite) data sets without requiring them to be in memory. In addition to enumerable, abilities it also allows for shuffling, sampling, shifting, and popping as if it were an array. These actions also do not require for the entire data set to be in memory. Nor do they alter the original data set in any fashion.

To use HugeEnumerable, inherit it via a subclass and provide the methods collection_size and fetch. collection_size should return the size of the full data set. fetch should return the value at the given index. It is guaranteed that fetch will always be called with values in the range of (0…collection_size) It will never be called with a negative index or with an index >= collection_size

Direct Known Subclasses

HugeCollection, HugeProduct

Constant Summary collapse

DEFAULT_MAX_ARRAY_SIZE =

Currently 100,000 elements

100000
VERSION =
"0.1.3"

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable

Create a new HugeEnumerable

Options

  • :max_array_size - The default size of arrays when #to_a is called.

  • :rng - The random number generator to use.



38
39
40
41
42
43
44
# File 'lib/huge_enumerable.rb', line 38

def initialize(max_array_size = nil, rng = nil)
  @max_array_size = max_array_size ? max_array_size.to_i : nil
  @rng = rng || self.method(:rand)
  @collection_increment = 1
  @start_of_sequence = 0
  @shuffle_head = 0
end

Instance Attribute Details

#max_array_sizeObject

:nodoc:



27
28
29
# File 'lib/huge_enumerable.rb', line 27

def max_array_size
  @max_array_size
end

#rngObject

The random number generator to use for shuffles and samples. Defaults to self#rand.



30
31
32
# File 'lib/huge_enumerable.rb', line 30

def rng
  @rng
end

Instance Method Details

#[](index_or_range, length = nil) ⇒ Object

Element Reference — Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by range of indices. Negative indices count backward from the end of the collection (-1 is the last element). For start and range cases the starting index is just before an element. Additionally, an empty array is returned when the starting index for an element range is at the end of the collection. Returns nil if the index (or starting index) are out of range.

Attributes

  • index_or_range - Either an integer for single element selection or length selection, or a range.

Options

  • :length - The number of elements to return if index_or_range is not a range.



58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
# File 'lib/huge_enumerable.rb', line 58

def [](index_or_range, length=nil)
  # TODO: Consider changing this to return HugeCollection
  if index_or_range.is_a?(Range)
    range = index_or_range
    index = nil
  else
    index = index_or_range.to_i
    range = nil
  end

  if range
    index = range.first
    index += size if index < 0

    length = range.last - index + 1
    length += size if range.last < 0
    length = size - index if index + length > size

    if index < 0 || index > size
      nil
    elsif length < 0
      []
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  elsif length
    index += size if index < 0
    length = size - index if index + length > size
    if index < 0 || length < 0
      nil
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  else
    _fetch(index)
  end

end

#collection_each(&block) ⇒ Object

Calls the given block once for each element remaining in the collection, passing that element as a parameter.



98
99
100
101
# File 'lib/huge_enumerable.rb', line 98

def collection_each(&block) # :yields: element
  # TODO: Return an Enumerator if no block is given
  size.times { |i| yield _fetch(i) }
end

#combination(n, &block) ⇒ Object

When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself. If no block is given, an HugeCombination is returned instead.

Caveat

max_array_size is currently inherited by the generated HugeCombination. This may change in the future.



107
108
109
110
111
112
113
114
115
116
117
118
# File 'lib/huge_enumerable.rb', line 107

def combination(n, &block) # :yields: element
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  combo = HugeCombination.new(self.clone.reset!, n, max_array_size, random_number_generator)
  if block
    combo.each(&block)
    self
  else
    combo
  end
end

#each(&block) ⇒ Object

Calls the given block once for each element in the next array of the collection, passing that element as a parameter.



121
122
123
124
125
# File 'lib/huge_enumerable.rb', line 121

def each(&block) # :yields: element
  # TODO: Return an Enumerator if no block is given
  remaining_or(max_array_size).times(&(block << method(:_fetch)))
  # remaining_or(max_array_size).times { |i| yield _fetch(i) }
end

#empty?Boolean

Returns true of the collection contains no more elements.

Returns:

  • (Boolean)


143
144
145
# File 'lib/huge_enumerable.rb', line 143

def empty?
  @start_of_sequence == @end_of_sequence
end

#initialize_copy(orig) ⇒ Object



127
128
129
130
# File 'lib/huge_enumerable.rb', line 127

def initialize_copy(orig)
  super
  @rng = @rng.unbind.bind(self) if @rng.respond_to?(:unbind) # Make sure this is bound to self if it is a method
end

#next_arrayObject

Shifts max_array_size elements and returns the following array from to_a.



137
138
139
140
# File 'lib/huge_enumerable.rb', line 137

def next_array
  shift(max_array_size)
  to_a
end

#permutation(n, &block) ⇒ Object

When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself. If no block is given, a HugePermutation is returned instead.

Caveat

max_array_size is currently inherited by the generated HugePermutation. This may change in the future.



151
152
153
154
155
156
157
158
159
160
161
162
# File 'lib/huge_enumerable.rb', line 151

def permutation(n, &block) # :yields: element
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  perm = HugePermutation.new(self.clone.reset!, n, max_array_size, random_number_generator)
  if block
    perm.each(&block)
    self
  else
    perm
  end
end

#pop(n = nil) ⇒ Object

Removes the last element from the collection and returns it, or nil if the collection is empty. If a number n is given, returns an array of the last n elements (or less).



166
167
168
169
# File 'lib/huge_enumerable.rb', line 166

def pop(n = nil)
  result = element_or_array(n) { pop1 }
  n  ? result.reverse : result
end

#product(other_enumerable, &block) ⇒ Object

When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself. If no block is given, a HugeProduct is returned instead.

Caveat

max_array_size is currently inherited by the generated HugeProduct. This may change in the future. other_enumerable is duped and reset if it is a HugeEnumerable. This may change in the future.



176
177
178
179
180
181
182
183
184
185
186
187
188
# File 'lib/huge_enumerable.rb', line 176

def product(other_enumerable, &block) # :yields: element
  other_enumerable = other_enumerable.clone.reset! if other_enumerable.is_a?(HugeEnumerable)
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  prod = HugeProduct.new(self.clone.reset!, other_enumerable, max_array_size, random_number_generator)
  if block
    prod.each(&block)
    self
  else
    prod
  end
end

#sample(*args) ⇒ Object

Choose a random element or n random elements from the collection. The elements are chosen by using random and unique indices into the array in order to ensure that an element does not repeat itself unless the collection already contained duplicate elements. If the collection is empty the first form returns nil and the second form returns an empty array. The optional rng argument will be used as the random number generator.



195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
# File 'lib/huge_enumerable.rb', line 195

def sample(*args)
  if args.size > 2
    raise ArgumentError, "wrong number of arguments (#{args.size} for 2)"
  elsif args.size == 2
    n = args.first
    rng = args.last
  elsif args.size == 1
    arg = args.first
    if arg.is_a?(Proc) || arg.is_a?(Method)
      n = 1
      rng = arg
    else
      n = arg
      rng = method(:rand)
    end
  else
    n = nil
    rng = method(:rand)
  end

  element_or_array(n) { sample1(rng) }
end

#shift(n = nil) ⇒ Object

Removes the first element of the collection and returns it (shifting all other elements down by one). Returns nil if the collection is empty. If a number n is given, returns an array of the first n elements (or less). With collection containing only the remainder elements, not including what was shifted to returned array.

Options

  • rng - The random number generator to use. Defaults to self#rng.



224
225
226
# File 'lib/huge_enumerable.rb', line 224

def shift(n = nil)
  element_or_array(n) { shift1 }
end

#shuffle(rng = nil) ⇒ Object

Returns a new HugeEnumerable with the order of the elements of the new collection randomized.

Options

  • rng - The random number generator to use. Defaults to self#rng.

Side Effects

The new collection is reset to the current collection’s original size and elements before shuffling.



233
234
235
# File 'lib/huge_enumerable.rb', line 233

def shuffle(rng=nil)
  self.clone.shuffle!(rng)
end

#shuffle!(rng = nil) ⇒ Object

Randomly reorders the elements of the collection.

Options

  • rng - The random number generator to use. Defaults to self#rng.

Side Effects

The collection is reset to its original size and elements before shuffling



242
243
244
245
246
247
248
# File 'lib/huge_enumerable.rb', line 242

def shuffle!(rng=nil)
  rng ||= self.rng
  reset!
  @shuffle_head = rng.call(collection_size)
  @collection_increment = full_cycle_increment(collection_size)
  self
end

#sizeObject

Returns the current size of the collection. Unlike collection_size, this tracks size changes caused by push, pop, shift, and next_array.



252
253
254
# File 'lib/huge_enumerable.rb', line 252

def size
  end_of_sequence - start_of_sequence
end