Class: HugeEnumerable
- Inherits:
-
Object
- Object
- HugeEnumerable
- Includes:
- Enumerable
- Defined in:
- lib/huge_enumerable.rb,
lib/huge_enumerable/version.rb
Overview
HugeEnumerable is a base class that allows for enumerations over very large (potentially infinite) data sets without requiring them to be in memory. In addition to enumerable, abilities it also allows for shuffling, sampling, shifting, and popping as if it were an array. These actions also do not require for the entire data set to be in memory. Nor do they alter the original data set in any fashion.
To use HugeEnumerable, inherit it via a subclass and provide the methods collection_size and fetch. collection_size should return the size of the full data set. fetch should return the value at the given index. It is guaranteed that fetch will always be called with values in the range of (0…collection_size) It will never be called with a negative index or with an index >= collection_size
Direct Known Subclasses
Constant Summary collapse
- DEFAULT_MAX_ARRAY_SIZE =
Currently 100,000 elements
100000
- VERSION =
“0.0.1”
"0.0.2"
Instance Attribute Summary collapse
-
#max_array_size ⇒ Object
:nodoc:.
-
#rng ⇒ Object
The random number generator to use for shuffles and samples.
Instance Method Summary collapse
-
#[](index_or_range, length = nil) ⇒ Object
Element Reference — Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by range of indices.
-
#collection_each(&block) ⇒ Object
Calls the given block once for each element remaining in the collection, passing that element as a parameter.
-
#combination(n) ⇒ Object
When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself.
-
#each ⇒ Object
Calls the given block once for each element in the next array of the collection, passing that element as a parameter.
-
#empty? ⇒ Boolean
Returns true of the collection contains no more elements.
-
#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable
constructor
Create a new HugeEnumerable.
-
#next_array ⇒ Object
Shifts max_array_size elements and returns the following array from to_a.
-
#permutation(n) ⇒ Object
When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself.
-
#pop(n = nil) ⇒ Object
Removes the last element from the collection and returns it, or nil if the collection is empty.
-
#product(other_enumerable) ⇒ Object
When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself.
-
#sample(*args) ⇒ Object
Choose a random element or n random elements from the collection.
-
#shift(n = nil) ⇒ Object
Removes the first element of the collection and returns it (shifting all other elements down by one).
-
#shuffle(rng = nil) ⇒ Object
Returns a new HugeEnumerable with the order of the elements of the new collection randomized.
-
#shuffle!(rng = nil) ⇒ Object
Randomly reorders the elements of the collection.
-
#size ⇒ Object
Returns the current size of the collection.
Constructor Details
#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable
Create a new HugeEnumerable
Options
-
:max_array_size
- The default size of arrays when #to_a is called. -
:rng
- The random number generator to use.
40 41 42 43 44 45 46 |
# File 'lib/huge_enumerable.rb', line 40 def initialize(max_array_size = nil, rng = nil) @max_array_size = max_array_size ? max_array_size.to_i : nil @rng = rng || self.method(:rand) @collection_increment = 1 @start_of_sequence = 0 @shuffle_head = 0 end |
Instance Attribute Details
#max_array_size ⇒ Object
:nodoc:
29 30 31 |
# File 'lib/huge_enumerable.rb', line 29 def max_array_size @max_array_size end |
#rng ⇒ Object
The random number generator to use for shuffles and samples. Defaults to self#rand.
32 33 34 |
# File 'lib/huge_enumerable.rb', line 32 def rng @rng end |
Instance Method Details
#[](index_or_range, length = nil) ⇒ Object
Element Reference — Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by range of indices. Negative indices count backward from the end of the collection (-1 is the last element). For start and range cases the starting index is just before an element. Additionally, an empty array is returned when the starting index for an element range is at the end of the collection. Returns nil if the index (or starting index) are out of range.
Attributes
-
index_or_range
- Either an integer for single element selection or length selection, or a range.
Options
-
:length
- The number of elements to return if index_or_range is not a range.
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 |
# File 'lib/huge_enumerable.rb', line 60 def [](index_or_range, length=nil) # TODO: Consider changing this to return HugeCollection if index_or_range.is_a?(Range) range = index_or_range index = nil else index = index_or_range.to_i range = nil end if range index = range.first index += size if index < 0 length = range.last - index + 1 length += size if range.last < 0 length = size - index if index + length > size if index < 0 || index > size nil elsif length < 0 [] else element_or_array(length) { |i| _fetch(i + index) } end elsif length index += size if index < 0 length = size - index if index + length > size if index < 0 || length < 0 nil else element_or_array(length) { |i| _fetch(i + index) } end else _fetch(index) end end |
#collection_each(&block) ⇒ Object
Calls the given block once for each element remaining in the collection, passing that element as a parameter.
100 101 102 103 |
# File 'lib/huge_enumerable.rb', line 100 def collection_each(&block) # :yields: element # TODO: Return an Enumerator if no block is given size.times { |i| yield _fetch(i) } end |
#combination(n) ⇒ Object
When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself. If no block is given, an HugeCombination is returned instead.
Caveat
max_array_size is currently inherited by the generated HugeCombination. This may change in the future.
109 110 111 112 113 114 115 116 117 118 |
# File 'lib/huge_enumerable.rb', line 109 def combination(n) # :yields: element random_number_generator = rng != self.method(:rand) ? rng : nil combo = HugeCombination.new(self.dup.reset!, n, max_array_size, random_number_generator) if block_given? combo.each { |x| yield x } self else combo end end |
#each ⇒ Object
Calls the given block once for each element in the next array of the collection, passing that element as a parameter.
121 122 123 124 |
# File 'lib/huge_enumerable.rb', line 121 def each # :yields: element # TODO: Return an Enumerator if no block is given remaining_or(max_array_size).times { |i| yield _fetch(i) } end |
#empty? ⇒ Boolean
Returns true of the collection contains no more elements.
137 138 139 |
# File 'lib/huge_enumerable.rb', line 137 def empty? @start_of_sequence == @end_of_sequence end |
#next_array ⇒ Object
Shifts max_array_size elements and returns the following array from to_a.
131 132 133 134 |
# File 'lib/huge_enumerable.rb', line 131 def next_array shift(max_array_size) to_a end |
#permutation(n) ⇒ Object
When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself. If no block is given, a HugePermutation is returned instead.
Caveat
max_array_size is currently inherited by the generated HugePermutation. This may change in the future.
145 146 147 148 149 150 151 152 153 154 |
# File 'lib/huge_enumerable.rb', line 145 def permutation(n) # :yields: element random_number_generator = rng != self.method(:rand) ? rng : nil perm = HugePermutation.new(self.dup.reset!, n, max_array_size, random_number_generator) if block_given? perm.each { |x| yield x } self else perm end end |
#pop(n = nil) ⇒ Object
Removes the last element from the collection and returns it, or nil if the collection is empty. If a number n is given, returns an array of the last n elements (or less).
158 159 160 161 |
# File 'lib/huge_enumerable.rb', line 158 def pop(n = nil) result = element_or_array(n) { pop1 } n ? result.reverse : result end |
#product(other_enumerable) ⇒ Object
When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself. If no block is given, a HugeProduct is returned instead.
Caveat
max_array_size is currently inherited by the generated HugeProduct. This may change in the future. other_enumerable is duped and reset if it is a HugeEnumerable. This may change in the future.
168 169 170 171 172 173 174 175 176 177 178 |
# File 'lib/huge_enumerable.rb', line 168 def product(other_enumerable) # :yields: element other_enumerable = other_enumerable.dup.reset! if other_enumerable.is_a?(HugeEnumerable) random_number_generator = rng != self.method(:rand) ? rng : nil prod = HugeProduct.new(self.dup.reset!, other_enumerable, max_array_size, random_number_generator) if block_given? prod.each { |x| yield x } self else prod end end |
#sample(*args) ⇒ Object
Choose a random element or n random elements from the collection. The elements are chosen by using random and unique indices into the array in order to ensure that an element does not repeat itself unless the collection already contained duplicate elements. If the collection is empty the first form returns nil and the second form returns an empty array. The optional rng argument will be used as the random number generator.
185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 |
# File 'lib/huge_enumerable.rb', line 185 def sample(*args) if args.size > 2 raise ArgumentError, "wrong number of arguments (#{args.size} for 2)" elsif args.size == 2 n = args.first rng = args.last elsif args.size == 1 arg = args.first if arg.is_a?(Proc) || arg.is_a?(Method) n = 1 rng = arg else n = arg rng = method(:rand) end else n = nil rng = method(:rand) end element_or_array(n) { sample1(rng) } end |
#shift(n = nil) ⇒ Object
Removes the first element of the collection and returns it (shifting all other elements down by one). Returns nil if the collection is empty. If a number n is given, returns an array of the first n elements (or less). With collection containing only the remainder elements, not including what was shifted to returned array.
Options
-
rng
- The random number generator to use. Defaults to self#rng.
214 215 216 |
# File 'lib/huge_enumerable.rb', line 214 def shift(n = nil) element_or_array(n) { shift1 } end |
#shuffle(rng = nil) ⇒ Object
Returns a new HugeEnumerable with the order of the elements of the new collection randomized.
Options
-
rng
- The random number generator to use. Defaults to self#rng.
Side Effects
The new collection is reset to the current collection’s original size and elements before shuffling.
223 224 225 |
# File 'lib/huge_enumerable.rb', line 223 def shuffle(rng=nil) self.dup.shuffle!(rng) end |
#shuffle!(rng = nil) ⇒ Object
Randomly reorders the elements of the collection.
Options
-
rng
- The random number generator to use. Defaults to self#rng.
Side Effects
The collection is reset to its original size and elements before shuffling
232 233 234 235 236 237 238 |
# File 'lib/huge_enumerable.rb', line 232 def shuffle!(rng=nil) rng ||= self.rng reset! @shuffle_head = rng.call(collection_size) @collection_increment = full_cycle_increment(collection_size) self end |
#size ⇒ Object
Returns the current size of the collection. Unlike collection_size, this tracks size changes caused by push, pop, shift, and next_array.
242 243 244 |
# File 'lib/huge_enumerable.rb', line 242 def size end_of_sequence - start_of_sequence end |