Class: HugeEnumerable
- Inherits:
- Object
- Includes:
- Enumerable
- Defined in:
- lib/huge_enumerable.rb,
lib/huge_enumerable/version.rb
Overview
HugeEnumerable is a base class that allows for enumerations over very large (potentially infinite) data sets without requiring them to be in memory. In addition to its Enumerable abilities, it also allows for shuffling, sampling, shifting, and popping as if it were an array. These actions do not require the entire data set to be in memory, nor do they alter the original data set in any fashion.
To use HugeEnumerable, inherit from it in a subclass and provide the methods collection_size and fetch. collection_size should return the size of the full data set. fetch should return the value at the given index. fetch is guaranteed to be called only with indices in the range (0…collection_size): it will never be called with a negative index or with an index >= collection_size.
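The contract described above can be sketched as follows. So that the snippet runs without the gem installed, it uses LazyCollectionSketch, a hypothetical stand-in for the base class whose each is deliberately simplified (it ignores max_array_size); Squares is likewise an illustrative name. With the gem available, the subclass would inherit from HugeEnumerable instead.

```ruby
# Hypothetical stand-in for HugeEnumerable so this sketch is self-contained.
# It only demonstrates the collection_size / fetch contract.
class LazyCollectionSketch
  include Enumerable

  def each
    return enum_for(:each) unless block_given?
    # fetch is only ever called with indices in 0...collection_size
    collection_size.times { |i| yield fetch(i) }
  end
end

# Illustrative subclass: the squares of 0...count, computed on demand.
class Squares < LazyCollectionSketch
  def initialize(count)
    @count = count
  end

  # Size of the full data set.
  def collection_size
    @count
  end

  # Value at the given index; nothing is stored in memory.
  def fetch(index)
    index * index
  end
end

squares = Squares.new(10**12)
first_five = squares.first(5)                          # [0, 1, 4, 9, 16]
even_squares = squares.lazy.select(&:even?).first(3)   # [0, 4, 16]
```

Because values are computed in fetch, a collection of a trillion elements costs no more memory than one of ten.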
Direct Known Subclasses
Constant Summary
- DEFAULT_MAX_ARRAY_SIZE =
Currently 100,000 elements
100000
- VERSION =
"0.1.3"
Instance Attribute Summary
-
#max_array_size ⇒ Object
:nodoc:.
-
#rng ⇒ Object
The random number generator to use for shuffles and samples.
Instance Method Summary
-
#[](index_or_range, length = nil) ⇒ Object
Element Reference — Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by range of indices.
-
#collection_each(&block) ⇒ Object
Calls the given block once for each element remaining in the collection, passing that element as a parameter.
-
#combination(n, &block) ⇒ Object
When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself.
-
#each(&block) ⇒ Object
Calls the given block once for each element in the next array of the collection, passing that element as a parameter.
-
#empty? ⇒ Boolean
Returns true if the collection contains no more elements.
-
#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable
constructor
Create a new HugeEnumerable.
- #initialize_copy(orig) ⇒ Object
-
#next_array ⇒ Object
Shifts max_array_size elements and returns the following array from to_a.
-
#permutation(n, &block) ⇒ Object
When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself.
-
#pop(n = nil) ⇒ Object
Removes the last element from the collection and returns it, or nil if the collection is empty.
-
#product(other_enumerable, &block) ⇒ Object
When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself.
-
#sample(*args) ⇒ Object
Choose a random element or n random elements from the collection.
-
#shift(n = nil) ⇒ Object
Removes the first element of the collection and returns it (shifting all other elements down by one).
-
#shuffle(rng = nil) ⇒ Object
Returns a new HugeEnumerable with the order of the elements of the new collection randomized.
-
#shuffle!(rng = nil) ⇒ Object
Randomly reorders the elements of the collection.
-
#size ⇒ Object
Returns the current size of the collection.
Constructor Details
#initialize(max_array_size = nil, rng = nil) ⇒ HugeEnumerable
Create a new HugeEnumerable
Options
-
:max_array_size
- The default size of arrays when #to_a is called.
-
:rng
- The random number generator to use.
# File 'lib/huge_enumerable.rb', line 38
def initialize(max_array_size = nil, rng = nil)
  @max_array_size = max_array_size ? max_array_size.to_i : nil
  @rng = rng || self.method(:rand)
  @collection_increment = 1
  @start_of_sequence = 0
  @shuffle_head = 0
end
Instance Attribute Details
#max_array_size ⇒ Object
:nodoc:
# File 'lib/huge_enumerable.rb', line 27
def max_array_size
  @max_array_size
end
#rng ⇒ Object
The random number generator to use for shuffles and samples. Defaults to self#rand.
# File 'lib/huge_enumerable.rb', line 30
def rng
  @rng
end
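Judging from the source listings on this page, the generator is invoked as rng.call(collection_size), and sample accepts either a Proc or a Method. A custom generator is therefore assumed to be any callable that takes an upper bound and returns an integer in 0...upper_bound, as Kernel#rand does. The names seeded_rng, replay_rng, and midpoint_rng below are illustrative.

```ruby
# A seeded generator wrapped as a Method object: reproducible across runs.
seeded_rng = Random.new(42).method(:rand)

# A second generator with the same seed replays the same sequence.
replay_rng = Random.new(42).method(:rand)

# A Proc also works; this deterministic one can be handy in tests.
midpoint_rng = ->(upper_bound) { upper_bound / 2 }

first_run  = Array.new(5) { seeded_rng.call(100) }
second_run = Array.new(5) { replay_rng.call(100) }
midpoint   = midpoint_rng.call(10)   # 5
```

Passing a seeded generator is a simple way to make shuffles and samples repeatable.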
Instance Method Details
#[](index_or_range, length = nil) ⇒ Object
Element Reference: Returns the element at index, or returns a subarray starting at the start index and continuing for length elements, or returns a subarray specified by a range of indices. Negative indices count backward from the end of the collection (-1 is the last element). For start and range cases the starting index is just before an element. Additionally, an empty array is returned when the starting index for an element range is at the end of the collection. Returns nil if the index (or starting index) is out of range.
Attributes
-
index_or_range
- Either an integer for single element selection or length selection, or a range.
Options
-
:length
- The number of elements to return if index_or_range is not a range.
# File 'lib/huge_enumerable.rb', line 58
def [](index_or_range, length=nil)
  # TODO: Consider changing this to return HugeCollection
  if index_or_range.is_a?(Range)
    range = index_or_range
    index = nil
  else
    index = index_or_range.to_i
    range = nil
  end

  if range
    index = range.first
    index += size if index < 0

    length = range.last - index + 1
    length += size if range.last < 0
    length = size - index if index + length > size

    if index < 0 || index > size
      nil
    elsif length < 0
      []
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  elsif length
    index += size if index < 0
    length = size - index if index + length > size

    if index < 0 || length < 0
      nil
    else
      element_or_array(length) { |i| _fetch(i + index) }
    end
  else
    _fetch(index)
  end
end
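The description of #[] mirrors the semantics of Ruby's own Array#[]. The sketch below uses a plain Array to illustrate the edge cases mentioned (negative indices, a start index exactly at the end, a start index past the end). It is an assumption, not something verified here, that a HugeEnumerable returns the same results for these inputs; only inclusive ranges are shown.

```ruby
letters = %w[a b c d e]

single       = letters[2]      # "c"
from_the_end = letters[-1]     # "e" (negative indices count from the end)
start_length = letters[1, 2]   # ["b", "c"]
by_range     = letters[1..3]   # ["b", "c", "d"]
at_the_end   = letters[5, 1]   # [] (start index is exactly at the end)
past_the_end = letters[6, 1]   # nil (start index is out of range)
```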
#collection_each(&block) ⇒ Object
Calls the given block once for each element remaining in the collection, passing that element as a parameter.
# File 'lib/huge_enumerable.rb', line 98
def collection_each(&block) # :yields: element
  # TODO: Return an Enumerator if no block is given
  size.times { |i| yield _fetch(i) }
end
#combination(n, &block) ⇒ Object
When invoked with a block, yields all combinations of length n of elements from the collection and then returns the collection itself. If no block is given, a HugeCombination is returned instead.
Caveat
max_array_size is currently inherited by the generated HugeCombination. This may change in the future.
# File 'lib/huge_enumerable.rb', line 107
def combination(n, &block) # :yields: element
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  combo = HugeCombination.new(self.clone.reset!, n, max_array_size, random_number_generator)
  if block
    combo.each(&block)
    self
  else
    combo
  end
end
#each(&block) ⇒ Object
Calls the given block once for each element in the next array of the collection, passing that element as a parameter.
# File 'lib/huge_enumerable.rb', line 121
def each(&block) # :yields: element
  # TODO: Return an Enumerator if no block is given
  remaining_or(max_array_size).times(&(block << method(:_fetch)))
  # remaining_or(max_array_size).times { |i| yield _fetch(i) }
end
#empty? ⇒ Boolean
Returns true if the collection contains no more elements.
# File 'lib/huge_enumerable.rb', line 143
def empty?
  @start_of_sequence == @end_of_sequence
end
#initialize_copy(orig) ⇒ Object
# File 'lib/huge_enumerable.rb', line 127
def initialize_copy(orig)
  super
  @rng = @rng.unbind.bind(self) if @rng.respond_to?(:unbind) # Make sure this is bound to self if it is a method
end
#next_array ⇒ Object
Shifts max_array_size elements and returns the following array from to_a.
# File 'lib/huge_enumerable.rb', line 137
def next_array
  shift(max_array_size)
  to_a
end
#permutation(n, &block) ⇒ Object
When invoked with a block, yields all permutations of length n of elements from the collection and then returns the collection itself. If no block is given, a HugePermutation is returned instead.
Caveat
max_array_size is currently inherited by the generated HugePermutation. This may change in the future.
# File 'lib/huge_enumerable.rb', line 151
def permutation(n, &block) # :yields: element
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  perm = HugePermutation.new(self.clone.reset!, n, max_array_size, random_number_generator)
  if block
    perm.each(&block)
    self
  else
    perm
  end
end
#pop(n = nil) ⇒ Object
Removes the last element from the collection and returns it, or nil if the collection is empty. If a number n is given, returns an array of the last n elements (or fewer).
# File 'lib/huge_enumerable.rb', line 166
def pop(n = nil)
  result = element_or_array(n) { pop1 }
  n ? result.reverse : result
end
#product(other_enumerable, &block) ⇒ Object
When invoked with a block, yields all combinations of elements from the collection and the other enumerable and then returns the collection itself. If no block is given, a HugeProduct is returned instead.
Caveat
max_array_size is currently inherited by the generated HugeProduct. This may change in the future. other_enumerable is cloned and reset if it is a HugeEnumerable. This may change in the future.
# File 'lib/huge_enumerable.rb', line 176
def product(other_enumerable, &block) # :yields: element
  other_enumerable = other_enumerable.clone.reset! if other_enumerable.is_a?(HugeEnumerable)
  # Check to see if we have a specific random number generator to use.
  # Using hash comparison as dups, clones, and other actions can make == and eql? return false when it is actually the same method
  random_number_generator = rng.hash != method(:rand).hash ? rng : nil
  prod = HugeProduct.new(self.clone.reset!, other_enumerable, max_array_size, random_number_generator)
  if block
    prod.each(&block)
    self
  else
    prod
  end
end
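The source of HugeProduct is not shown on this page. As a general illustration of how a product can be indexed without being materialized, the sketch below maps a flat index onto a pair of indices using integer division and modulo. product_at is a hypothetical helper and is not claimed to be how the library does it.

```ruby
# Hypothetical helper: the element at `index` of the product of two
# indexable collections, computed without building the whole product.
def product_at(left, right, index)
  [left[index / right.size], right[index % right.size]]
end

left  = [1, 2, 3]
right = %i[a b]

# Rebuild the full product one index at a time, for comparison only.
pairs = Array.new(left.size * right.size) { |i| product_at(left, right, i) }
fourth = product_at(left, right, 3)   # [2, :b]
```

The same ordering is produced by Array#product, which makes the mapping easy to check on small inputs.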
#sample(*args) ⇒ Object
Choose a random element or n random elements from the collection. The elements are chosen by using random and unique indices into the array in order to ensure that an element does not repeat itself unless the collection already contained duplicate elements. If the collection is empty the first form returns nil and the second form returns an empty array. The optional rng argument will be used as the random number generator.
# File 'lib/huge_enumerable.rb', line 195
def sample(*args)
  if args.size > 2
    raise ArgumentError, "wrong number of arguments (#{args.size} for 2)"
  elsif args.size == 2
    n = args.first
    rng = args.last
  elsif args.size == 1
    arg = args.first
    if arg.is_a?(Proc) || arg.is_a?(Method)
      n = 1
      rng = arg
    else
      n = arg
      rng = method(:rand)
    end
  else
    n = nil
    rng = method(:rand)
  end

  element_or_array(n) { sample1(rng) }
end
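The private sample1 helper is not shown here. One simple way to obtain "random and unique indices" as described is rejection sampling into a Set, sketched below. unique_sample_indices is a hypothetical helper, and the rng argument is assumed to be a callable that returns an integer in 0...size. This approach suits cases where the sample is small relative to the collection.

```ruby
require 'set'

# Hypothetical helper: picks up to `count` distinct indices in 0...size.
def unique_sample_indices(size, count, rng)
  count = [count, size].min            # cannot pick more indices than exist
  chosen = Set.new
  # Duplicates are ignored by the Set, so keep drawing until enough are unique.
  chosen << rng.call(size) while chosen.size < count
  chosen.to_a
end

rng = Random.new(7).method(:rand)
indices = unique_sample_indices(1_000_000, 5, rng)

# Values are looked up only for the chosen indices; here the "collection"
# is the doubling function, standing in for a fetch method.
values = indices.map { |i| i * 2 }
```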
#shift(n = nil) ⇒ Object
Removes the first element of the collection and returns it (shifting all other elements down by one). Returns nil if the collection is empty. If a number n is given, returns an array of the first n elements (or fewer), leaving the collection with only the remaining elements.
Options
-
n
- The number of elements to shift. If omitted, a single element is shifted and returned.
# File 'lib/huge_enumerable.rb', line 224
def shift(n = nil)
  element_or_array(n) { shift1 }
end
#shuffle(rng = nil) ⇒ Object
Returns a new HugeEnumerable with the order of the elements of the new collection randomized.
Options
-
rng
- The random number generator to use. Defaults to self#rng.
Side Effects
The new collection is reset to the current collection’s original size and elements before shuffling.
# File 'lib/huge_enumerable.rb', line 233
def shuffle(rng=nil)
  self.clone.shuffle!(rng)
end
#shuffle!(rng = nil) ⇒ Object
Randomly reorders the elements of the collection.
Options
-
rng
- The random number generator to use. Defaults to self#rng.
Side Effects
The collection is reset to its original size and elements before shuffling.
# File 'lib/huge_enumerable.rb', line 242
def shuffle!(rng=nil)
  rng ||= self.rng
  reset!
  @shuffle_head = rng.call(collection_size)
  @collection_increment = full_cycle_increment(collection_size)
  self
end
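shuffle! stores only a random head and an increment, so no shuffled copy of the data is ever built. The private full_cycle_increment is not shown on this page; the sketch below illustrates the general idea it presumably relies on: stepping through indices with a stride that is coprime with the collection size visits every index exactly once. strided_order is a hypothetical helper, and how the library actually chooses its increment is an assumption left open here.

```ruby
# Hypothetical helper: the order in which indices are visited when starting
# at `head` and stepping by `increment`, wrapping around at `size`.
def strided_order(size, head, increment)
  unless increment.gcd(size) == 1
    raise ArgumentError, "increment must be coprime with size"
  end
  Array.new(size) { |i| (head + i * increment) % size }
end

order = strided_order(10, 3, 7)
# => [3, 0, 7, 4, 1, 8, 5, 2, 9, 6]
```

Every index from 0 to 9 appears exactly once, yet the only state needed is two integers, which is what makes this kind of reordering workable for very large collections.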
#size ⇒ Object
Returns the current size of the collection. Unlike collection_size, this tracks size changes caused by pop, shift, and next_array.
# File 'lib/huge_enumerable.rb', line 252
def size
  end_of_sequence - start_of_sequence
end
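The size listing suggests that shifting and popping are tracked with a start and an end position rather than by modifying the data. A minimal sketch of that bookkeeping follows; WindowSketch is a hypothetical class written for illustration, not the library's implementation.

```ruby
# Hypothetical class: a movable window over a source that is never modified.
class WindowSketch
  def initialize(source)
    @source = source
    @start_of_sequence = 0
    @end_of_sequence = source.size
  end

  # Current size is the width of the window.
  def size
    @end_of_sequence - @start_of_sequence
  end

  # Returns the first element in the window and moves the start forward.
  def shift
    return nil if size.zero?
    value = @source[@start_of_sequence]
    @start_of_sequence += 1
    value
  end

  # Returns the last element in the window and moves the end backward.
  def pop
    return nil if size.zero?
    @end_of_sequence -= 1
    @source[@end_of_sequence]
  end
end

data = %w[a b c d e]
window = WindowSketch.new(data)

shifted   = window.shift   # "a"
popped    = window.pop     # "e"
remaining = window.size    # 3
```

After these calls data still holds all five elements; only the window's two positions have changed.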