Class: FlatKit::MergeTree
- Inherits:
-
Object
- Object
- FlatKit::MergeTree
- Includes:
- Enumerable
- Defined in:
- lib/flat_kit/merge_tree.rb
Overview
Public: Merge a list of sorted records from Readers into a single output Writer
The MergeTree implements a Tournament Tree algorightm to do a k-way merge between Reader objects:
https://en.wikipedia.org/wiki/K-way_merge_algorithm
Usage:
compare_fields = %w[ key timestamp ]
format = ::FlatKit::Format.for('json')
readers = ARGV.map do |path|
format.reader.new(source: path, compare_fields: compare_fields)
end
write_path = "merged.jsonl"
writer = format.writer.new(destination: write_path.to_s)
tree = ::FlatKit::MergeTree.new(readers)
tree.each do |record|
writer.write(record)
end
writer.close
Instance Attribute Summary collapse
-
#leaves ⇒ Object
readonly
Returns the value of attribute leaves.
-
#levels ⇒ Object
readonly
Returns the value of attribute levels.
-
#readers ⇒ Object
readonly
Returns the value of attribute readers.
Instance Method Summary collapse
-
#depth ⇒ Object
The number of levels, this shold be the logn(readers.size).
-
#each ⇒ Object
Iterate over all the ements from all the readers yielding them in sorted order.
-
#init_tree ⇒ Object
Initialize the tournament tree, go in depths - bottom layer will be the winners of the 2 leaf nodes, continuing until top layer is just 1 node.
-
#initialize(readers) ⇒ MergeTree
constructor
A new instance of MergeTree.
-
#root ⇒ Object
Root is the last level - should only have one node.
Constructor Details
#initialize(readers) ⇒ MergeTree
Returns a new instance of MergeTree.
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
# File 'lib/flat_kit/merge_tree.rb', line 36 def initialize(readers) @readers = readers @leaves = [] @levels = [] @readers.each do |reader| @leaves << LeafNode.new(reader) end # Need to pad the leaves to an even number so that the slicing by 2 for # the tournament will work @leaves << SentinelLeafNode.new if @leaves.size.odd? init_tree end |
Instance Attribute Details
#leaves ⇒ Object (readonly)
Returns the value of attribute leaves.
34 35 36 |
# File 'lib/flat_kit/merge_tree.rb', line 34 def leaves @leaves end |
#levels ⇒ Object (readonly)
Returns the value of attribute levels.
34 35 36 |
# File 'lib/flat_kit/merge_tree.rb', line 34 def levels @levels end |
#readers ⇒ Object (readonly)
Returns the value of attribute readers.
34 35 36 |
# File 'lib/flat_kit/merge_tree.rb', line 34 def readers @readers end |
Instance Method Details
#depth ⇒ Object
The number of levels, this shold be the logn(readers.size)
84 85 86 |
# File 'lib/flat_kit/merge_tree.rb', line 84 def depth @levels.size end |
#each ⇒ Object
Iterate over all the ements from all the readers yielding them in sorted order.
92 93 94 95 96 97 98 99 100 101 |
# File 'lib/flat_kit/merge_tree.rb', line 92 def each loop do break if root.leaf.finished? yield root.value # consume the yielded value and have the tournament tree replay those # brackets affected root.leaf.update_and_replay end end |
#init_tree ⇒ Object
Initialize the tournament tree, go in depths - bottom layer will be the winners of the 2 leaf nodes, continuing until top layer is just 1 node
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
# File 'lib/flat_kit/merge_tree.rb', line 56 def init_tree values = @leaves.dup loop do break if values.size == 1 winners = [] # we alays need a left and a right node, there is the possibility of # adding a single Sentinel node as the final right node in each level values.each_slice(2) do |left, right| right = SentinelInternalNode.new if right.nil? winners << InternalNode.new(left: left, right: right) end values = winners @levels << winners end end |
#root ⇒ Object
Root is the last level - should only have one node
77 78 79 |
# File 'lib/flat_kit/merge_tree.rb', line 77 def root @levels[-1].first end |