Class: FlatKit::MergeTree

Inherits:
Object
  • Object
show all
Includes:
Enumerable
Defined in:
lib/flat_kit/merge_tree.rb

Overview

Public: Merge a list of sorted records from Readers into a single output Writer

The MergeTree implements a Tournament Tree algorightm to do a k-way merge between Reader objects:

https://en.wikipedia.org/wiki/K-way_merge_algorithm

Usage:

compare_fields = %w[ key timestamp ]
format = ::FlatKit::Format.for('json')

readers = ARGV.map do |path|
  format.reader.new(source: path, compare_fields: compare_fields)
end

write_path = "merged.jsonl"
writer = format.writer.new(destination: write_path.to_s)

tree = ::FlatKit::MergeTree.new(readers)

tree.each do |record|
   writer.write(record)
end
writer.close

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(readers) ⇒ MergeTree

Returns a new instance of MergeTree.



36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
# File 'lib/flat_kit/merge_tree.rb', line 36

def initialize(readers)
  @readers = readers
  @leaves = []
  @levels = []

  @readers.each do |reader|
    @leaves << LeafNode.new(reader)
  end

  # Need to pad the leaves to an even number so that the slicing by 2 for
  # the tournament will work
  @leaves << SentinelLeafNode.new if @leaves.size.odd?

  init_tree
end

Instance Attribute Details

#leavesObject (readonly)

Returns the value of attribute leaves.



34
35
36
# File 'lib/flat_kit/merge_tree.rb', line 34

def leaves
  @leaves
end

#levelsObject (readonly)

Returns the value of attribute levels.



34
35
36
# File 'lib/flat_kit/merge_tree.rb', line 34

def levels
  @levels
end

#readersObject (readonly)

Returns the value of attribute readers.



34
35
36
# File 'lib/flat_kit/merge_tree.rb', line 34

def readers
  @readers
end

Instance Method Details

#depthObject

The number of levels, this shold be the logn(readers.size)



84
85
86
# File 'lib/flat_kit/merge_tree.rb', line 84

def depth
  @levels.size
end

#eachObject

Iterate over all the ements from all the readers yielding them in sorted order.



92
93
94
95
96
97
98
99
100
101
# File 'lib/flat_kit/merge_tree.rb', line 92

def each
  loop do
    break if root.leaf.finished?

    yield root.value
    # consume the yielded value and have the tournament tree replay those
    # brackets affected
    root.leaf.update_and_replay
  end
end

#init_treeObject

Initialize the tournament tree, go in depths - bottom layer will be the winners of the 2 leaf nodes, continuing until top layer is just 1 node



56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
# File 'lib/flat_kit/merge_tree.rb', line 56

def init_tree
  values = @leaves.dup
  loop do
    break if values.size == 1

    winners = []

    # we alays need a left and a right node, there is the possibility of
    # adding a single Sentinel node as the final right node in each level
    values.each_slice(2) do |left, right|
      right = SentinelInternalNode.new if right.nil?
      winners << InternalNode.new(left: left, right: right)
    end
    values = winners
    @levels << winners
  end
end

#rootObject

Root is the last level - should only have one node



77
78
79
# File 'lib/flat_kit/merge_tree.rb', line 77

def root
  @levels[-1].first
end