Class: ExternalSortModule::Sorter

Inherits:
Object
  • Object
show all
Defined in:
lib/geotree/externalsort.rb

Overview

Performs an external sort of a binary file. Used by the GeoTree module to shuffle buffered point sets into a random order prior to adding to the tree, in order to create a balanced tree.

Constant Summary collapse

MAX_CHUNKS_ =
8

Instance Method Summary collapse

Constructor Details

#initialize(path, element_size, comparator = nil, max_chunk_size = MAX_CHUNK_SIZE_, max_chunks = MAX_CHUNKS_) ⇒ Sorter

Constructor

Parameters:

  • path

    of file to sort

  • element_size

    size, in bytes, of each element

  • comparator (defaults to: nil)

    to compare elements; if nil, compares the bytes as substrings

Raises:

  • (ArgumentError)


193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
# File 'lib/geotree/externalsort.rb', line 193

def initialize(path, element_size, comparator=nil, max_chunk_size = MAX_CHUNK_SIZE_, max_chunks = MAX_CHUNKS_)
  raise ArgumentError,"no such file" if !File.file?(path)

  @comparator = comparator || Proc.new do |x,y|
    bx,ox = x
    by,oy = y
    bx[ox,@element_size] <=> by[oy,@element_size]
  end

  @path = path

  @work_file = nil

  @file_len = File.size(path)
  if @file_len == 0 || @file_len % element_size != 0
    raise ArgumentError,"File length #{@file_len} is not a positive multiple of element size #{element_size}"
  end
  @element_size = element_size
  @max_chunks = max_chunks
  @max_chunk_size = max_chunk_size - max_chunk_size % element_size
  raise ArgumentError if @max_chunk_size <= 0
end

Instance Method Details

#sortObject



216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
# File 'lib/geotree/externalsort.rb', line 216

def sort
  @file = File.open(@path,"r+b")

  # Break file into chunks, sorting them in place
  build_initial_segments
  sort_chunks_in_place

  require 'tempfile'
  
  @work_file = Tempfile.new('_externalsort_')
  @work_file.binmode

  while @segments.size > 1
    @segments = merge_segments(@segments)
  end

  @work_file.unlink
end