Class: CorrectHorseBatteryStaple::Corpus::IsamKD

Inherits:
Base
Includes:
Backend::IsamKD, Memoize
Defined in:
lib/correct_horse_battery_staple/corpus/isam_kd.rb

Overview

Format of header:

0..3    - OB - offset of body start in bytes; network byte order
4..7    - LP - length of prelude; network byte order
8..OB-1 - P  - JSON-encoded prelude hash and space padding
OB..EOF - array of fixed-size records as described in prelude
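A minimal sketch of reading this header, assuming the prelude starts at byte 8 exactly as laid out above. The constant name F_PRELUDE_AT_END suggests a variant layout that this sketch does not handle, and `read_isam_header` is a hypothetical helper, not part of the gem's API.

```ruby
require "json"
require "stringio"

# Hypothetical helper (not part of the gem): reads the two 4-byte header
# fields and the JSON prelude that follows them. Assumes the prelude starts
# at byte 8; any F_PRELUDE_AT_END variant is ignored.
def read_isam_header(io)
  io.seek(0)
  # "NN" = two 32-bit unsigned integers in network (big-endian) byte order.
  body_offset, prelude_length = io.read(8).unpack("NN")
  # Strip trailing space padding in case LP counts it as part of the prelude.
  prelude = JSON.parse(io.read(prelude_length).strip)
  [body_offset, prelude_length, prelude]
end

# Usage: build a tiny in-memory header (the prelude key is made up) and read it back.
json = { "example" => 1 }.to_json
padded = json.ljust(24) # space-pad the prelude up to the body start
header = [8 + padded.bytesize, json.bytesize].pack("NN") + padded
body_offset, lp, prelude = read_isam_header(StringIO.new(header))
```

Reading LP first means the JSON can be parsed without scanning for its end, and the space padding lets the prelude be rewritten in place as long as it still fits before OB.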

Contents of Prelude (after JSON decoding):

P - length of the word part of a record
P - length of the frequency part of a record (always 4 bytes)
P - total length of a record
P - number of records
P - name of the field the records are sorted by (word or frequency)
P - corpus statistics
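Because every record has the same size and the prelude records the sort field, a record can be located by arithmetic and a sorted field can be binary-searched without loading the whole body. The sketch below assumes `entry_length` and `record_count` hold the prelude's total-record-length and record-count values (the prelude's actual JSON key names are not shown on this page); all three helpers are hypothetical.

```ruby
# Hypothetical helpers (not part of the gem).

# Byte offset of record `index`: fixed-size records start at OB.
def record_offset(body_offset, entry_length, record_count, index)
  unless (0...record_count).cover?(index)
    raise IndexError, "record #{index} out of range 0...#{record_count}"
  end
  body_offset + index * entry_length
end

# Raw bytes of one record, fetched with a single seek + read.
def read_record_bytes(io, body_offset, entry_length, record_count, index)
  io.seek(record_offset(body_offset, entry_length, record_count, index))
  io.read(entry_length)
end

# Index of the first record whose sort key is >= target, or nil if none.
# The caller's block decodes the sort key of record i; Range#bsearch then
# touches only O(log n) records.
def lower_bound(record_count, target)
  (0...record_count).bsearch { |i| yield(i) >= target }
end
```

Usage: `lower_bound(n, "horse") { |i| word_at(i) }`, where `word_at` is whatever decodes the word of record i.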

Format of record:

2 bytes     - LW - actual length of the word within the field
P bytes     - LW bytes of word (W) + P-LW bytes of padding
P (4) bytes - frequency as a long in network byte order
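A sketch of decoding and encoding one record with `String#unpack`/`Array#pack`. Assumptions: the word-part length from the prelude (`word_field_length` here) does not include the 2-byte LW prefix, the word bytes are UTF-8, and padding is NUL bytes (the format above only says "padding"; decoding ignores it either way). Both method names are hypothetical.

```ruby
# Hypothetical helpers (not part of the gem).
# Layout assumed: "n" = 2-byte big-endian LW, "a<len>" = word field,
# "N" = 4-byte big-endian frequency.
def decode_record(bytes, word_field_length)
  lw, field, frequency = bytes.unpack("na#{word_field_length}N")
  # Keep only the LW meaningful bytes; the rest of the field is padding.
  word = field.byteslice(0, lw).force_encoding(Encoding::UTF_8)
  [word, frequency]
end

# Inverse of decode_record, handy for building test data.
def encode_record(word, frequency, word_field_length)
  raw = word.b # binary copy so bytesize is the on-disk length
  raise ArgumentError, "word too long for field" if raw.bytesize > word_field_length
  # "a<len>" pads short strings with NUL bytes.
  [raw.bytesize, raw, frequency].pack("na#{word_field_length}N")
end
```

Storing LW explicitly lets the word field be padded freely while still recovering words that themselves end in padding-like bytes.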

Constant Summary

Constants included from Backend::IsamKD

Backend::IsamKD::F_PRELUDE_AT_END, Backend::IsamKD::INITIAL_PRELUDE_LENGTH

Instance Attribute Summary

Attributes inherited from Base

#frequency_mean, #frequency_stddev, #original_size, #probability_mean, #probability_stddev, #weighted_size

Instance Method Summary

Methods included from Backend::IsamKD

included

Methods included from Memoize

included

Methods inherited from Base

#candidates, #compose_filters, #count, #count_by_options, #count_candidates, #each, #entropy_per_word, #entropy_per_word_by_filter, #filter, #filter_for_options, #frequencies, #inspect, #load_stats_from_hash, #pick, read, #recalculate, #reset, #result, #sorted_entries, #stats, #words

Methods included from CorrectHorseBatteryStaple::Common

#array_sample, #logger, #random_in_range, #random_number, #set_sample

Methods inherited from CorrectHorseBatteryStaple::Corpus

format_for, read

Constructor Details

#initialize(filename, stats = nil) ⇒ IsamKD

Returns a new instance of IsamKD.



# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 34

def initialize(filename, stats = nil)
  super
  @filename = filename
  @file = CorrectHorseBatteryStaple::Util.open_binary(filename, "r")
  parse_prelude
  load_index
end

Instance Method Details

#file_size(file) ⇒ Object



# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 48

def file_size(file)
  (file.respond_to?(:size) ? file.size : file.stat.size)
end

#load_index ⇒ Object



# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 56

def load_index
  @kdtree ||= load_kdtree
end
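`load_index` memoizes the result of `load_kdtree` in `@kdtree`; how that tree is built and what points it stores are not shown on this page. Purely as an illustration of the technique, here is a minimal 2-d k-d tree with an axis-aligned range query over hypothetical `[x, y]` points (for example, word length and frequency). `TinyKDTree` and the choice of coordinates are assumptions, not the gem's index.

```ruby
# Hypothetical illustration (not the gem's index): a 2-d k-d tree.
class TinyKDTree
  Node = Struct.new(:point, :payload, :axis, :left, :right)

  # entries: array of [[x, y], payload]
  def initialize(entries)
    @root = build(entries, 0)
  end

  # Payloads whose point lies in [lo[0], hi[0]] x [lo[1], hi[1]].
  def range(lo, hi)
    out = []
    search(@root, lo, hi, out)
    out
  end

  private

  # Split on alternating axes at the median point.
  def build(entries, depth)
    return nil if entries.empty?
    axis = depth % 2
    sorted = entries.sort_by { |(pt, _payload)| pt[axis] }
    mid = sorted.size / 2
    pt, payload = sorted[mid]
    Node.new(pt, payload, axis,
             build(sorted[0...mid], depth + 1),
             build(sorted[(mid + 1)..-1], depth + 1))
  end

  def search(node, lo, hi, out)
    return if node.nil?
    pt = node.point
    if pt[0].between?(lo[0], hi[0]) && pt[1].between?(lo[1], hi[1])
      out << node.payload
    end
    a = node.axis
    # Left subtree: coordinate <= pt[a]; right subtree: >= pt[a].
    # Skip a side when the query box cannot reach it.
    search(node.left, lo, hi, out) if lo[a] <= pt[a]
    search(node.right, lo, hi, out) if hi[a] >= pt[a]
  end
end
```

The pruning is what makes the structure worthwhile: a query only descends into subtrees whose half-plane overlaps the query box, instead of testing every record.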

#precache(max = -1) ⇒ Object



# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 42

def precache(max = -1)
  return if max > -1 && file_size(@file) > max
  @file.seek 0
  @file = StringIO.new @file.read, "r"
end
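`precache` copies the whole file into a `StringIO` unless a size cap `max` is given (`-1` means no cap) and the file exceeds it, so later seeks and reads hit memory. The sketch below restates that logic as a free function so it can be tried on any IO; `precache_io` is a hypothetical name. Unlike the method above, it returns the IO to use rather than reassigning an instance variable, and the caller stays responsible for closing the original handle.

```ruby
require "stringio"

# Hypothetical free-function version of the precache logic (not the gem's API).
def precache_io(io, max = -1)
  # Same duck-typed size lookup as #file_size.
  size = io.respond_to?(:size) ? io.size : io.stat.size
  # Too large for the cap: keep reading from the original IO.
  return io if max > -1 && size > max
  io.seek(0)
  # Slurp everything into memory; the caller should close `io` if done with it.
  StringIO.new(io.read, "r")
end
```

Note that the comparison is strict, so a file whose size equals `max` is still cached.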

#prelude ⇒ Object



# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 52

def prelude
  @prelude ||= parse_prelude
end
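Both `prelude` and `load_index` cache with `||=`. That idiom treats `nil` and `false` as "not yet computed", so a computation that can legitimately return either is repeated on every call. A minimal sketch contrasting `||=` with an `instance_variable_defined?` check; `LazyValue` is a hypothetical class used only to count calls.

```ruby
# Hypothetical illustration (not part of the gem).
class LazyValue
  attr_reader :calls

  def initialize(&compute)
    @compute = compute
    @calls = 0
  end

  # ||= style: recomputes whenever the cached value is nil or false.
  def with_or_equals
    @or_cache ||= run
  end

  # Definedness check: computes exactly once, even for nil or false.
  def with_defined_check
    return @def_cache if instance_variable_defined?(:@def_cache)
    @def_cache = run
  end

  private

  def run
    @calls += 1
    @compute.call
  end
end
```

For a parsed prelude hash or a loaded tree, `||=` is fine because both results are truthy; the definedness check only matters when a falsy result is meaningful.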