Class: CorrectHorseBatteryStaple::Corpus::IsamKD
- Inherits: Base
  - Object
    - CorrectHorseBatteryStaple::Corpus
      - Base
        - CorrectHorseBatteryStaple::Corpus::IsamKD
- Includes: Backend::IsamKD, Memoize
- Defined in: lib/correct_horse_battery_staple/corpus/isam_kd.rb
Overview
Format of header:
- 0..3 - OB - offset of the body start, in bytes; network byte order
- 4..7 - LP - length of the prelude, in network byte order
- 8..OB-1 - P - JSON-encoded prelude hash followed by space padding
- OB..EOF - array of fixed-size records, as described in the prelude
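The header layout above can be exercised with Array#pack and String#unpack, where the "N" directive is a 32-bit unsigned integer in network (big-endian) byte order. This is a sketch, not the gem's reader or writer: `build_header` and `read_header` are hypothetical names, the prelude key `"n"` is illustrative only, and it assumes that LP counts only the JSON bytes and that the body begins immediately after the padding (OB = 8 + padded length).

```ruby
require "json"
require "stringio"

# Hypothetical writer for the header layout (not the gem's API).
# ASSUMPTIONS: LP counts only the JSON bytes, and the body starts right
# after the space padding, so OB = 8 + padded_length.
def build_header(prelude_hash, padded_length)
  json = JSON.generate(prelude_hash).b
  raise ArgumentError, "prelude does not fit" if json.bytesize > padded_length

  body_offset = 8 + padded_length
  ob_lp = [body_offset, json.bytesize].pack("NN") # bytes 0..7: OB, LP as big-endian uint32
  padding = " " * (padded_length - json.bytesize) # spaces fill the bytes up to OB-1
  ob_lp + json + padding
end

# Hypothetical reader: returns [OB, decoded prelude] from any seekable IO.
def read_header(io)
  io.seek(0)
  body_offset, prelude_length = io.read(8).unpack("NN")
  json = io.read(prelude_length).force_encoding(Encoding::UTF_8)
  [body_offset, JSON.parse(json)]
end

header = build_header({ "n" => 3 }, 64) # "n" is an illustrative key only
body_offset, prelude = read_header(StringIO.new(header))
p body_offset  # => 72
p prelude["n"] # => 3
```

Because the padding is plain spaces, a reader that reads the padded span instead of exactly LP bytes would still decode the same JSON.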
Contents of Prelude (after JSON decoding):
- P - length of the word part of a record
- P - length of the frequency part of a record (always 4 bytes)
- P - total length of a record
- P - number of records
- P - name of the field the records are sorted by (word or frequency)
- P - corpus statistics
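Because every record has the same total length, record i starts at OB + i * record_length, so a reader can seek straight to it; and because the prelude names the field the records are sorted by, a lookup on that field can binary-search record indices. The sketch below shows that access pattern with Range#bsearch. The helper names are hypothetical, and it assumes the records are stored back to back from OB with no gaps.

```ruby
require "stringio"

# Hypothetical helpers (not the gem's API). ASSUMPTION: records are stored
# back to back from OB, each exactly record_length bytes, with no gaps.
def record_offset(body_offset, record_length, index)
  body_offset + index * record_length
end

def record_bytes(io, body_offset, record_length, index)
  io.seek(record_offset(body_offset, record_length, index))
  io.read(record_length)
end

# Records are sorted by one field, so a lookup on that field can
# binary-search record indices. Returns the first index whose key is
# >= target, or nil if every key is smaller.
def find_index(record_count, target)
  (0...record_count).bsearch { |i| yield(i) >= target }
end

# Toy file: an 8-byte stand-in header followed by three 4-byte records.
io = StringIO.new("HEADER!!ant bee cat ")
p record_bytes(io, 8, 4, 1)                                     # => "bee "
p find_index(3, "bee") { |i| record_bytes(io, 8, 4, i).strip } # => 1
```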
Format of record:
- 2 bytes - LW - actual length of the word within the field
- P bytes - the LW bytes of the word (W) followed by P-LW bytes of padding
- P (4) bytes - frequency as a network-byte-order long
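A single record in the format above can be encoded and decoded with pack/unpack. This is a hedged sketch, not the gem's code: `encode_record` and `decode_record` are hypothetical, `word_field_length` stands in for the prelude's word length P, and the page does not state the byte order of LW or the padding byte, so big-endian ("n") and NUL padding are assumptions. The decoder only trusts LW, so the padding content does not matter.

```ruby
# Hypothetical encoder/decoder for one record (not the gem's API).
# ASSUMPTIONS: LW is a big-endian unsigned 16-bit integer ("n"), padding
# bytes are NUL, and word_field_length plays the role of P in the layout.
def encode_record(word, frequency, word_field_length)
  bytes = word.b
  raise ArgumentError, "word too long" if bytes.bytesize > word_field_length

  lw = [bytes.bytesize].pack("n")              # 2 bytes: LW
  field = bytes.ljust(word_field_length, "\0") # P bytes: W followed by padding
  freq = [frequency].pack("N")                 # 4 bytes: big-endian frequency
  lw + field + freq
end

def decode_record(record, word_field_length)
  word_length = record.unpack1("n")
  word = record.byteslice(2, word_length).force_encoding(Encoding::UTF_8)
  frequency = record.byteslice(2 + word_field_length, 4).unpack1("N")
  [word, frequency]
end

record = encode_record("staple", 1234, 16)
p record.bytesize           # => 22
p decode_record(record, 16) # => ["staple", 1234]
```

LW counts bytes rather than characters, which is why the word is converted with String#b before measuring it.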
Constant Summary
Constants included from Backend::IsamKD
Backend::IsamKD::F_PRELUDE_AT_END, Backend::IsamKD::INITIAL_PRELUDE_LENGTH
Instance Attribute Summary
Attributes inherited from Base
#frequency_mean, #frequency_stddev, #original_size, #probability_mean, #probability_stddev, #weighted_size
Instance Method Summary
- #file_size(file) ⇒ Object
- #initialize(filename, stats = nil) ⇒ IsamKD: constructor; returns a new instance of IsamKD.
- #load_index ⇒ Object
- #precache(max = -1) ⇒ Object
- #prelude ⇒ Object
Methods included from Backend::IsamKD
Methods included from Memoize
Methods inherited from Base
#candidates, #compose_filters, #count, #count_by_options, #count_candidates, #each, #entropy_per_word, #entropy_per_word_by_filter, #filter, #filter_for_options, #frequencies, #inspect, #load_stats_from_hash, #pick, read, #recalculate, #reset, #result, #sorted_entries, #stats, #words
Methods included from CorrectHorseBatteryStaple::Common
#array_sample, #logger, #random_in_range, #random_number, #set_sample
Methods inherited from CorrectHorseBatteryStaple::Corpus
Constructor Details
#initialize(filename, stats = nil) ⇒ IsamKD
Returns a new instance of IsamKD.
# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 34
def initialize(filename, stats = nil)
  super
  @filename = filename
  @file = CorrectHorseBatteryStaple::Util.open_binary(filename, "r")
  parse_prelude
  load_index
end
Instance Method Details
#file_size(file) ⇒ Object
# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 48
def file_size(file)
  (file.respond_to?(:size) ? file.size : file.stat.size)
end
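#file_size relies on duck typing so that it works both before and after #precache swaps the file handle for a StringIO. File#size and StringIO#size both exist, so the stat fallback is reached only for IO objects without #size, such as a plain IO wrapping a file descriptor. The sketch below copies the same one-line logic into a standalone function to show all three paths.

```ruby
require "stringio"
require "tempfile"

# Standalone copy of the #file_size logic, for illustration.
def file_size(file)
  (file.respond_to?(:size) ? file.size : file.stat.size)
end

p file_size(StringIO.new("correct horse")) # => 13 (StringIO#size)

Tempfile.create("corpus") do |f|
  f.write("battery staple")
  f.flush
  p file_size(f) # => 14 (File#size)

  # A plain IO has no #size, so the stat fallback is used.
  raw = IO.new(f.fileno, autoclose: false)
  p file_size(raw) # => 14 (IO#stat.size)
end
```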
#load_index ⇒ Object
# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 56
def load_index
  @kdtree ||= load_kdtree
end
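#load_index memoizes with `||=`, so the k-d tree is built once (the constructor already calls #load_index) and later calls return the cached object. The sketch below demonstrates that behaviour with a hypothetical LazyIndex class whose stand-in `load_kdtree` only counts how often it runs; the real loader is not documented on this page. One caveat of `||=`: if the loader ever returned nil or false, it would run again on every call.

```ruby
# Hypothetical stand-in for IsamKD's memoized index loading.
class LazyIndex
  attr_reader :loads

  def initialize
    @loads = 0
  end

  # Same memoization pattern as IsamKD#load_index.
  def load_index
    @kdtree ||= load_kdtree
  end

  private

  # Hypothetical loader: counts invocations and returns a dummy index.
  def load_kdtree
    @loads += 1
    { built: true }
  end
end

idx = LazyIndex.new
idx.load_index
idx.load_index
p idx.loads # => 1
```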
#precache(max = -1) ⇒ Object
# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 42
def precache(max = -1)
  return if max > -1 && file_size(@file) > max # max = -1 disables the size limit
  @file.seek 0
  @file = StringIO.new @file.read, "r"         # later reads are served from memory
end
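#precache trades memory for I/O: with the default max = -1, or when the file is at most max bytes, the whole file is read into a StringIO and subsequent reads come from memory; a larger file stays on disk. The sketch below reproduces that logic in a hypothetical Precacher class so it runs without the gem. As written in the listing, #precache does not close the original file handle, so the caller in this sketch closes it explicitly.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-alone version of IsamKD#precache, for illustration.
class Precacher
  attr_reader :file

  def initialize(file)
    @file = file
  end

  def file_size(file)
    (file.respond_to?(:size) ? file.size : file.stat.size)
  end

  # max = -1 means no limit; files larger than max bytes stay on disk.
  def precache(max = -1)
    return if max > -1 && file_size(@file) > max
    @file.seek 0
    @file = StringIO.new @file.read, "r"
  end
end

Tempfile.create("corpus") do |f|
  f.write("battery staple")
  f.flush
  disk = File.open(f.path, "rb")
  c = Precacher.new(disk)
  c.precache(10)   # 14 bytes > 10: left on disk
  p c.file.class   # => File
  c.precache       # no limit: loaded into memory
  p c.file.class   # => StringIO
  p c.file.read    # => "battery staple"
  disk.close       # precache itself does not close the old handle
end
```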
#prelude ⇒ Object
# File 'lib/correct_horse_battery_staple/corpus/isam_kd.rb', line 52
def prelude
  @prelude ||= parse_prelude
end